.NET: ThreadStatic vs lock { }. Why does ThreadStaticAttribute degrade performance?
In a RELEASE build there seems to be almost no [ThreadStatic] performance penalty (only a slight penalty on modern CPUs).
Here is the disassembly for ms_Acc += one; optimization is enabled for RELEASE:
No [ThreadStatic], DEBUG:
    00000060 mov eax,dword ptr [ebp-40h]
    00000063 add dword ptr ds:[00511718h],eax
No [ThreadStatic], RELEASE:
    00000051 mov eax,dword ptr [00040750h]
    00000057 add eax,dword ptr [rsp+20h]
    0000005b mov dword ptr [00040750h],eax
[ThreadStatic], DEBUG:
    00000066 mov edx,1
    0000006b mov ecx,4616E0h
    00000070 call 664F7450
    00000075 mov edx,1
    0000007a mov ecx,4616E0h
    0000007f mov dword ptr [ebp-50h],eax
    00000082 call 664F7450
    00000087 mov edx,dword ptr [eax+18h]
    0000008a add edx,dword ptr [ebp-40h]
    0000008d mov eax,dword ptr [ebp-50h]
    00000090 mov dword ptr [eax+18h],edx
[ThreadStatic], RELEASE:
    00000058 mov edx,1
    0000005d mov rcx,7FF001A3F28h
    00000067 call FFFFFFFFF6F9F740
    0000006c mov qword ptr [rsp+30h],rax
    00000071 mov rbx,qword ptr [rsp+30h]
    00000076 mov ebx,dword ptr [rbx+20h]
    00000079 add ebx,dword ptr [rsp+20h]
    0000007d mov edx,1
    00000082 mov rcx,7FF001A3F28h
    0000008c call FFFFFFFFF6F9F740
    00000091 mov qword ptr [rsp+38h],rax
    00000096 mov rax,qword ptr [rsp+38h]
    0000009b mov dword ptr [rax+20h],ebx
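In both [ThreadStatic] listings a helper is called before the read and again before the write, so every ms_Acc += one pays for two lookups. One way to sidestep that inside a hot loop is to accumulate in a local and touch the thread-static field only once on each side of the loop. The following is a minimal sketch under assumptions: the class name ThreadStaticCounter, the method AccumulateViaLocal and the iteration count are hypothetical; only ms_Acc and one come from the question.

```csharp
using System;

static class ThreadStaticCounter
{
    // Each thread gets its own copy of this field, initially 0.
    [ThreadStatic]
    private static int ms_Acc;

    // Hypothetical helper: adds `one` to ms_Acc `iterations` times,
    // but reads and writes the thread-static field only once each.
    public static int AccumulateViaLocal(int iterations, int one)
    {
        int acc = ms_Acc;               // single thread-static read
        for (int i = 0; i < iterations; i++)
        {
            acc += one;                 // ordinary local, no helper call needed
        }
        ms_Acc = acc;                   // single thread-static write
        return acc;
    }
}
```

On a fresh thread, ThreadStaticCounter.AccumulateViaLocal(1000, 1) returns 1000. Whether this measurably helps depends on the runtime and JIT in use, so it is worth timing rather than assuming.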
You have two lines of code that update ms_Acc. In the lock case, you have a single lock around both of them, while in the ThreadStatic case the thread-local lookup happens once for each access to ms_Acc, i.e. twice for each iteration of your loop. This is generally the benefit of using lock: you get to choose the granularity you want. I am guessing that the RELEASE build optimised this difference away.
I would be interested to see whether the performance becomes very similar, or identical, if you change the for loop to a single access to ms_Acc.
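A rough way to try this is to time both variants with exactly one update statement per iteration. The sketch below is an assumption about what the original benchmark looked like, not the asker's code: the class SingleAccessBenchmark, the fields ms_Lock and ms_Shared, the two Run methods and the iteration count are all hypothetical.

```csharp
using System;
using System.Diagnostics;

static class SingleAccessBenchmark
{
    private static readonly object ms_Lock = new object();
    private static int ms_Shared;       // shared field, guarded by ms_Lock

    [ThreadStatic]
    private static int ms_Acc;          // per-thread field, no lock needed

    // One locked update per iteration.
    public static int RunLock(int iterations, int one)
    {
        ms_Shared = 0;
        for (int i = 0; i < iterations; i++)
        {
            lock (ms_Lock)
            {
                ms_Shared += one;
            }
        }
        return ms_Shared;
    }

    // One thread-static update statement per iteration.
    public static int RunThreadStatic(int iterations, int one)
    {
        ms_Acc = 0;
        for (int i = 0; i < iterations; i++)
        {
            ms_Acc += one;
        }
        return ms_Acc;
    }

    static void Main()
    {
        const int iterations = 10000000;

        Stopwatch sw = Stopwatch.StartNew();
        int lockSum = RunLock(iterations, 1);
        sw.Stop();
        Console.WriteLine("lock:         {0} ms (sum {1})", sw.ElapsedMilliseconds, lockSum);

        sw.Restart();
        int tsSum = RunThreadStatic(iterations, 1);
        sw.Stop();
        Console.WriteLine("ThreadStatic: {0} ms (sum {1})", sw.ElapsedMilliseconds, tsSum);
    }
}
```

Run it as a RELEASE build outside the debugger, and repeat the measurement a few times, since the first pass includes JIT compilation. The timings will vary by machine and runtime version.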