
Slowdown on creating objects with many threads


Start Taskmgr.exe and go to the Processes tab. Use View + Select Columns and tick "Page Fault Delta". You'll see the impact of allocating hundreds of megabytes just to store the stacks of all the threads you created. Every time that number blips for your process, your program blocks while the operating system maps pages into RAM, and for a hard fault it has to read them from disk first.

TANSTAAFL, There ain't no such thing as a free lunch.
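
To put a rough number on the stack cost, here is a minimal sketch. It assumes the usual 1 MB default stack reservation per thread (the real value comes from the executable's header, so treat it as an assumption), and the names StackBudget and EstimateReserveBytes are made up for illustration. The Thread constructor overload that takes a maxStackSize is one way to ask for smaller stacks if you really need that many threads.

```csharp
using System;
using System.Threading;

public static class StackBudget
{
    // Rough estimate of the address space reserved for thread stacks.
    public static long EstimateReserveBytes(int threadCount, int stackBytes)
    {
        return (long)threadCount * stackBytes;
    }

    public static void Main()
    {
        const int DefaultStack = 1024 * 1024; // assumed default reservation, 1 MB
        const int SmallStack = 256 * 1024;    // hypothetical smaller stack, 256 KB

        Console.WriteLine("500 threads, default stack: "
            + EstimateReserveBytes(500, DefaultStack) / (1024 * 1024) + " MB");
        Console.WriteLine("500 threads, small stack:   "
            + EstimateReserveBytes(500, SmallStack) / (1024 * 1024) + " MB");

        // Thread(ThreadStart, int maxStackSize) requests a smaller stack for this thread.
        var worker = new Thread(() => Console.WriteLine("running on a smaller stack"), SmallStack);
        worker.Start();
        worker.Join();
    }
}
```

Whether a smaller stack is safe depends on how deep your call chains go, so this is something to measure rather than assume.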


My guess is that the problem is that garbage collection requires a certain amount of cooperation between threads - something either needs to check that they're all suspended, or ask them to suspend themselves and wait for it to happen, etc. (And even if they are suspended, it has to tell them not to wake up!)

This describes a "stop the world" garbage collector, of course. I believe there are at least two or three different GC implementations which differ in the details around parallelism... but I suspect that all of them are going to have some work to do in terms of getting threads to cooperate.
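
If you want to check which GC flavour your process is actually running and how often it collects, something along these lines should work. It is only a sketch: GcProbe and Gen0CollectionsDuring are names I made up, and the allocation count is arbitrary. GCSettings.IsServerGC, GCSettings.LatencyMode and GC.CollectionCount are the real framework members.

```csharp
using System;
using System.Runtime;

public static class GcProbe
{
    // Allocates `count` short-lived objects and returns how many
    // generation 0 collections ran while doing so.
    public static int Gen0CollectionsDuring(int count)
    {
        int before = GC.CollectionCount(0);
        object last = null;
        for (int i = 0; i < count; i++)
        {
            last = new object();
        }
        GC.KeepAlive(last);
        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        Console.WriteLine("Server GC: " + GCSettings.IsServerGC);
        Console.WriteLine("Latency mode: " + GCSettings.LatencyMode);
        Console.WriteLine("Gen 0 collections: " + Gen0CollectionsDuring(10000000));
    }
}
```

Every one of those collections is a point where the runtime needs the other threads to cooperate, so the count gives a feel for how often that cost is paid.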


What you are seeing here is the GC in action. When you attach a debugger to your process you will see that many exceptions of the form

Unknown exception - code e0434f4e (first chance)

are thrown. These are exceptions the GC raises to resume a suspended thread. As you know, calling SuspendThread/ResumeThread inside your own process is strongly discouraged, and that is even more true in the managed world. The only authority that can do this safely is the GC. When you set a breakpoint at SuspendThread you will see

0118f010 5f3674da 00000000 00000000 83e36f53 KERNEL32!SuspendThread
0118f064 5f28c51d 00000000 83e36e63 00000000 mscorwks!Thread::SysSuspendForGC+0x2b0 (FPO: [Non-Fpo])
0118f154 5f28a83d 00000001 00000000 00000000 mscorwks!WKS::GCHeap::SuspendEE+0x194 (FPO: [Non-Fpo])
0118f17c 5f28c78c 00000000 00000000 0000000c mscorwks!WKS::GCHeap::GarbageCollectGeneration+0x136 (FPO: [Non-Fpo])
0118f208 5f28a0d3 002a43b0 0000000c 00000000 mscorwks!WKS::gc_heap::try_allocate_more_space+0x15a (FPO: [Non-Fpo])
0118f21c 5f28a16e 002a43b0 0000000c 00000000 mscorwks!WKS::gc_heap::allocate_more_space+0x11 (FPO: [Non-Fpo])
0118f23c 5f202341 002a43b0 0000000c 00000000 mscorwks!WKS::GCHeap::Alloc+0x3b (FPO: [Non-Fpo])
0118f258 5f209721 0000000c 00000000 00000000 mscorwks!Alloc+0x60 (FPO: [Non-Fpo])
0118f298 5f2097e6 5e2d078c 83e36c0b 00000000 mscorwks!FastAllocateObject+0x38 (FPO: [Non-Fpo])

that the GC tries to suspend all of your threads before it can do a full collection. On my machine (32 bit, Windows 7, .NET 3.5 SP1) the slowdown is not so dramatic. I do see a linear dependency between the thread count and the CPU (non) usage. It seems you are seeing increased costs for each GC because the GC has to suspend more threads before it can do a full collection. Interestingly, the time is spent mainly in user mode, so the kernel is not the limiting factor.
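
A rough way to check the dependency on thread count yourself is sketched below. It is not the original poster's program: the AllocationBenchmark class, the Measure method, the thread counts and the allocation count are all my own assumptions, and the extra threads here simply wait on an event. It times the same allocation loop while a varying number of other threads exist in the process.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class AllocationBenchmark
{
    // Starts `idleThreads` threads that wait on an event, then times
    // `allocations` small allocations on the calling thread.
    public static long Measure(int idleThreads, int allocations)
    {
        using (var stop = new ManualResetEvent(false))
        {
            var threads = new Thread[idleThreads];
            for (int i = 0; i < idleThreads; i++)
            {
                threads[i] = new Thread(() => stop.WaitOne());
                threads[i].IsBackground = true;
                threads[i].Start();
            }

            var watch = Stopwatch.StartNew();
            object last = null;
            for (int i = 0; i < allocations; i++)
            {
                last = new object();
            }
            watch.Stop();
            GC.KeepAlive(last);

            // Release the waiting threads and wait for them before disposing the event.
            stop.Set();
            foreach (var t in threads)
            {
                t.Join();
            }
            return watch.ElapsedMilliseconds;
        }
    }

    public static void Main()
    {
        foreach (int n in new[] { 0, 10, 100, 500 })
        {
            Console.WriteLine(n + " extra threads: " + Measure(n, 20000000) + " ms");
        }
    }
}
```

The absolute numbers will depend on the machine, the runtime version and the GC mode, so only the trend across thread counts is worth comparing.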

I do not see a way to get around that except using fewer threads or using unmanaged code. It could be that if you host the CLR yourself and use fibers instead of physical threads, the GC will scale much better. Unfortunately this feature was cut during the release cycle of .NET 2.0. Since it is now 6 years later, there is little hope that it will ever be added again.
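
As an illustration of the "fewer threads" option, here is a minimal sketch that splits the same amount of object creation over a small, fixed number of workers (one per core is a common starting point) instead of one thread per task. BoundedWorkers and CreateObjects are hypothetical names, and the loop body is just a stand-in for whatever your threads really do.

```csharp
using System;
using System.Threading;

public static class BoundedWorkers
{
    // Creates `total` objects using exactly `workers` threads and
    // returns how many objects were created overall.
    public static long CreateObjects(int total, int workers)
    {
        long created = 0;
        var threads = new Thread[workers];
        for (int w = 0; w < workers; w++)
        {
            // Spread the remainder over the first few workers.
            int share = total / workers + (w < total % workers ? 1 : 0);
            threads[w] = new Thread(() =>
            {
                object last = null;
                for (int i = 0; i < share; i++)
                {
                    last = new object();
                }
                GC.KeepAlive(last);
                Interlocked.Add(ref created, share);
            });
            threads[w].Start();
        }
        foreach (var t in threads)
        {
            t.Join();
        }
        return created;
    }

    public static void Main()
    {
        int workers = Environment.ProcessorCount;
        Console.WriteLine(CreateObjects(10000000, workers) + " objects on " + workers + " threads");
    }
}
```

With this layout the GC only ever has a handful of threads to suspend, no matter how much work is queued.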

Besides your thread count, the GC is also limited by the complexity of your object graph. Have a look at "Do You Know The Costs Of Garbage?".