Can multithreading speed up memory allocation? Can multithreading speed up memory allocation? multithreading multithreading

Can multithreading speed up memory allocation?


Standard CRT

While with older of Visual Studio the default CRT allocator was blocking, this is no longer true at least for Visual Studio 2010 and newer, which calls corresponding OS functions directly. The Windows heap manager was blocking until Widows XP, in XP the optional Low Fragmentation Heap is not blocking, while the default one is, and newer OSes (Vista/Win7) use LFH by default. The performance of recent (Windows 7) allocators is very good, comparable to scalable replacements listed below (you still might prefer them if targeting older platforms or when you need some other features they provide). There exist several multiple "scalable allocators", with different licenses and different drawbacks. I think on Linux the default runtime library already uses a scalable allocator (some variant of PTMalloc).

Scalable replacements

I know about:

You might want to check Scalable memory allocator experiences for my experiences with trying to use some of them in a Windows project.

In practice most of them work by having a per thread cache and per thread pre-allocated regions for allocations, which means that small allocations most often happen inside of a context of thread only, OS services are called only infrequently.


Dynamic allocation of memory uses the heap of the application/module/process (but not thread). The heap can only handle one allocation request at a time. If you try to allocate memory in "parallel" threads, they will be handled in due order by the heap. You will not get a behaviour like: one thread is waiting to get its memory while another can ask for some, while a third one is getting some. The threads will have to line-up in queue to get their chunk of memory.

What you would need is a pool of heaps. Use whichever heap is not busy at the moment to allocate the memory. But then, you have to watch out throughout the life of this variable such that it does not get de-allocated on another heap (that would cause a crash).

I know that Win32 API has functions such as GetProcessHeap(), CreateHeap(), HeapAlloc() and HeapFree(), that allow you to create a new heap and allocate/deallocate memory from a specific heap HANDLE. I don't know of an equivalence in other operating systems (I have looked for them, but to no avail).

You should, of course, try to avoid doing frequent dynamic allocations. But if you can't, you might consider (for portability) to create your own "heap" class (doesn't have to be a heap per se, just a very efficient allocator) that can manage a large chunk of memory and surely a smart pointer class that would hold a reference to the heap from which it came. This would enable you to use multiple heaps (make sure they are thread-safe).


There are 2 scalable drop-in replacements for malloc that I know of:

I don't have any experience with Hoard (which performed poorly in the study), but Emery Berger lurks on this site and was astonished by the results. He said he would have a look and I surmise there might have been some specifics to either the test or implementation that "trapped" Hoard as the general feedback is usually good.

One word of caution with jemalloc, it can waste a bit of space when you rapidly create then discard threads (as it creates a new pool for each thread you allocate from). If your threads are stable, there should not be any issue with this.