How does Intel TBB's scalable_allocator work?


There is a good paper on the allocator: "The Foundations for Scalable Multi-core Software in Intel Threading Building Blocks".

My limited experience: I replaced the global new/delete with versions backed by tbb::scalable_allocator in my AI application, but the time profile barely changed. I did not compare memory usage, though.
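For readers who want to try the same experiment, here is a minimal sketch of replacing the global new/delete. It is not the code from my application. The macro `USE_TBB_MALLOC`, the helpers `raw_alloc`/`raw_free`, and the counter `g_new_calls` are names made up for this example; `scalable_malloc` and `scalable_free` are the C entry points declared in `tbb/scalable_allocator.h`. Without the macro the sketch falls back to `std::malloc`, so it builds even where TBB is not installed.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

#ifdef USE_TBB_MALLOC                  // hypothetical opt-in macro for this sketch
#include <tbb/scalable_allocator.h>    // scalable_malloc / scalable_free (link with -ltbbmalloc)
#endif

// Hypothetical counter, only here to confirm that the replacement is active.
std::atomic<std::size_t> g_new_calls{0};

namespace {
// Hypothetical helpers that pick the backend at compile time.
void* raw_alloc(std::size_t n) {
#ifdef USE_TBB_MALLOC
    return scalable_malloc(n);
#else
    return std::malloc(n);
#endif
}

void raw_free(void* p) noexcept {
#ifdef USE_TBB_MALLOC
    scalable_free(p);
#else
    std::free(p);
#endif
}
}  // namespace

// Replacement global allocation functions. The over-aligned (C++17) overloads
// are left alone, so they keep using the standard library's defaults.
void* operator new(std::size_t n) {
    if (n == 0) n = 1;  // operator new must return a unique non-null pointer
    g_new_calls.fetch_add(1, std::memory_order_relaxed);
    if (void* p = raw_alloc(n)) return p;
    throw std::bad_alloc();
}

void* operator new[](std::size_t n) { return operator new(n); }

void operator delete(void* p) noexcept { raw_free(p); }
void operator delete[](void* p) noexcept { raw_free(p); }

// Sized forms (used by C++14 and later) forward to the same backend.
void operator delete(void* p, std::size_t) noexcept { raw_free(p); }
void operator delete[](void* p, std::size_t) noexcept { raw_free(p); }
```

TBB also ships a proxy library (tbbmalloc_proxy) that is meant to redirect malloc/new without source changes, which may be simpler than hand-written overloads. Either way, a gain is only likely if the program is actually bottlenecked on allocation from several threads, which could explain why my profile did not move.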


The allocator you mentioned is developed by Intel and tuned with Intel CPUs in mind, although it is not limited to them. It reportedly takes advantage of specific CPU mechanisms to improve performance.

Some time ago I found another very useful solution: Fast C++11 allocator for STL containers. It speeds up STL containers considerably, by roughly 5x on VS2017 and roughly 7x on GCC. It uses a memory pool for element allocation, which makes it extremely effective on all platforms.
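To illustrate the memory-pool idea, here is a minimal sketch of a pool-backed C++11 allocator. It is not the code of the project mentioned above; `PoolAllocator`, `ChunkSize`, and the internal `Slot`/`Pool` types are names invented for this example. Single-element requests are served from a free list carved out of large chunks, and anything else falls back to the global operator new. The sketch is not thread-safe, and the pool is only released at program exit.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical pool allocator: one shared pool per element type.
template <typename T, std::size_t ChunkSize = 1024>
class PoolAllocator {
    // A slot either links to the next free slot or holds storage for one T.
    union Slot {
        Slot* next;
        alignas(T) unsigned char storage[sizeof(T)];
    };

    struct Pool {
        Slot* free_list = nullptr;
        std::vector<std::unique_ptr<Slot[]>> chunks;  // owns the raw chunks

        void give(Slot* s) noexcept { s->next = free_list; free_list = s; }

        Slot* take() {
            if (!free_list) grow();
            Slot* s = free_list;
            free_list = s->next;
            return s;
        }

        void grow() {
            // Register the chunk first, then thread its slots onto the free list.
            chunks.push_back(std::unique_ptr<Slot[]>(new Slot[ChunkSize]));
            Slot* base = chunks.back().get();
            for (std::size_t i = 0; i < ChunkSize; ++i) give(&base[i]);
        }
    };

    static Pool& pool() { static Pool p; return p; }

public:
    using value_type = T;

    // Explicit rebind is needed because ChunkSize is a non-type parameter.
    template <typename U>
    struct rebind { using other = PoolAllocator<U, ChunkSize>; };

    PoolAllocator() noexcept = default;
    template <typename U>
    PoolAllocator(const PoolAllocator<U, ChunkSize>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n != 1) return static_cast<T*>(::operator new(n * sizeof(T)));
        return reinterpret_cast<T*>(pool().take());
    }

    void deallocate(T* p, std::size_t n) noexcept {
        assert(p != nullptr);
        if (n != 1) { ::operator delete(p); return; }
        pool().give(reinterpret_cast<Slot*>(p));
    }
};

// All instances share one pool per type, so they always compare equal.
template <typename T, typename U, std::size_t N>
bool operator==(const PoolAllocator<T, N>&, const PoolAllocator<U, N>&) noexcept { return true; }
template <typename T, typename U, std::size_t N>
bool operator!=(const PoolAllocator<T, N>&, const PoolAllocator<U, N>&) noexcept { return false; }
```

It would be used as `std::list<int, PoolAllocator<int>> xs;`. Node-based containers (list, map, set) allocate one node at a time, so they are where a pool like this tends to help; a vector requests contiguous blocks and goes through the fallback path here.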