Using Boost.Lockfree queue is slower than using mutexes Using Boost.Lockfree queue is slower than using mutexes multithreading multithreading

Using Boost.Lockfree queue is slower than using mutexes


Lock free algorithms generally perform more poorly than lock-based algorithms. That's a key reason they're not used nearly as frequently.

The problem with lock free algorithms is that they maximize contention by allowing contending threads to continue to contend. Locks avoid contention by de-scheduling contending threads. Lock free algorithms, to a first approximation, should only be used when it's not possible to de-schedule contending threads. That only rarely applies to application-level code.

Let me give you a very extreme hypothetical. Imagine four threads are running on a typical, modern dual-core CPU. Threads A1 and A2 are manipulating collection A. Threads B1 and B2 are manipulating collection B.

First, let's imagine the collection uses locks. That will mean that if threads A1 and A2 (or B1 and B2) try to run at the same time, one of them will get blocked by the lock. So, very quickly, one A thread and one B thread will be running. These threads will run very quickly and will not contend. Any time threads try to contend, the conflicting thread will get de-scheduled. Yay.

Now, imagine the collection uses no locks. Now, threads A1 and A2 can run at the same time. This will cause constant contention. Cache lines for the collection will ping-pong between the two cores. Inter-core buses may get saturated. Performance will be awful.

Again, this is highly exaggerated. But you get the idea. You want to avoid contention, not suffer through as much of it as possible.

However, now run this thought experiment again where A1 and A2 are the only threads on the entire system. Now, the lock free collection is probably better (though you may find that it's better just to have one thread in that case!).

Almost every programmer goes through a phase where they think that locks are bad and avoiding locks makes code go faster. Eventually, they realize that it's contention that makes things slow and locks, used correctly, minimize contention.


I cannot say that the boost lockfree queue is slower in all possible cases. In my experience, the push(const T& item) is trying to make a copy. If you are constructing tmp objects and pushing on the queue, then you are hit by a performance drag. I think the library just need the overloaded version push(T&& item) to make movable object more efficient. Before the addition of the new function, you might have to use pointers, the plain type, or the smart ones offered after C++11. This is a rather limited aspect of the queue, and I only use the lockfree queue vary rarely.