
C++ Producer consumer queue with (very) fast and reliable handover


One way to amortise the overhead of locking and thread wakeup is to add a second queue and implement a double-buffering approach. This enables batch processing on the consumer side:

    template<typename F>
    std::size_t consume_all(F&& f)
    {
        // minimize the scope of the lock
        {
            std::lock_guard<std::mutex> lock(the_mutex);
            std::swap(the_queue, the_queue2);
        }
        // process all items from the_queue2 in batch
        for (auto& item : the_queue2)
        {
            f(item);
        }
        auto result = the_queue2.size();
        the_queue2.clear(); // clears the queue and preserves the memory. perfect!
        return result;
    }

Working sample code.
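In case the linked sample is unavailable, here is a self-contained sketch of the same double-buffering idea; the class and member names (`DoubleBufferQueue`, `produce`) are illustrative, not taken from the original sample:

```cpp
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Double-buffered producer/consumer queue: producers append to the_queue
// under the lock; the consumer swaps the two buffers and then processes
// the whole batch without holding the lock.
template <typename T>
class DoubleBufferQueue {
    std::mutex the_mutex;
    std::vector<T> the_queue;   // filled by producers
    std::vector<T> the_queue2;  // drained by the consumer
public:
    void produce(T item) {
        std::lock_guard<std::mutex> lock(the_mutex);
        the_queue.push_back(std::move(item));
    }

    template <typename F>
    std::size_t consume_all(F&& f) {
        {   // minimize the scope of the lock
            std::lock_guard<std::mutex> lock(the_mutex);
            std::swap(the_queue, the_queue2);
        }
        for (auto& item : the_queue2)
            f(item);
        auto result = the_queue2.size();
        the_queue2.clear(); // clears the batch but keeps the capacity
        return result;
    }
};
```

A consumer thread would simply call `consume_all` in a loop; each call drains everything produced since the previous call while holding the mutex only for the duration of a `swap`.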

This does not fix the latency issue, but it can improve throughput. If a hiccup occurs then the consumer will be presented with a large batch which can then be processed at full speed without any locking overhead. This allows the consumer to quickly catch up with the producer.


There are lots of things that might cause problems.

You probably need to try to profile the app and see where slowdowns might be occurring.

Some notes:

  • Are the consumer and producer in the same process? If so, a Windows CRITICAL_SECTION is much faster than a Mutex (a kernel mutex can be shared across processes; a critical section cannot).
  • Try to ensure all the queue memory is resident in physical memory. If pages have to be swapped in, that will slow things right down.
  • Be very careful setting your process to real-time priority; that is supposed to be for system processes. If the process does too much work, it can prevent a critical system process from getting CPU time, which can end very badly. Unless you absolutely need real time, just use HIGH_PRIORITY_CLASS.


The short answer is yes: from there it really is down to operating system management and thread scheduling. Real-time systems (RTSs) can bring those 50 microseconds down to about 15, and more importantly they can get rid of the outliers. Otherwise, spinning is the only answer. If there are more queues than cores, one idea is to have a fixed number of threads spinning so they can react immediately, with the remainder blocking. That would involve some kind of "master" queue thread constantly spinning to check all queues, which either processes items itself or hands them over to worker threads, some of which could also spin to save those 50 microseconds. It gets complicated, though.
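One common compromise between pure spinning and pure blocking is to spin for a bounded number of iterations before falling back to a condition-variable wait. The sketch below illustrates this; the class name, the atomic size counter, and the spin budget of 10000 iterations are all illustrative choices, not from the original answer:

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// Hybrid wait: the consumer spins briefly for low latency, then falls
// back to a condition-variable wait so an idle consumer does not burn
// a core forever.
template <typename T>
class SpinThenBlockQueue {
    std::mutex m;
    std::condition_variable cv;
    std::queue<T> q;
    std::atomic<int> size{0};
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(m);
            q.push(std::move(item));
        }
        size.fetch_add(1, std::memory_order_release);
        cv.notify_one();
    }

    T pop() {
        // Spin phase: poll the size counter without taking the lock.
        for (int i = 0; i < 10000; ++i) {
            if (size.load(std::memory_order_acquire) > 0)
                break;
        }
        // Blocking phase: the predicate makes this correct even if the
        // spin phase saw nothing.
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return !q.empty(); });
        T item = std::move(q.front());
        q.pop();
        size.fetch_sub(1, std::memory_order_relaxed);
        return item;
    }
};
```

Tuning the spin budget is workload-dependent: long enough to cover the typical producer inter-arrival gap, short enough not to waste a core when the queue goes quiet.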

Probably the best approach would be to use a single lock-free multiple-producer-single-consumer (MPSC) queue with a spinning consumer thread. All items that go into the queue would then probably need to derive from a common base type, and would need to carry some meta-information describing what to do with them.
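A minimal sketch of such a queue, based on Dmitry Vyukov's intrusive MPSC algorithm, where every item derives from a common node base that can carry the dispatch meta-info. All names here are illustrative, and this is a sketch rather than production code:

```cpp
#include <atomic>

// Intrusive node: every queued item derives from this common base and
// can carry whatever meta-info the consumer needs to dispatch it.
struct Node {
    std::atomic<Node*> next{nullptr};
    virtual ~Node() = default;
};

// Vyukov-style lock-free multiple-producer-single-consumer queue.
// push() may be called from any thread; pop() only from the single
// consumer thread.
class MpscQueue {
    std::atomic<Node*> head; // producers swing this with an exchange
    Node* tail;              // touched only by the consumer
    Node stub;               // dummy node so the queue is never empty
public:
    MpscQueue() : head(&stub), tail(&stub) {}

    void push(Node* n) {
        n->next.store(nullptr, std::memory_order_relaxed);
        Node* prev = head.exchange(n, std::memory_order_acq_rel);
        prev->next.store(n, std::memory_order_release);
    }

    // Returns nullptr when the queue is empty (or a push is mid-flight,
    // in which case the spinning consumer simply retries).
    Node* pop() {
        Node* t = tail;
        Node* next = t->next.load(std::memory_order_acquire);
        if (t == &stub) {            // skip over the dummy node
            if (!next) return nullptr;
            tail = next;
            t = next;
            next = t->next.load(std::memory_order_acquire);
        }
        if (next) { tail = next; return t; }
        if (t != head.load(std::memory_order_acquire))
            return nullptr;          // producer between its two stores
        push(&stub);                 // re-insert the stub to unlink t
        next = t->next.load(std::memory_order_acquire);
        if (next) { tail = next; return t; }
        return nullptr;
    }
};
```

The spinning consumer would loop on `pop()`, and either a virtual call on the base type or a type tag in the meta-info decides what to do with each item.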

Complicated, but possible. If I ever set it up, I might post some code as well.