Multithreaded job queue manager Multithreaded job queue manager multithreading multithreading

Multithreaded job queue manager


We had to build our own job queue system to meet requirements similar to yours ( UI thread must always respond within 33ms, jobs can run from 15-15000ms ), because there really was nothing out there that quite met our needs, let alone was performant.

Unfortunately our code is about as proprietary as proprietary gets, but I can give you some of the most salient features:

  • We start up one thread per core at the beginning of the program. Each pulls work from a global job queue. Jobs consist of a function object and a glob of associated data (really an elaboration on a func_ptr and void *). Thread 0, the fast client loop, isn't allowed to work on jobs, but the rest grab as they can.
  • The job queue itself ought to be a lockless data structure, such as a lock-free singly linked list (Visual Studio comes with one). Avoid using a mutex; contention for the queue is surprisingly high, and grabbing mutexes is costly.
  • Pack up all the necessary data for the job into the job object itself -- avoid having pointer from the job back into the main heap, where you'll have to deal with contention between jobs and locks and all that other slow, annoying stuff. For example, all the simulation parameters should go into the job's local data blob. The results structure obviously needs to be something that outlives the job: you can deal with this either by a) hanging onto the job objects even after they've finished running (so you can use their contents from the main thread), or b) allocating a results structure specially for each job and stuffing a pointer into the job's data object. Even though the results themselves won't live in the job, this effectively gives the job exclusive access to its output memory so you needn't muss with locks.

  • Actually I'm simplifying a bit above, since we need to choreograph exactly which jobs run on which cores, so each core gets its own job queue, but that's probably unnecessary for you.


I rolled my own, based on Boost.threads. I was quite surprised by how much bang I got from writing so little code. If you don't find something pre-made, don't be afraid to roll your own. Between Boost.threads and your experience since writing your own, it might be easier than you remember.

For premade options, don't forget that Chromium is licensed very friendly, so you may be able to roll your own generic library around its code.


Microsoft is working on a set of technologies for the next Version of Visual Studio 2010 called the Concurrency Runtime, the Parallel Pattern Library and the Asynchronous Agents Library which will probably help. The Concurrency Runtime will offer policy based scheduling, i.e. allowing you to manage and compose multiple scheduler instances (similar to thread pools but with affinitization and load balancing between instances), the Parallel Pattern Library will offer task based programming and parallel loops with an STL like programming model. The Agents library offers an actor based programming model and has support for building concurrent data flow pipelines, i.e. managing those dependencies described above. Unfortunately this isn't released yet, so you can read about it on our team blog or watch some of the videos on channel9 there is also a very large CTP that is available for download as well.

If you're looking for a solution today, Intel's Thread Building Blocks and boost's threading library are both good libraries and available now. JustSoftwareSolutions has released an implementation of std::thread which matches the C++0x draft and of course OpenMP is widely available if you're looking at fine-grained loop based parallelism.

The real challenge as other folks have alluded to is to correctly identify and decompose work into tasks suitable for concurrent execution (i.e. no unprotected shared state), understand the dependencies between them and minimize the contention that can occur on bottlenecks (whether the bottleneck is protecting shared state or ensuring the dispatch loop of a work queue is low contention or lock-free)... and to do this without scheduling implementation details leaking into the rest of your code.

-Rick