Testing approach for multi-threaded software



Some suggestions:

  • Utilize the law of large numbers and perform the operation under test not only once, but many times.
  • Stress-test your code by exaggerating the scenarios. E.g. to test your mutex-holding class, use scenarios where the mutex-protected code:
    • is very short and fast (a single instruction)
    • is time-consuming (Sleep with a large value)
    • contains explicit context switches (Sleep(0))
  • Run your tests on a variety of architectures. (Even if your software is Windows-only, test it on single- and multi-core processors, with and without hyperthreading, and across a wide range of clock speeds.)
  • Try to design your code such that most of it is not exposed to multithreading issues. E.g. instead of accessing shared data (which requires locking or very carefully designed lock-avoidance techniques), let your worker threads operate on copies of the data and communicate with them using queues. Then you only have to test your queue class for thread-safety.
  • Run your tests when the system is idle as well as when it is under load from other tasks (e.g. our build server frequently runs multiple builds in parallel. This alone revealed many multithreading bugs that happened when the system was under load.)
  • Avoid asserting on timeouts. If such an assert fails, you don't know whether the code is broken or whether the timeout was too short. Instead, use a very generous timeout (just to ensure that the test eventually fails). If you want to test that an operation doesn't take longer than a certain time, measure the duration, but don't use a timeout for this.


Whilst I agree with @rstevens' answer that there's currently no way to unit test threading issues with 100% certainty, there are some things that I've found useful.

Firstly, whatever tests you have, make sure you run them on lots of boxes with different specs. I have several build machines, all different: multi-core, single-core, fast, slow, etc. The good thing about how diverse they are is that different ones will throw up different threading issues. I've regularly been surprised to add a new build machine to my farm and suddenly have a new threading bug exposed; and I'm talking about a new bug being exposed in code that has run tens of thousands of times on the other build machines and which shows up 1 time in 10 on the new one...

Secondly most of the unit testing that you do on your code needn't involve threading at all. The threading is, generally, orthogonal. So step one is to tease the code apart so that you can test the actual code that does the work without worrying too much about the threaded nature. This usually means creating an interface that the threading code uses to drive the real code. You can then test the real code in isolation.

Thirdly, you can test where the threaded code interacts with the main body of code. This means writing a mock for the interface that you developed to separate the two blocks of code. By now the threading code is likely much simpler, and you can then often place synchronisation objects in the mock that you've made so that you can control the code under test. So, you'd spin up your thread and wait for it to set an event by calling into your mock, and then have it block on another event which your test code controls. The test code can then step the threaded code from one point in your interface to the next.

Finally (if you've decoupled things enough that you can do the earlier stuff then this is easy) you can then run larger pieces of the multi-threaded parts of the app under test and make sure you get the results that you expect; you can play with the priority of the threads and maybe even add a couple of test threads that simply eat CPU to stir things up a bit.

Now you run all of these tests many many times on different hardware...

I've also found that running the tests (or the app) under something like DevPartner BoundsChecker can help a lot, as it messes with the thread scheduling in a way that sometimes shakes out hard-to-find bugs. I also wrote a deadlock detection tool which checks for lock inversions during program execution, but I only use that rarely.

You can see an example of how I test multi-threaded C++ code here: http://www.lenholgate.com/blog/2004/05/practical-testing.html


Not really an answer:

Testing multithreaded bugs is very difficult. Most bugs only show up if two (or more) threads reach specific places in the code in a specific order. Whether and when this condition is met depends on the timing of the running process. This timing may change due to any of the following preconditions:

  • Type of processor
  • Processor speed
  • Number of processors/cores
  • Optimization level
  • Running inside or outside the debugger
  • Operating system

There are surely more preconditions that I have forgotten.

Because MT bugs depend so heavily on the exact timing of the running code, Heisenberg's "uncertainty principle" comes into play here: if you test for MT bugs, your "measurements" change the timing, which may prevent the bug from occurring...

The timing is what makes MT bugs so highly non-deterministic. In other words: you may have software that runs for months, then crashes one day, and after that runs for years. If you don't have debug logs, core dumps, etc., you may never know why it crashed.

So my conclusion is: There is no really good way to Unit-Test for thread-safety. You always have to keep your eyes open when programming.

To make this clear, I will give you a (simplified) example from real life (I encountered this when changing employers and looking at the existing code there):

Imagine you have a class. You want instances of that class to be deleted automatically when no one uses them anymore, so you build a reference counter into the class. (I know it is bad style to delete an instance of a class in one of its own methods; that is a result of simplifying the real code, which uses a Ref class to handle counted references.)

class A {
private:
    int refcount;

public:
    A() : refcount(0) {
    }

    void Ref() {
        refcount++;
    }

    void Release() {
        refcount--;
        if (refcount == 0) {
            delete this;
        }
    }
};

This seems pretty simple and nothing to worry about. But it is not thread-safe! That's because "refcount++" and "refcount--" are not atomic operations; each consists of three operations:

  • read refcount from memory to register
  • increment/decrement register
  • write refcount from register to memory

Each of these operations can be interrupted, and another thread may manipulate the same refcount at the same time. So if, for example, two threads want to increment refcount, the following COULD happen:

  • Thread A: read refcount from memory to register (refcount: 8)
  • Thread A: increment register
    • CONTEXT CHANGE -
  • Thread B: read refcount from memory to register (refcount: 8)
  • Thread B: increment register
  • Thread B: write refcount from register to memory (refcount: 9)
    • CONTEXT CHANGE -
  • Thread A: write refcount from register to memory (refcount: 9)

So the result is: refcount = 9 but it should have been 10!

This can only be solved by using atomic operations (e.g. InterlockedIncrement() and InterlockedDecrement() on Windows).

This bug is simply untestable! The reason is that it is so unlikely that two threads try to modify the refcount of the same instance at exactly the same time, with a context switch landing between precisely those instructions.

But it can happen! (The probability increases on a multi-processor or multi-core system, because no context switch is needed to make it happen.) It will happen, in some days, weeks or months!