Why isn't this random number generator thread-safe? Why isn't this random number generator thread-safe? multithreading multithreading

Why isn't this random number generator thread-safe?


There are three observations about your test output I would make:

  • It has much stronger variance than a good random source's average should provide. You observed this yourself by comparing to the single thread results.

  • The calculated average decreases with thread count and never reaches the original 0.5 (i.e. it's not just higher variance but also changed mean).

  • There is a temporal relation in the data, particularly visible in the 4 thread case.

All this is explained by the race condition present in your code: You assign to SUM from multiple threads. Incrementing a double is not an atomic operation (even on x86, where you'll probably get atomic reads and writes on registers). Two threads may read the current value (e.g. 10), increment it (e.g. both add 0.5) and then write the value back to memory. Now you have two threads writing a 10.5 instead of the correct 11.

The more threads try to write to SUM concurrently (without synchronization), the more of their changes are lost. This explains all observations:

  • How hard the threads race each other in each individual run determines how many results are lost, and this can vary from run to run.

  • The average is lower with more races (for example more threads) because more results are lost. You can never exceeed the statistical 0.5 average because you only ever lose writes.

  • As the threads and scheduler "settle in", the variance is reduced. This is a similar reason to why you should "warm up" your tests when benchmarking.

Needless to say, this is undefined behavior. It just shows benign behavior on x86 CPUs, but this is not something the C++ standard guarantees you. For all you know, the individual bytes of a double might be written to by different threads at the same time resulting in complete garbage.

The proper solution would be adding the doubles thread-locally and then (with synchronization) adding them together in the end. OMP has reduction clauses for this specific purpose.

For integral types, you could use std::atomic<IntegralType>::fetch_add(). std::atomic<double> exists but (before C++20) the mentioned function (and others) are only available for integral types.


The problem is not in your RNG, but in your summation. There is simply a race condition on SUM. To fix this, use a reduction, e.g. change the pragma to:

#pragma omp parallel for ordered schedule(static) reduction(+:SUM)

Note that using thread_local with OpenMP is technically not defined behavior. It will probably work in practice, but the interaction between OpenMP and C++11 threading concepts is not well defined (see also this question). So the safe OpenMP alternative for you would be:

static mt19937* generator = nullptr;#pragma omp threadprivate(generator)