OpenMP num_threads(1) executes faster than no OpenMP


OpenMP has significant synchronization overhead. In my experience, unless a loop is really big, does a lot of work, and needs no synchronization inside the loop, it is generally not worthwhile to use OpenMP on it.
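One way to act on this is OpenMP's if clause, which makes the runtime execute the region with a single thread when a condition is false. The sketch below is only an illustration: the function name scale_array and the cutoff PARALLEL_THRESHOLD are hypothetical, and the right cutoff has to be measured for a given machine and loop body.

```c
#include <stddef.h>

/* Hypothetical cutoff, not a recommended value: below this many
   iterations the loop is assumed to be too small to pay for a team. */
#define PARALLEL_THRESHOLD 100000L

/* Multiply every element of a[0..n-1] by factor.
   The if() clause requests a one-thread region when n is small.
   Without OpenMP enabled, the pragma is ignored and the loop runs serially. */
void scale_array(double *a, long n, double factor)
{
    #pragma omp parallel for if(n >= PARALLEL_THRESHOLD)
    for (long i = 0; i < n; i++) {
        a[i] *= factor;
    }
}
```

Calling scale_array(a, 4, 2.0) on a four-element array stays on one thread under this assumed threshold; only calls with n of 100000 or more would use the full team.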

I think that when you set the number of threads to one, the OpenMP runtime simply makes a procedure call to the routine implementing the loop, so the overhead is minimal and performance is essentially identical to the non-OpenMP case.
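To check this on your own machine, it helps to have the two variants side by side. This is a minimal sketch, assuming a simple summation loop; the function names sum_serial and sum_one_thread are made up for illustration and do not come from the original question.

```c
#include <stddef.h>

/* Baseline: plain serial loop with no OpenMP constructs. */
double sum_serial(const double *a, long n)
{
    double s = 0.0;
    for (long i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

/* Same loop inside a parallel region restricted to a single thread.
   The reduction clause gives the thread a private copy of s and
   combines it with the original once the loop is done. */
double sum_one_thread(const double *a, long n)
{
    double s = 0.0;
    #pragma omp parallel for num_threads(1) reduction(+:s)
    for (long i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}
```

When compiled with OpenMP enabled, each call can be timed by reading omp_get_wtime() from omp.h before and after it. Repeat the measurement several times, since a single run is noisy.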

Otherwise, I think OpenMP sets some semaphores to wake the waiting "worker" threads. They synchronize their access to the data structures that tell them which loop parameters to use, and then call the routine that does the work. When a thread completes its chunk of work, it signals the master thread again. This synchronization must happen for each chunk of work that a thread does, and its cost is non-trivial.

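Using the STATIC scheduling option can help reduce the scheduling and synchronization overhead, because iterations are divided among the threads once, up front, rather than handed out chunk by chunk at run time. This helps particularly if the number of loop iterations is large relative to the number of cores.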
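As a rough illustration of the difference, here is the same loop written with two schedule clauses. The function names and the chunk size of 64 are arbitrary choices for this sketch, not recommendations.

```c
#include <stddef.h>

/* schedule(static) with no chunk size: the iteration space is split
   into roughly equal contiguous blocks, at most one per thread, so
   threads do not return to the runtime for more work during the loop. */
void saxpy_static(float *y, const float *x, float alpha, long n)
{
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++) {
        y[i] += alpha * x[i];
    }
}

/* schedule(dynamic, 64): each thread requests the next chunk of 64
   iterations from the runtime whenever it finishes one, which costs
   a synchronized hand-off per chunk. */
void saxpy_dynamic(float *y, const float *x, float alpha, long n)
{
    #pragma omp parallel for schedule(dynamic, 64)
    for (long i = 0; i < n; i++) {
        y[i] += alpha * x[i];
    }
}
```

Both functions compute the same result. Dynamic scheduling can still pay off when iterations take very different amounts of time, but for uniform work like this loop the static version avoids the per-chunk hand-off.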