Parallel version of loop not faster than serial version

performance multithreading parallel-processing boost-thread atomic-values

Perform the computation on that particle, storing the result in a separate array

How heavy are the computations?

Generally speaking, an operation on an atomic counter may cost hundreds of clock cycles, so it is important to check that the threads are doing more than just incrementing counters.
Also check how much work each thread does: do they cooperate well (i.e. does each one process about half of the particles on each cycle)?
Try to subdivide the work into bigger chunks than a single particle (say, 100 particles at a time).
See how much work is done outside of the threads.
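
To illustrate the chunking suggestion, here is a minimal sketch in which each worker claims a block of particles per atomic operation instead of one particle. It assumes C++11 `std::thread` and `std::atomic` rather than boost-thread, and `compute`, `process_chunked` and the `double` particle type are hypothetical placeholders, not code from the question.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real per-particle computation.
inline double compute(double particle) { return particle * 2.0; }

// Each worker claims `chunk` particles with one atomic fetch_add,
// so the cost of the atomic is spread over many particles.
void process_chunked(const std::vector<double>& particles,
                     std::vector<double>& results,
                     unsigned num_threads,
                     std::size_t chunk = 100) {
    if (chunk == 0) chunk = 1;  // avoid claiming empty ranges forever
    results.resize(particles.size());
    std::atomic<std::size_t> next{0};

    auto worker = [&]() {
        for (;;) {
            // Claim the half-open range [begin, end).
            const std::size_t begin =
                next.fetch_add(chunk, std::memory_order_relaxed);
            if (begin >= particles.size()) break;
            const std::size_t end =
                std::min(begin + chunk, particles.size());
            // Results go to disjoint slots, so no locking is needed here.
            for (std::size_t i = begin; i < end; ++i)
                results[i] = compute(particles[i]);
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
}
```

Trying a few chunk sizes (1, 100, 1000) and timing each run should show whether the atomic counter is the bottleneck: if chunk size 1 is much slower than 100, it probably is.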

Honestly, what you are describing looks like a bug.


profiling has not revealed much

This is unclear. I have experience profiling a multithreaded application on HP-UX, where the profiler reports the percentage of time spent in each function. If you have one or a few contention points in your functions, you see an increase in the time your application spends in those functions. In my case I saw a significant increase in pthread_mutex_unlock(). When I changed my code, it became much faster.

So could you post the same statistics for one thread and for two/four threads, along with the number of computations in each test?

I also recommend, if possible, setting a breakpoint on the global function that locks a mutex. You might find that somewhere in your algorithm you inadvertently lock a global mutex.


Your language is kind of revealing:

Wait on xxx

This might be your problem.

Plus, you slow down again when adding to a single result queue; if possible, add the results to a single queue only at the end of the processing. The main thread should not wait, but check the global counter after every update.
Instead of profiling, I would add performance counters that you log at the end. You can put them under conditional compilation so that they are not added to your production code.
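
A rough sketch of such counters, assuming a hypothetical `PERF_COUNTERS` macro as the compile-time switch; `ThreadStats`, `ScopedTimer` and `log_stats` are illustrative names, not part of any library. Each thread owns its own `ThreadStats`, so the counters themselves do not add contention.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>

#ifndef PERF_COUNTERS
#define PERF_COUNTERS 0  // hypothetical switch: build with -DPERF_COUNTERS=1 to enable
#endif

// One instance per thread; never shared while the threads are running.
struct ThreadStats {
    std::uint64_t items = 0;    // particles processed
    std::uint64_t work_ns = 0;  // time spent computing
    std::uint64_t wait_ns = 0;  // time spent waiting (locks, queues, ...)
};

#if PERF_COUNTERS
// Adds the elapsed time to `sink` when it goes out of scope.
class ScopedTimer {
public:
    explicit ScopedTimer(std::uint64_t& sink)
        : sink_(sink), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        sink_ += static_cast<std::uint64_t>(
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count());
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;
private:
    std::uint64_t& sink_;
    std::chrono::steady_clock::time_point start_;
};
#else
// No-op version used when the counters are compiled out.
class ScopedTimer {
public:
    explicit ScopedTimer(std::uint64_t&) {}
};
#endif

// Call once per thread after all workers have been joined.
inline void log_stats(unsigned thread_id, const ThreadStats& s) {
#if PERF_COUNTERS
    std::printf("thread %u: items=%llu work=%llu ns wait=%llu ns\n", thread_id,
                static_cast<unsigned long long>(s.items),
                static_cast<unsigned long long>(s.work_ns),
                static_cast<unsigned long long>(s.wait_ns));
#else
    (void)thread_id;
    (void)s;
#endif
}
```

In a worker, the computation would be wrapped as `{ ScopedTimer t(stats.work_ns); /* compute */ }` and each blocking call as `{ ScopedTimer t(stats.wait_ns); /* wait */ }`. If `wait_ns` grows with the number of threads while `work_ns` stays flat, the waits are the likely culprit.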
