Synchronous Parallel Process in C# / C++



If I understand you right,

  • a[i] can only be calculated when c[i-1] is available
  • b[i] can only be calculated when c[i-1] is available
  • c[i] is only available when a[i] and b[i] are calculated

This means the only work that can run concurrently is the calculation of a[i] and b[i] within a single iteration.

Here is how I would do it in C#:

    using System.Threading.Tasks;

    for (int i = 1; i < N; i++)
    {
        double prev = c[i - 1];   // read the previous result once, before starting the tasks
        Task<double> calcA = Task.Factory.StartNew(() => f1(x[i] + prev));
        Task<double> calcB = Task.Factory.StartNew(() => f2(x[i] + prev));
        // .Result blocks the current thread until each calculation has completed
        c[i] = calcA.Result + calcB.Result;
    }

This runs f1 and f2 as two separate tasks, which the thread pool can execute in parallel. Once both have finished, the loop sets c[i] and moves on to the next iteration.

Note that:

  • I use double, assuming that your f1 and f2 return double
  • The loop starts from 1, assuming that you already have initial a[0] and b[0] values (and therefore c[0]). Starting from 0 would evaluate c[-1] and throw an IndexOutOfRangeException
  • This will only bring an improvement if f1 and f2 are expensive enough to outweigh the cost of starting and waiting for two tasks on every iteration
  • Task.Factory.StartNew (unlike creating a Thread directly) by default schedules work on the ThreadPool, so it reuses existing pool threads instead of creating a new thread every time. This noticeably reduces the overhead.
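Since the question also asks about C++, here is a minimal sketch of the same per-iteration fork/join using std::async. The bodies of f1 and f2 are hypothetical placeholders (sin and cos), and the function name computeWithAsync is an assumption; x and c mirror the arrays in the C# snippet, with c[0] assumed to be set by the caller.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-ins for the question's f1 and f2; replace with the real functions.
double f1(double v) { return std::sin(v); }
double f2(double v) { return std::cos(v); }

// Computes c[i] = f1(x[i] + c[i-1]) + f2(x[i] + c[i-1]) for i >= 1,
// running f1 and f2 concurrently within each iteration.
// Assumes the caller has already set c[0].
void computeWithAsync(const std::vector<double>& x, std::vector<double>& c) {
    assert(c.size() == x.size() && !c.empty());
    for (std::size_t i = 1; i < x.size(); ++i) {
        const double input = x[i] + c[i - 1];
        // std::launch::async requests that each call runs asynchronously.
        std::future<double> a = std::async(std::launch::async, f1, input);
        std::future<double> b = std::async(std::launch::async, f2, input);
        // get() blocks until the corresponding result is ready, like .Result in C#.
        c[i] = a.get() + b.get();
    }
}
```

The standard describes std::launch::async as running the function as if on a new thread, so unlike the C# thread pool you should not count on threads being reused. A common refinement is to call f1 directly on the current thread and launch only f2 asynchronously, which halves the number of launches per iteration.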


The only parallel part of this algorithm is the calculation of f1 and f2, but you say that f1 and f2 are not time-consuming, so it might be much better to use SIMD vectorization (e.g. System.Numerics.Vectors in C#) and run everything on one core (which also reduces cache misses). Alternatively, you could modify your algorithm to be parallelizable, but that might require significant work.
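As a baseline for the single-core approach, a sketch of the plain loop in C++ follows. It makes the dependency chain explicit: every iteration needs c[i-1], so the loop itself cannot be split across threads. It does not use SIMD by itself; any vectorization would have to happen inside f1 and f2, which depends on what those functions actually compute. The bodies of f1 and f2 and the name computeSequential are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the question's f1 and f2.
double f1(double v) { return std::sin(v); }
double f2(double v) { return std::cos(v); }

// Single-threaded evaluation of the recurrence; no tasks, threads or locks.
// Assumes the caller has already set c[0].
void computeSequential(const std::vector<double>& x, std::vector<double>& c) {
    assert(c.size() == x.size() && !c.empty());
    for (std::size_t i = 1; i < x.size(); ++i) {
        const double input = x[i] + c[i - 1];  // depends on the previous iteration
        const double a = f1(input);            // a[i]
        const double b = f2(input);            // b[i]
        c[i] = a + b;                          // c[i]
    }
}
```

When f1 and f2 are cheap, this version is the one to measure the parallel variants against.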


Without going into a code solution, you want to use some kind of barrier. A barrier lets each participant wait until all participants have signaled that they are finished with the current step. In this example, thread 2 has to wait for thread 1.

https://en.wikipedia.org/wiki/Barrier_(computer_science)
Example of C++ "Memory barrier"