Synchronous Parallel Process in C# / C++
If I understand you right,
- a[i] can only be calculated when c[i-1] is available
- b[i] can only be calculated when c[i-1] is available
- c[i] is only available when a[i] and b[i] are calculated
It means that the only work you can do in parallel is calculating a[i] and b[i] within the same iteration.
That's how I see it in C#:
    for (int i = 1; i < N; i++)
    {
        // Start both calculations; each one only needs c[i-1]
        Task<double> calcA = Task.Factory.StartNew(() => f1(x[i] + c[i-1]));
        Task<double> calcB = Task.Factory.StartNew(() => f2(x[i] + c[i-1]));
        // .Result blocks until the task has finished, so both
        // calculations are complete before c[i] is assigned
        c[i] = calcA.Result + calcB.Result;
    }
This will start two tasks on thread-pool threads, which will calculate f1 and f2 respectively. After both f1 and f2 are calculated, it will set the c[i] value and run the next iteration.
Note that:
- I use double, assuming that your f1 and f2 return double
- The loop starts from 1, assuming that you have some initial a[0] and b[0] values. Otherwise, c[i-1] would throw an exception at i = 0
- This will only bring improvement if calculation of f1 and f2 is really resource-consuming and long, compared to other calculations
- Task.Factory.StartNew (unlike using Thread) uses the ThreadPool, which means that it doesn't create a new thread every time, but reuses an existing one from the pool. It noticeably reduces the overhead.
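Since the question also mentions C++, here is a rough sketch of the same per-iteration pattern with std::async. The bodies of f1 and f2, the function name compute_c and the c0 parameter (the initial c[0]) are placeholders I made up for illustration, not something from the question. Note that std::launch::async runs each call as if on a new thread; whether threads get pooled is up to the implementation, so the overhead per iteration may be higher than with the C# thread pool.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical stand-ins for the asker's f1 and f2.
double f1(double v) { return 0.5 * v; }
double f2(double v) { return 0.25 * v; }

// Computes c[i] = f1(x[i] + c[i-1]) + f2(x[i] + c[i-1]) for i >= 1.
// c0 is the assumed initial value c[0].
std::vector<double> compute_c(const std::vector<double>& x, double c0) {
    assert(!x.empty());
    std::vector<double> c(x.size());
    c[0] = c0;
    for (std::size_t i = 1; i < x.size(); ++i) {
        const double arg = x[i] + c[i - 1];
        // Launch f1 and f2 concurrently for this iteration only.
        std::future<double> calcA = std::async(std::launch::async, f1, arg);
        std::future<double> calcB = std::async(std::launch::async, f2, arg);
        // get() blocks until the result is ready, so c[i] is complete
        // before the next iteration reads it.
        c[i] = calcA.get() + calcB.get();
    }
    return c;
}
```

Calling compute_c(x, c0) returns the whole c vector. As with the C# version, this is only worth it when f1 and f2 are expensive compared to starting the two asynchronous calls.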
The only parallel part in this algorithm is the calculation of f1 and f2, but you say that f1 and f2 are not time-consuming, so it might be much better to use SIMD vectorization (e.g. System.Numerics.Vectors in C#) and run it on one core (which also reduces cache misses). Or you could probably modify your algorithm to make it parallelizable (but that might require hard work).
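For reference, the single-core version is just the plain loop below. This is only a baseline sketch, not a SIMD implementation: the bodies of f1 and f2, the name compute_c_sequential and the c0 parameter are assumptions of mine. Because c[i] depends on c[i-1], the iterations themselves cannot be vectorized across i; any SIMD gain would have to come from inside f1 and f2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the asker's f1 and f2.
double f1(double v) { return 0.5 * v; }
double f2(double v) { return 0.25 * v; }

// Single-threaded baseline: everything stays on one core.
// c0 is the assumed initial value c[0].
std::vector<double> compute_c_sequential(const std::vector<double>& x, double c0) {
    assert(!x.empty());
    std::vector<double> c(x.size());
    c[0] = c0;
    for (std::size_t i = 1; i < x.size(); ++i) {
        // Loop-carried dependency: this iteration needs c[i-1].
        const double arg = x[i] + c[i - 1];
        c[i] = f1(arg) + f2(arg);
    }
    return c;
}
```

It is worth measuring any threaded variant against this loop before keeping it, since for cheap f1 and f2 the synchronization cost can easily dominate.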
Without going into a code solution, you want to use some kind of barrier. A barrier lets you check whether all participants have declared that they are finished with the task. In this example, thread 2 will have to wait for thread 1.
https://en.wikipedia.org/wiki/Barrier_(computer_science)
Example of C++ "Memory barrier"
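To make the barrier idea a bit more concrete, here is a minimal sketch with two long-lived threads that meet at a barrier once per iteration. The Barrier class is hand-rolled from a mutex and a condition variable so it works before C++20 (C++20 ships std::barrier for this purpose). The bodies of f1 and f2, the name compute_c_with_barrier and the c0 parameter are my own assumptions.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Minimal reusable barrier for a fixed number of threads.
class Barrier {
public:
    explicit Barrier(std::size_t count)
        : threshold_(count), waiting_(0), generation_(0) {}

    // Blocks until `count` threads have called wait(), then releases all.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        const std::size_t gen = generation_;
        if (++waiting_ == threshold_) {
            waiting_ = 0;
            ++generation_;  // start a new round
            cv_.notify_all();
        } else {
            cv_.wait(lock, [&] { return gen != generation_; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::size_t threshold_, waiting_, generation_;
};

// Hypothetical stand-ins for the asker's f1 and f2.
double f1(double v) { return 0.5 * v; }
double f2(double v) { return 0.25 * v; }

// c0 is the assumed initial value c[0].
std::vector<double> compute_c_with_barrier(const std::vector<double>& x, double c0) {
    assert(!x.empty());
    const std::size_t n = x.size();
    std::vector<double> a(n), b(n), c(n);
    c[0] = c0;
    Barrier barrier(2);

    // Worker thread: computes b[i] and tracks its own copy of c[i-1].
    std::thread worker([&] {
        double prev = c0;
        for (std::size_t i = 1; i < n; ++i) {
            b[i] = f2(x[i] + prev);
            barrier.wait();  // after this, a[i] and b[i] are both ready
            prev = a[i] + b[i];
        }
    });

    // Calling thread: computes a[i] and is the only writer of c[i].
    double prev = c0;
    for (std::size_t i = 1; i < n; ++i) {
        a[i] = f1(x[i] + prev);
        barrier.wait();
        prev = a[i] + b[i];
        c[i] = prev;
    }
    worker.join();
    return c;
}
```

Compared to starting new tasks in every iteration, the threads are created once and only synchronize at the barrier. Each thread recomputes c[i-1] locally after the barrier, so one wait per iteration is enough and only one thread ever writes to c.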