
multiprocess or multithread? - parallelizing a simple computation for millions of iterations and storing the result in a single data structure


First option - a Server Process

Create a server process. It is part of the multiprocessing package and allows parallel access to shared data structures. Each process then accesses the data structure through the server (via a proxy), locking out the other processes while it does so.

From the documentation:

Server process

A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.

A manager returned by Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Queue, Value and Array.
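A minimal sketch of this approach, assuming the pairwise computation is a function f and the input data lives in a dict D (both are placeholders for your real code):

```python
from multiprocessing import Manager, Process

def f(a, b):
    # placeholder for the real pairwise computation
    return a + b

def worker(D, results, pairs):
    # D and results are manager proxies; every access goes through the server process
    for s1, s2 in pairs:
        results[(s1, s2)] = f(D[s1], D[s2])

if __name__ == "__main__":
    manager = Manager()
    D = manager.dict({"a": 1, "b": 2, "c": 3})   # shared input data
    results = manager.dict()                     # shared result structure
    pairs = [("a", "b"), ("a", "c"), ("b", "c")]

    # split the pairs between two worker processes
    procs = [Process(target=worker, args=(D, results, pairs[i::2])) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(dict(results))
```

Note that every proxy access is a round trip to the server process, so this is convenient but not the fastest option for millions of tiny updates.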

Second option - Pool of workers

Create a Pool of workers, an input Queue and a result Queue.

  • The main process, acting as a producer, will feed the input Queue with pairs (s1, s2).
  • Each worker process will read a pair from the input Queue, compute f on it, and write the result into the result Queue.
  • The main thread will read the results from the result Queue and write them into the result dictionary, as sketched below.
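Here is one way that layout could look, again treating f and the keys of D as placeholders:

```python
from multiprocessing import Process, Queue

def f(a, b):
    return a * b                               # placeholder computation

def worker(in_q, out_q, D):
    # read pairs until the None sentinel arrives, then signal completion
    for s1, s2 in iter(in_q.get, None):
        out_q.put(((s1, s2), f(D[s1], D[s2])))
    out_q.put(None)

if __name__ == "__main__":
    D = {"a": 1, "b": 2, "c": 3}
    n_workers = 2
    in_q, out_q = Queue(), Queue()

    workers = [Process(target=worker, args=(in_q, out_q, D)) for _ in range(n_workers)]
    for w in workers:
        w.start()

    # producer: feed the input Queue with (s1, s2) pairs, then one sentinel per worker
    for pair in [("a", "b"), ("a", "c"), ("b", "c")]:
        in_q.put(pair)
    for _ in range(n_workers):
        in_q.put(None)

    # consumer: collect results until every worker has signalled completion
    results, done = {}, 0
    while done < n_workers:
        item = out_q.get()
        if item is None:
            done += 1
        else:
            key, value = item
            results[key] = value

    for w in workers:
        w.join()
    print(results)
```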

Third option - divide to independent problems

Your data is independent: computing f(D[si], D[sj]) is a self-contained problem, independent of any f(D[sk], D[sl]). Furthermore, the computation time of each pair should be fairly equal, or at least of the same order of magnitude.

Divide the task into n input sets, where n is the number of computation units (cores, or even machines) you have. Give each input set to a different process, and join the outputs.
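A sketch of this splitting with multiprocessing.Pool, where f, D and the chunking scheme are placeholders you would replace with your own:

```python
from itertools import combinations
from multiprocessing import Pool, cpu_count

D = {"a": 1, "b": 2, "c": 3, "d": 4}    # placeholder data

def f(a, b):
    return a - b                         # placeholder computation

def solve_chunk(pairs):
    # each chunk is a fully independent sub-problem
    return {(s1, s2): f(D[s1], D[s2]) for s1, s2 in pairs}

if __name__ == "__main__":
    pairs = list(combinations(D, 2))
    n = cpu_count()
    chunks = [pairs[i::n] for i in range(n)]    # n roughly equal input sets

    with Pool(n) as pool:
        partial_results = pool.map(solve_chunk, chunks)

    # join the outputs into a single result dictionary
    results = {}
    for part in partial_results:
        results.update(part)
    print(results)
```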


You definitely won't get any performance increase with threading - it is an inappropriate tool for CPU-bound tasks in CPython, because the GIL lets only one thread execute Python bytecode at a time.

So the only possible choice is multiprocessing. But since you have a big data structure, I'd suggest something like mmap (fairly low level, but built into the standard library) or Redis (a tasty, high-level API, but it has to be installed and configured).
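As one concrete variant of the mmap idea, a numpy.memmap lets every worker read the same file-backed array without copying it between processes; the file name and shape below are made up for illustration:

```python
import numpy as np
from multiprocessing import Pool

SHAPE = (1000, 1000)          # assumed shape of the shared matrix
PATH = "shared_data.dat"      # assumed backing file

def row_sum(i):
    # each worker re-opens the memmap read-only; the OS shares the underlying pages
    data = np.memmap(PATH, dtype=np.float64, mode="r", shape=SHAPE)
    return float(data[i].sum())

if __name__ == "__main__":
    # create and fill the backing file once in the parent process
    data = np.memmap(PATH, dtype=np.float64, mode="w+", shape=SHAPE)
    data[:] = np.random.rand(*SHAPE)
    data.flush()

    with Pool() as pool:
        sums = pool.map(row_sum, range(SHAPE[0]))
    print(sums[:5])
```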


Have you profiled your code? Is it just calculating f that is too expensive, or storing the results in the data structure (or maybe both)?
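A quick way to answer that with the standard library profiler; compute_all here stands in for whatever loop calls f and stores the results:

```python
import cProfile
import pstats

def compute_all():
    # placeholder for the loop that calls f and stores the results
    pass

cProfile.run("compute_all()", "profile.out")
stats = pstats.Stats("profile.out")
stats.sort_stats("cumulative").print_stats(10)   # top 10 entries by cumulative time
```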

If f is dominant, then you should make sure you can't make algorithmic improvements before you start worrying about parallelization. You might be able to get a big speed-up by turning some or all of the function into a C extension, perhaps using Cython. If you do go with multiprocessing, I don't see why you would need to pass the entire data structure between processes.

If storing results in the matrix is too expensive, you might speed up your code by using a more efficient data structure (like array.array or numpy.ndarray). Unless you have been very careful designing and implementing your custom matrix class, it will almost certainly be slower than those.
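For instance, a preallocated numpy.ndarray stores the results contiguously, which is typically much cheaper than a dict-of-dicts or a hand-rolled matrix class; the shape, dtype and f below are assumptions:

```python
import numpy as np

n = 1000                                        # number of items, assumed
results = np.empty((n, n), dtype=np.float64)    # preallocate the whole result matrix

def f(a, b):
    return a * b                                # placeholder computation

values = np.random.rand(n)                      # stand-in for D
for i in range(n):
    for j in range(i + 1, n):
        results[i, j] = f(values[i], values[j])
```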