Using the multiprocessing module for cluster computing


If by cluster computing you mean distributed-memory systems (multiple nodes rather than SMP), then Python's multiprocessing may not be a suitable choice. It can spawn multiple processes, but they will all be bound to a single node.
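To see the single-node limit for yourself, here is a small illustrative sketch (not part of the original answer) that maps tasks over a `multiprocessing.Pool` and records which host and process handled each one. You get several worker PIDs but only ever one hostname. It assumes the POSIX-only `"fork"` start method to keep things short; `where_am_i` and `run_pool` are hypothetical names.

```python
import multiprocessing as mp
import os
import socket


def where_am_i(task_id: int) -> tuple[int, str, int]:
    """Report which host and OS process handled a given task."""
    return task_id, socket.gethostname(), os.getpid()


def run_pool(n_tasks: int = 8, n_workers: int = 4) -> list[tuple[int, str, int]]:
    # Assumption: the "fork" start method (POSIX only). On Windows, use the
    # default "spawn" context and keep the call under a __main__ guard.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=n_workers) as pool:
        # pool.map returns results in task order, one tuple per task.
        return pool.map(where_am_i, range(n_tasks))


if __name__ == "__main__":
    results = run_pool()
    hosts = {host for _, host, _ in results}
    pids = {pid for _, _, pid in results}
    print(f"{len(pids)} worker process(es) on {len(hosts)} host(s): {hosts}")
```

However many workers you ask for, they are all children of the same parent on the same machine.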

What you will need is a framework that handles spawning processes across multiple nodes and provides a mechanism for communication between them (pretty much what MPI does).
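To make the "communication" half concrete: the standard library's `multiprocessing.connection` can pass pickled Python objects over TCP, but it does nothing to start a process on another node. That launching step is what an MPI launcher or a cluster framework takes care of. In this minimal sketch a thread stands in for the remote worker, and `worker`, `head_node` and the auth key are hypothetical.

```python
import threading
from multiprocessing.connection import Client, Listener

# Hypothetical shared secret; both ends must use the same key.
AUTHKEY = b"hypothetical-shared-secret"


def worker(address, authkey):
    """Stand-in for a process on another node: receive work, send back results."""
    with Client(address, authkey=authkey) as conn:
        chunk = conn.recv()                # roughly an MPI recv of a scattered chunk
        conn.send([x * x for x in chunk])  # roughly an MPI send back to the root


def head_node(data):
    # Port 0 lets the OS pick a free port; listener.address reports the real one.
    with Listener(("127.0.0.1", 0), authkey=AUTHKEY) as listener:
        # On a real cluster this would be ssh or a batch job, not a local thread.
        t = threading.Thread(target=worker, args=(listener.address, AUTHKEY))
        t.start()
        with listener.accept() as conn:
            conn.send(data)
            result = conn.recv()
        t.join()
    return result


if __name__ == "__main__":
    print(head_node([1, 2, 3, 4]))  # prints [1, 4, 9, 16]
```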

See the page on Parallel Processing on the Python wiki for a list of frameworks which will help with cluster computing.

From the list, pp, jug, pyro and celery look like sensible options, although I can't personally vouch for any of them since I have no experience with them (I mainly use MPI).

If ease of installation/use is important, I would start by exploring jug. It's easy to install, supports common batch cluster systems, and looks well documented.


In the past I've used Pyro to do this quite successfully. If you turn on mobile code, it will automatically send any required modules that the nodes don't already have over the wire. Pretty nifty.


I have had luck using SCOOP as an alternative to multiprocessing for single- or multi-computer use. It also gives you job submission for clusters, plus many other features such as nested maps, and it needs only minimal code changes to get working with map().

The source is available on GitHub. A quick example shows just how simple the implementation can be!
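The linked example isn't reproduced here, but as a rough sketch of the "minimal code changes" point, assuming SCOOP's `scoop.futures.map` as a drop-in for the builtin `map()`, the serial code only needs its `map` swapped. The `ImportError` fallback to plain `map` is an addition so the sketch also runs without SCOOP installed; `parallel_map`, `square` and `main` are hypothetical names.

```python
try:
    # SCOOP's parallel map; launch the script with: python -m scoop this_script.py
    from scoop import futures
    parallel_map = futures.map
except ImportError:
    # Fallback for this sketch only: run serially with the builtin map().
    parallel_map = map


def square(x):
    return x * x


def main():
    # Compared with serial code, the only change is map -> parallel_map.
    return list(parallel_map(square, range(10)))


if __name__ == "__main__":
    print(main())
```

Because the mapped function is an ordinary module-level function, the same file runs serially, across local cores, or across cluster nodes depending only on how it is launched.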