
Python multiprocessing design


The current state of Python's multi-processing capabilities is not great for CPU-bound processing. I'm afraid there is no way to make it run faster using the multiprocessing module, nor is your use of multiprocessing the problem.

The real problem is that Python is still bound by the rules of the Global Interpreter Lock (GIL) (I highly suggest the slides). There have been some exciting theoretical and experimental advances in working around the GIL. Python 3.2 even contains a new GIL, which solves some of the issues but introduces others.

For now, it is faster to execute many Python processes, each with a single thread, than to attempt to run many threads within one process. This lets you avoid the contention of acquiring the GIL between threads (by effectively having one GIL per process). It is only beneficial, however, if the IPC overhead between your Python processes doesn't eclipse the benefit of the extra processing.
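As a rough sketch of that many-processes approach, multiprocessing.Pool can fan a CPU-bound function out over one worker process per core; the crunch function and job sizes below are made up for illustration:

import multiprocessing

def crunch(n):
    # Hypothetical CPU-bound work: sum of squares up to n.
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    jobs = [10 ** 6] * 8  # eight made-up chunks of work
    # One worker process per core; each worker is its own interpreter with its own GIL.
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
        results = pool.map(crunch, jobs)
    print(results)

Each worker is a separate interpreter with its own GIL, so the work genuinely runs in parallel, but every argument and result is pickled and shipped between processes, which is exactly the IPC overhead mentioned above.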

Eli Bendersky wrote a decent overview article on his experiences with attempting to make a CPU-bound process run faster with multiprocessing.

It is worth noting that PEP 371 set out to 'side-step' the GIL by introducing the multiprocessing module (previously a non-standard package named pyProcessing). However, the GIL still plays too large a role in the Python interpreter for the module to work well with CPU-bound algorithms. Many people have worked on removing or rewriting the GIL, but nothing has gained enough traction to make it into a Python release.


Some of the multiprocessing examples at python.org are not very clear IMO, and it's easy to start off with a flawed design. Here's a simplistic example I made to get me started on a project:

import multiprocessing
import random
import time

def busyfunc(runseconds):
    # Spin on random arithmetic until runseconds of wall-clock time have passed.
    starttime = time.time()
    while True:
        for randcount in range(0, 100):
            testnum = random.randint(1, 10000000)
            newnum = testnum / 3.256
        if time.time() - starttime > runseconds:
            return

def main(arg):
    print('arg from init:', arg)
    print('I am ' + multiprocessing.current_process().name)
    busyfunc(15)

if __name__ == '__main__':
    p = multiprocessing.Process(name='One', target=main, args=('passed_arg1',))
    p.start()
    p = multiprocessing.Process(name='Two', target=main, args=('passed_arg2',))
    p.start()
    p = multiprocessing.Process(name='Three', target=main, args=('passed_arg3',))
    p.start()
    time.sleep(5)

This should exercise three processors for 15 seconds, and it should be easy to modify for more (see the sketch below). Hopefully it helps you debug your current code and confirm that you are really generating multiple independent processes.
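For example, one way to scale it up (a small sketch in the same spirit, with a stand-in worker function) is to spawn one process per core and join them all instead of sleeping for a fixed time:

import multiprocessing
import time

def main(arg):
    # Stand-in for the worker from the example above.
    print('arg from init:', arg)
    print('I am ' + multiprocessing.current_process().name)
    time.sleep(2)  # pretend to do some work

if __name__ == '__main__':
    procs = []
    for i in range(multiprocessing.cpu_count()):
        p = multiprocessing.Process(name='Worker-%d' % i, target=main,
                                    args=('passed_arg%d' % i,))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()  # wait for every worker instead of sleeping for a fixed time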

If you must share data due to RAM limitations, then I suggest this: http://docs.python.org/library/multiprocessing.html#sharing-state-between-processes
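For the simple cases covered on that page, multiprocessing.Value and multiprocessing.Array put a single value or a flat array into shared memory. A minimal sketch:

import multiprocessing

def worker(counter, data):
    with counter.get_lock():      # the shared Value carries its own lock
        counter.value += 1
    with data.get_lock():         # guard the read-modify-write on the shared array
        for i in range(len(data)):
            data[i] = data[i] * 2

if __name__ == '__main__':
    counter = multiprocessing.Value('i', 0)              # shared int
    data = multiprocessing.Array('d', [1.0, 2.0, 3.0])   # shared doubles
    procs = [multiprocessing.Process(target=worker, args=(counter, data))
             for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value, list(data))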


As Python is not really meant for intensive number-crunching, I typically start by converting the time-critical parts of a Python program to C/C++, which speeds things up a lot.

Also, Python's multithreading is not very good: the interpreter keeps using a global lock (the GIL) for all kinds of things, so even when you use the threads that Python offers, CPU-bound work won't get faster. Threads are useful for applications where they typically wait for things like IO.
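For contrast, here is a minimal sketch of the kind of IO-bound work where Python threads do pay off (the URL is a placeholder): the GIL is released while each thread waits on the network, so the downloads overlap.

import threading
import urllib.request

def fetch(url, results, index):
    # The GIL is dropped while this thread blocks on the network.
    with urllib.request.urlopen(url) as response:
        results[index] = len(response.read())

if __name__ == '__main__':
    urls = ['http://example.com/'] * 4  # placeholder URLs
    results = [None] * len(urls)
    threads = [threading.Thread(target=fetch, args=(url, results, i))
               for i, url in enumerate(urls)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(results)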

When writing a C module, you can manually release that global lock while processing your data (and then, of course, stop accessing Python objects until you re-acquire it).

It takes some practice to use the C API, but it's clearly structured and much easier to use than, for example, the Java native API.

See 'Extending and Embedding' in the Python documentation.

This way you can write the time-critical parts in C/C++ and keep the parts where development speed matters more in Python...