
Multiprocessing or Multithreading?


"I checked it out and it looks good, but I also heard that processes, unlike threads, can't share a lot of information..."

This is only partially true.

Threads are part of a process -- threads share memory trivially. Which is as much of a problem as a help -- two threads with casual disregard for each other can overwrite memory and create serious problems.

Processes, however, can share information through a number of mechanisms. A POSIX pipeline (a | b) means that process a and process b share information -- a writes it and b reads it. This works out really well for a lot of things.
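The pipeline idea can be sketched inside Python itself with the standard library's multiprocessing.Pipe -- a minimal sketch, not from the answer; the producer function, the sentinel value, and the numbers are illustrative:

```python
from multiprocessing import Process, Pipe

def producer(conn):
    # "a" writes its results into the pipe, then a sentinel meaning "done".
    for i in range(3):
        conn.send(i * i)
    conn.send(None)
    conn.close()

def run_pipeline():
    # "b" reads from the other end of the pipe until the sentinel arrives.
    parent_conn, child_conn = Pipe()
    p = Process(target=producer, args=(child_conn,))
    p.start()
    results = []
    while (item := parent_conn.recv()) is not None:
        results.append(item)
    p.join()
    return results

if __name__ == "__main__":
    print(run_pipeline())  # [0, 1, 4]
```

The two processes never touch each other's memory; all sharing goes through the pipe, which is exactly what makes the shell's `a | b` so robust.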

The operating system will assign your processes to every available core as quickly as you create them. This works out really well for a lot of things.

Stackless Python is unrelated to this discussion -- it's faster and has different thread scheduling. But I don't think threads are the best route for this.

"I think my program will need to share a lot of information."

You should resolve this first. Then, determine how to structure processes around the flow of information. A "pipeline" is very easy and natural to do; any shell will create the pipeline trivially.

A "server" is another architecture where multiple client processes get and/or put information into a central server. This is a great way to share information. You can use the WSGI reference implementation as a way to build a simple, reliable server.
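The WSGI reference implementation the answer mentions lives in the standard library as wsgiref.simple_server. A minimal sketch of such a central server -- the app body, hostname, and port are illustrative assumptions:

```python
from wsgiref.simple_server import make_server

def app(environ, start_response):
    # Each client request gets some piece of centrally held information.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from the central server\n"]

def serve():
    # Client processes connect to this port to get/put information.
    with make_server("localhost", 8000, app) as httpd:
        httpd.serve_forever()
```

Clients can then be ordinary processes making HTTP requests; the server serializes access to the shared state, so no locking is needed on the client side.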


  • Stackless: uses 1 CPU. "Tasklets" must yield voluntarily. The preemption option doesn't work all the time.
  • Threaded: uses 1 CPU. Native threads share time somewhat randomly after running 20-100 Python opcodes.
  • Multiprocessing: uses multiple CPUs.
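The contrast in the list above can be sketched for CPU-bound work: native threads all run the same pure-Python function but the interpreter serializes their bytecode, while a process pool spreads the same jobs across cores. This is a sketch, not from the answer; the work function and job sizes are illustrative:

```python
from multiprocessing import Pool
from threading import Thread

def busy_sum(n):
    # Pure-Python CPU work; under threads this runs one-at-a-time
    # because of the interpreter lock.
    return sum(i * i for i in range(n))

def with_threads(jobs):
    results = [None] * len(jobs)
    def worker(idx, n):
        results[idx] = busy_sum(n)
    threads = [Thread(target=worker, args=(i, n)) for i, n in enumerate(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

def with_processes(jobs):
    # One worker per available CPU by default; jobs run in parallel.
    with Pool() as pool:
        return pool.map(busy_sum, jobs)

if __name__ == "__main__":
    jobs = [100_000] * 4
    # Same answers either way; only the processes version can use
    # more than one core for this kind of work.
    assert with_threads(jobs) == with_processes(jobs)
```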

Update

In-depth Analysis

Use threading for an easy time. However, if you call C routines that take a long time before returning, this may not be an option unless your C routine releases the Global Interpreter Lock (GIL).

Use multiprocessing if your workload is CPU-bound and you need maximum responsiveness.

Don't use Stackless: I have had it segfault before, and threads are pretty much equivalent unless you are using hundreds of them or more.


There was a good talk on multiprocessing at PyCon this year. The takeaway message was "Only use multiprocessing if you're sure you have a problem that it will solve and that cannot be solved with threads; otherwise, use threads."

Processes have a lot of overhead, and all data to be shared among processes must be serializable (i.e., picklable).
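The serialization constraint can be checked directly: anything you pass to or return from a worker process must survive a round trip through pickle. A small sketch (the helper name and sample values are illustrative):

```python
import pickle

def check_shareable(obj):
    # Anything handed to a worker process must survive this round trip.
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:
        return False

print(check_shareable({"rows": [1, 2, 3]}))  # True  -- plain data pickles fine
print(check_shareable(lambda x: x + 1))      # False -- lambdas don't pickle
```

Plain data structures, module-level functions, and most built-in types pass; lambdas, open file handles, sockets, and the like do not, which is why CPU-bound workers are usually written as top-level functions taking plain-data arguments.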

You can see the slides and video here: http://blip.tv/pycon-us-videos-2009-2010-2011/introduction-to-multiprocessing-in-python-1957019

http://us.pycon.org/2009/conference/schedule/event/31/