Multiprocessing or Multithreading?
"I checked it out and it looks good, but I also heard that processes, unlike threads, can't share a lot of information..."
This is only partially true.
Threads are part of a process -- threads share memory trivially. That is as much a problem as a help: two threads with casual disregard for each other can overwrite each other's memory and create serious problems.
Processes, however, share information through a lot of mechanisms. A POSIX pipeline (`a | b`) means that process a and process b share information -- a writes it and b reads it.
The operating system will assign your processes to every available core as quickly as you create them. This works out really well for a lot of things.
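The `a | b` pipeline above can be sketched in Python itself with `subprocess` -- a minimal example, where the two `-c` snippets are hypothetical stand-ins for any producer and consumer:

```python
import subprocess
import sys

# Process "a" emits numbers on stdout; process "b" reads them from stdin
# and sums them -- the same structure as the shell's `a | b`.
a = subprocess.Popen(
    [sys.executable, "-c", "print('\\n'.join(str(i) for i in range(5)))"],
    stdout=subprocess.PIPE,
)
b = subprocess.Popen(
    [sys.executable, "-c", "import sys; print(sum(int(x) for x in sys.stdin))"],
    stdin=a.stdout,
    stdout=subprocess.PIPE,
)
a.stdout.close()  # so "a" gets SIGPIPE if "b" exits early
out, _ = b.communicate()
print(out.decode().strip())  # → 10
```

The operating system schedules the two processes independently, so "a" and "b" can run on different cores at the same time.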
Stackless Python is unrelated to this discussion -- it's faster and has different thread scheduling. But I don't think threads are the best route for this.
"I think my program will need to share a lot of information."
You should resolve this first. Then, determine how to structure processes around the flow of information. A "pipeline" is very easy and natural to do; any shell will create the pipeline trivially.
A "server" is another architecture where multiple client processes get and/or put information into a central server. This is a great way to share information. You can use the WSGI reference implementation as a way to build a simple, reliable server.
- Stackless: uses one CPU. "Tasklets" must yield voluntarily. The preemption option doesn't work all the time.
- Threaded: uses one CPU. Native threads share time somewhat randomly after running 20-100 Python opcodes.
- Multiprocessing: uses multiple CPUs.
Update
In-depth Analysis
Use threaded for an easy time. However, if you call C routines that take a long time before returning, this may not be a choice if your C routine does not release the GIL (Global Interpreter Lock).
Use multiprocessing if your workload is heavily limited by CPU power and you need maximum responsiveness.
Don't use Stackless: I have had it segfault before, and threads are pretty much equivalent unless you are using hundreds of them or more.
There was a good talk on multiprocessing at PyCon this year. The takeaway message was "Only use multiprocessing if you're sure you have a problem that it will solve, that cannot be solved with threads; otherwise, use threads."
Processes have a lot of overhead, and all data to be shared among processes must be serializable (i.e., pickleable).
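The pickling constraint is easy to check directly: anything passed between processes (through a `Queue`, `Pipe`, or `Pool` arguments) goes through `pickle` behind the scenes. A minimal sketch:

```python
import pickle

# Plain data structures round-trip through pickle fine, so they can be
# shared among processes.
data = {"rows": [1, 2, 3], "label": "ok"}
restored = pickle.loads(pickle.dumps(data))
print(restored == data)  # → True

# Lambdas (like open files, sockets, locks, etc.) are not pickleable,
# so they cannot be sent to another process this way.
try:
    pickle.dumps(lambda x: x)
    lambda_pickled = True
except Exception:
    lambda_pickled = False
print(lambda_pickled)  # → False
```

This is the overhead the talk warns about: every object crossing a process boundary is serialized and deserialized, unlike threads, which share objects in memory for free.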
You can see the slides and video here: http://blip.tv/pycon-us-videos-2009-2010-2011/introduction-to-multiprocessing-in-python-1957019