Multiprocessing vs Threading Python [duplicate] Multiprocessing vs Threading Python [duplicate] python python

Multiprocessing vs Threading Python [duplicate]


Here are some pros/cons I came up with.

Multiprocessing

Pros

  • Separate memory space
  • Code is usually straightforward
  • Takes advantage of multiple CPUs & cores
  • Avoids GIL limitations for cPython
  • Eliminates most needs for synchronization primitives unless if you use shared memory (instead, it's more of a communication model for IPC)
  • Child processes are interruptible/killable
  • Python multiprocessing module includes useful abstractions with an interface much like threading.Thread
  • A must with cPython for CPU-bound processing

Cons

  • IPC a little more complicated with more overhead (communication model vs. shared memory/objects)
  • Larger memory footprint

Threading

Pros

  • Lightweight - low memory footprint
  • Shared memory - makes access to state from another context easier
  • Allows you to easily make responsive UIs
  • cPython C extension modules that properly release the GIL will run in parallel
  • Great option for I/O-bound applications

Cons

  • cPython - subject to the GIL
  • Not interruptible/killable
  • If not following a command queue/message pump model (using the Queue module), then manual use of synchronization primitives become a necessity (decisions are needed for the granularity of locking)
  • Code is usually harder to understand and to get right - the potential for race conditions increases dramatically


The threading module uses threads, the multiprocessing module uses processes. The difference is that threads run in the same memory space, while processes have separate memory. This makes it a bit harder to share objects between processes with multiprocessing. Since threads use the same memory, precautions have to be taken or two threads will write to the same memory at the same time. This is what the global interpreter lock is for.

Spawning processes is a bit slower than spawning threads.


Threading's job is to enable applications to be responsive. Suppose you have a database connection and you need to respond to user input. Without threading, if the database connection is busy the application will not be able to respond to the user. By splitting off the database connection into a separate thread you can make the application more responsive. Also because both threads are in the same process, they can access the same data structures - good performance, plus a flexible software design.

Note that due to the GIL the app isn't actually doing two things at once, but what we've done is put the resource lock on the database into a separate thread so that CPU time can be switched between it and the user interaction. CPU time gets rationed out between the threads.

Multiprocessing is for times when you really do want more than one thing to be done at any given time. Suppose your application needs to connect to 6 databases and perform a complex matrix transformation on each dataset. Putting each job in a separate thread might help a little because when one connection is idle another one could get some CPU time, but the processing would not be done in parallel because the GIL means that you're only ever using the resources of one CPU. By putting each job in a Multiprocessing process, each can run on it's own CPU and run at full efficiency.


matomo