How do I parallelize a simple Python loop?


Using multiple threads on CPython won't give you better performance for pure-Python code due to the global interpreter lock (GIL). I suggest using the multiprocessing module instead:

    import multiprocessing

    pool = multiprocessing.Pool(4)
    out1, out2, out3 = zip(*pool.map(calc_stuff, range(0, 10 * offset, offset)))

Note that this won't work in the interactive interpreter.

To avoid the usual FUD around the GIL: There wouldn't be any advantage to using threads for this example anyway. You want to use processes here, not threads, because they avoid a whole bunch of problems.
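For reference, here is a minimal sketch of how this might look as a complete script. The body of calc_stuff and the value of offset are hypothetical stand-ins (the real ones come from the question); the only assumption that matters is that calc_stuff returns three values per input. The `if __name__ == "__main__":` guard is needed on platforms where worker processes are started by re-importing the main module (e.g. Windows).

```python
import multiprocessing


def calc_stuff(parameter):
    # Hypothetical stand-in for the real per-item work; returns three values.
    return parameter, parameter * 2, parameter ** 2


def main():
    offset = 3  # hypothetical step between inputs
    # The context manager shuts the worker processes down when the block exits.
    with multiprocessing.Pool(4) as pool:
        results = pool.map(calc_stuff, range(0, 10 * offset, offset))
    # results is a list of 3-tuples; zip(*...) transposes it into three tuples.
    out1, out2, out3 = zip(*results)
    print(out1)
    print(out2)
    print(out3)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running main() when they
    # import this module.
    main()
```

Save it to a file and run it as a script rather than pasting it into the interactive interpreter.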


    from joblib import Parallel, delayed

    def process(i):
        return i * i

    results = Parallel(n_jobs=2)(delayed(process)(i) for i in range(10))
    print(results)  # prints [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

The above works beautifully on my machine (Ubuntu, package joblib was pre-installed, but can be installed via pip install joblib).

Taken from https://blog.dominodatalab.com/simple-parallelization/


Edit on Mar 31, 2021: On joblib, multiprocessing, threading and asyncio

  • joblib in the above code runs the work in multiple processes under the hood, much like the multiprocessing module (recent joblib versions default to the process-based loky backend). Multiple processes are typically the best way to run CPU work across cores, because of the GIL.
  • You can let joblib use multiple threads instead of multiple processes, but this (or using the threading module directly) is only beneficial if the threads spend considerable time on I/O (e.g. reading/writing to disk, sending an HTTP request). While one thread waits on I/O, the GIL is released, so it does not block the execution of another thread.
  • As an alternative to threading, you can run work concurrently with asyncio (in the standard library since Python 3.4; asyncio.run() arrived in 3.7). The same advice applies as for threading, though in contrast to the latter, only one thread is used.
  • Using multiple processes incurs overhead. Check for yourself whether the above code snippet improves your wall-clock time. Here is another one, for which I confirmed that joblib produces better results:
    import time
    from joblib import Parallel, delayed

    def countdown(n):
        while n > 0:
            n -= 1
        return n

    t = time.time()
    for _ in range(20):
        print(countdown(10**7), end=" ")
    print(time.time() - t)  # takes ~10.5 seconds on medium sized MacBook Pro

    t = time.time()
    results = Parallel(n_jobs=2)(delayed(countdown)(10**7) for _ in range(20))
    print(results)
    print(time.time() - t)  # takes ~6.3 seconds on medium sized MacBook Pro
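To illustrate the I/O-bound case from the list above, here is a rough sketch using only the standard library. The functions fake_download and fake_download_async are hypothetical: they simulate waiting on I/O with a sleep instead of doing real disk or network access, so treat the timings as illustrative only.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(i):
    # time.sleep stands in for blocking I/O; the GIL is released while waiting.
    time.sleep(0.2)
    return i * i


async def fake_download_async(i):
    # asyncio.sleep stands in for non-blocking I/O; everything runs on one thread.
    await asyncio.sleep(0.2)
    return i * i


def run_with_threads(n):
    # One worker thread per task, so all the simulated waits overlap.
    with ThreadPoolExecutor(max_workers=n) as executor:
        return list(executor.map(fake_download, range(n)))


async def run_with_asyncio(n):
    # gather schedules all coroutines concurrently and preserves input order.
    return await asyncio.gather(*(fake_download_async(i) for i in range(n)))


start = time.perf_counter()
thread_results = run_with_threads(5)
thread_seconds = time.perf_counter() - start

start = time.perf_counter()
async_results = asyncio.run(run_with_asyncio(5))
async_seconds = time.perf_counter() - start

print(thread_results, round(thread_seconds, 2))
print(async_results, round(async_seconds, 2))
```

Both variants should finish in roughly the time of a single wait rather than five, because the waits overlap. Replace the sleep with a CPU-heavy loop like countdown and that advantage disappears, which is why processes are the better fit for CPU work.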


To parallelize a simple for loop, joblib brings a lot of value over raw use of multiprocessing: not only the short syntax, but also transparent batching of iterations when they are very fast (to reduce the dispatch overhead) and capturing of the child process's traceback, for better error reporting.
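For comparison, this is a sketch of what batching looks like when done by hand with raw multiprocessing, via the chunksize argument of Pool.map. The function tiny_task and the chunk size of 1000 are arbitrary, hypothetical choices for illustration, and this is not how joblib implements its batching internally.

```python
import multiprocessing


def tiny_task(i):
    # Deliberately cheap work, so per-item dispatch overhead would dominate.
    return i + 1


def main():
    with multiprocessing.Pool(2) as pool:
        # chunksize=1000 hands items to the workers in batches of 1000
        # instead of one at a time, cutting inter-process communication.
        results = pool.map(tiny_task, range(10_000), chunksize=1000)
    print(sum(results))


if __name__ == "__main__":
    main()
```

With raw multiprocessing, a good chunk size has to be found by measuring; the point above is that joblib adjusts the batch size without you having to tune it.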

Disclaimer: I am the original author of joblib.