
Parallelism in Python


Generally, you're describing a CPU-bound calculation. This is not Python's forte. Neither, historically, is multiprocessing.

Threading in the mainstream Python interpreter has been ruled by the dreaded Global Interpreter Lock (GIL). The newer multiprocessing API works around that and provides a worker-pool abstraction, along with pipes, queues, and the like.
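For example, here is a minimal sketch (not from the original answer; the square function and inputs are made up for illustration) of spawning worker processes that send results back over a Queue:

import multiprocessing as mp

def square(n, out_queue):
    # Runs in a separate process, so it is not constrained by the GIL.
    out_queue.put(n * n)

if __name__ == "__main__":
    queue = mp.Queue()
    workers = [mp.Process(target=square, args=(i, queue)) for i in range(4)]
    for w in workers:
        w.start()
    # Collect one result per worker; ordering depends on which finishes first.
    results = sorted(queue.get() for _ in range(4))
    for w in workers:
        w.join()
    print(results)  # [0, 1, 4, 9]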

You can write your performance critical code in C or Cython, and use Python for the glue.


The multiprocessing module (new in Python 2.6) is the way to go. It uses subprocesses, which gets around the GIL problem. It also abstracts away some of the local/remote issues, so the choice of running your code locally or spread out over a cluster can be made later. The multiprocessing documentation is a fair bit to chew on, but it should provide a good basis to get started.
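As a rough sketch of the typical pattern (the heavy_computation function and inputs below are made up for illustration), a CPU-bound function can be fanned out over a pool of worker processes with Pool.map:

from multiprocessing import Pool

def heavy_computation(n):
    # Stand-in for a CPU-bound task.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    pool = Pool(processes=4)
    # Each input is handled in a separate worker process, so no single
    # interpreter's GIL serializes the work.
    results = pool.map(heavy_computation, [10**6] * 4)
    pool.close()
    pool.join()
    print(results)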


Ray is an elegant (and fast) library for doing this.

The most basic strategy for parallelizing Python functions is to declare a function with the @ray.remote decorator. It can then be invoked asynchronously by calling its .remote() method.

import ray
import time

# Start the Ray processes (e.g., a scheduler and shared-memory object store).
ray.init(num_cpus=8)

@ray.remote
def f():
    time.sleep(1)

# This should take one second assuming you have at least 4 cores.
ray.get([f.remote() for _ in range(4)])

You can also parallelize stateful computation using actors, again with the @ray.remote decorator, this time applied to a class.

# This assumes you already ran 'import ray' and 'ray.init()'.

import time

@ray.remote
class Counter(object):
    def __init__(self):
        self.x = 0

    def inc(self):
        self.x += 1

    def get_counter(self):
        return self.x

# Create two actors which will operate in parallel.
counter1 = Counter.remote()
counter2 = Counter.remote()

@ray.remote
def update_counters(counter1, counter2):
    for _ in range(1000):
        time.sleep(0.25)
        counter1.inc.remote()
        counter2.inc.remote()

# Start three tasks that update the counters in the background, also in parallel.
update_counters.remote(counter1, counter2)
update_counters.remote(counter1, counter2)
update_counters.remote(counter1, counter2)

# Check the counter values.
for _ in range(5):
    counter1_val = ray.get(counter1.get_counter.remote())
    counter2_val = ray.get(counter2.get_counter.remote())
    print("Counter1: {}, Counter2: {}".format(counter1_val, counter2_val))
    time.sleep(1)

Ray has a number of advantages over the multiprocessing module, notably that the same code can run on a single multi-core machine or scale out to a large cluster.

Ray is a framework I've been helping develop.