Executing tasks in parallel in Python


The built-in threading.Thread class offers all you need: start() to start a new thread and join() to wait for a thread to finish.

import threading

def task1():
    pass

def task2():
    pass

def task3():
    pass

def task4():
    pass

def task5():
    pass

def task6():
    pass

def dep1():
    # Run task1-task3 in parallel and wait for all of them.
    t1 = threading.Thread(target=task1)
    t2 = threading.Thread(target=task2)
    t3 = threading.Thread(target=task3)
    t1.start()
    t2.start()
    t3.start()
    t1.join()
    t2.join()
    t3.join()

def dep2():
    # Run task4 and task5 in parallel and wait for both.
    t4 = threading.Thread(target=task4)
    t5 = threading.Thread(target=task5)
    t4.start()
    t5.start()
    t4.join()
    t5.join()

def dep3():
    # dep1 and dep2 can themselves run in parallel.
    d1 = threading.Thread(target=dep1)
    d2 = threading.Thread(target=dep2)
    d1.start()
    d2.start()
    d1.join()
    d2.join()

d3 = threading.Thread(target=dep3)
d3.start()
d3.join()

Alternatively to join, you can use Queue.join to wait until all queued work items have been processed.
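A minimal sketch of that pattern (the worker function and the None sentinel values here are just illustrative, not part of the code above):

import queue
import threading

def worker(q):
    while True:
        item = q.get()
        if item is None:  # sentinel: tells the worker to exit
            q.task_done()
            break
        # ... process item here ...
        q.task_done()

q = queue.Queue()
threads = [threading.Thread(target=worker, args=(q,)) for _ in range(3)]
for t in threads:
    t.start()

for item in range(10):
    q.put(item)

q.join()  # blocks until task_done() has been called for every queued item

for _ in threads:
    q.put(None)  # shut the workers down
for t in threads:
    t.join()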


If you are willing to give external libraries a shot, you can express tasks and their dependencies elegantly with Ray. This works well on a single machine; the advantage here is that parallelism and dependencies can be easier to express with Ray than with Python multiprocessing, and it doesn't have the GIL (global interpreter lock) problem that often prevents multithreading from working efficiently. In addition, it is very easy to scale the workload up on a cluster if you need to in the future.

The solution looks like this:

import ray

ray.init()

@ray.remote
def task1():
    pass

@ray.remote
def task2():
    pass

@ray.remote
def task3():
    pass

@ray.remote
def dependent1(x1, x2, x3):
    pass

@ray.remote
def task4():
    pass

@ray.remote
def task5():
    pass

@ray.remote
def task6():
    pass

@ray.remote
def dependent2(x1, x2, x3):
    pass

@ray.remote
def dependent3(x, y):
    pass

id1 = task1.remote()
id2 = task2.remote()
id3 = task3.remote()
dependent_id1 = dependent1.remote(id1, id2, id3)

id4 = task4.remote()
id5 = task5.remote()
id6 = task6.remote()
dependent_id2 = dependent2.remote(id4, id5, id6)

dependent_id3 = dependent3.remote(dependent_id1, dependent_id2)
ray.get(dependent_id3)  # This is optional; you can get the results if the tasks return an object

You can also pass actual Python objects between the tasks by using the arguments inside the tasks and returning the results (for example, writing "return value" instead of the "pass" above).
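To illustrate, here is a minimal sketch (the add function is just an example I made up, not part of the code above):

import ray

ray.init()

@ray.remote
def add(x, y):
    return x + y

a = add.remote(1, 2)   # returns a future (object ID) immediately
b = add.remote(a, 10)  # futures can be passed straight into other tasks; Ray resolves them
print(ray.get(b))      # waits for the result and prints 13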

Using "pip install ray" the above code works out of the box on a single machine, and it is also easy to parallelize applications on a cluster, either in the cloud or your own custom cluster, see https://ray.readthedocs.io/en/latest/autoscaling.html and https://ray.readthedocs.io/en/latest/using-ray-on-a-cluster.html). That might come in handy if your workload grows later on.

Disclaimer: I'm one of the developers of Ray.


Look at Gevent.

Example Usage:

import gevent
from gevent import socket

def destination(jobs):
    gevent.joinall(jobs, timeout=2)
    print([job.value for job in jobs])

def task1():
    return gevent.spawn(socket.gethostbyname, 'www.google.com')

def task2():
    return gevent.spawn(socket.gethostbyname, 'www.example.com')

def task3():
    return gevent.spawn(socket.gethostbyname, 'www.python.org')

jobs = []
jobs.append(task1())
jobs.append(task2())
jobs.append(task3())
destination(jobs)

Hope this is what you have been looking for.