Memory usage keeps growing with Python's multiprocessing.pool
I ran into memory issues recently because I was calling the multiprocessing function many times; it kept spawning processes and leaving them in memory.
Here's the solution I'm using now:
def myParallelProcess(ahugearray):
    from multiprocessing import Pool
    from contextlib import closing
    with closing(Pool(15)) as p:
        # imap_unordered yields results as they complete, chunksize 100;
        # closing() makes sure the pool is closed when the block exits
        res = p.imap_unordered(simple_matching, ahugearray, 100)
        return res
Simply create the pool within your loop and close it at the end of the loop with pool.close().
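For example, here is a minimal sketch of that pattern; the chunk list and the process_item worker are placeholders, not from the original code:

from multiprocessing import Pool
from contextlib import closing

def process_item(item):
    # placeholder worker; swap in your real function
    return item * 2

if __name__ == '__main__':
    list_of_chunks = [range(1000), range(1000, 2000)]  # placeholder data
    for chunk in list_of_chunks:
        # a fresh pool per iteration; closing() guarantees pool.close() runs
        # when the with block exits, so worker processes are released
        with closing(Pool(15)) as p:
            results = list(p.imap_unordered(process_item, chunk, 100))
        p.join()  # wait for the workers to terminate before the next iteration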
Use map_async instead of apply_async to avoid excessive memory usage.
For your first example, change the following two lines:
for index in range(0, 100000):
    pool.apply_async(worker, callback=dummy_func)
to
pool.map_async(worker, range(100000), callback=dummy_func)
It will finish in a blink, before you can even see its memory usage in top. Change the list to a bigger one to see the difference. But note that map_async will first convert the iterable you pass to it into a list in order to calculate its length if it doesn't have a __len__ method. If you have an iterator over a huge number of elements, you can use itertools.islice to process them in smaller chunks.
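A possible sketch of that chunking approach, assuming a placeholder worker function and a dummy data source (neither is from the original question):

from itertools import islice
from multiprocessing import Pool

def worker(x):
    return x * x  # placeholder worker; replace with your real function

def chunks(iterator, size):
    # yield lists of at most `size` items pulled from the iterator with islice,
    # so map_async never has to materialize the whole iterable at once
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk

if __name__ == '__main__':
    huge_iterator = iter(range(10000000))  # stands in for the real data source
    pool = Pool(4)
    for chunk in chunks(huge_iterator, 10000):
        # each map_async call only sees a small list, so memory stays bounded
        pool.map_async(worker, chunk).get()
    pool.close()
    pool.join()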
I had a memory problem in a real-life program with much more data, and finally found that the culprit was apply_async.
P.S. With respect to memory usage, your two examples have no obvious difference.