
Memory usage keeps growing with Python's multiprocessing.pool


I had memory issues recently, since I was calling the multiprocessing code multiple times, so it kept spawning processes and leaving them in memory.

Here's the solution I'm using now:

def myParallelProcess(ahugearray):
    from multiprocessing import Pool
    from contextlib import closing
    with closing(Pool(15)) as p:
        res = p.imap_unordered(simple_matching, ahugearray, 100)
    return res


Simply create the pool within your loop and close it at the end of the loop with pool.close().
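
A minimal sketch of that pattern, assuming a placeholder process_item worker and a pool of 4 processes (both are illustrative choices, not from the original post):

from multiprocessing import Pool
from contextlib import closing

def process_item(x):
    # hypothetical worker; replace with your own function
    return x * x

def run_batches(batches):
    results = []
    for batch in batches:
        # create a fresh pool on each iteration; closing() calls pool.close() on exit
        with closing(Pool(4)) as p:
            results.extend(p.map(process_item, batch))
        p.join()  # wait for the worker processes to exit so they are not left in memory
    return results

if __name__ == "__main__":
    print(run_batches([range(10), range(10, 20)]))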


Use map_async instead of apply_async to avoid excessive memory usage.

For your first example, change the following two lines:

for index in range(0,100000):
    pool.apply_async(worker, callback=dummy_func)

to

pool.map_async(worker, range(100000), callback=dummy_func)
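
Put together, that change might look like the sketch below; worker and dummy_func are stand-ins here, since their original definitions are not shown, and note that map_async calls the callback once with the whole list of results rather than once per task:

from multiprocessing import Pool

def worker(index):
    # hypothetical stand-in for the original worker function
    return index * 2

results = []

def dummy_func(result_list):
    # map_async delivers all results to the callback as a single list
    results.extend(result_list)

if __name__ == "__main__":
    pool = Pool(4)
    pool.map_async(worker, range(100000), callback=dummy_func)
    pool.close()
    pool.join()
    print(len(results))  # 100000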

It will finish in a blink, before you can even see its memory usage in top. Change the list to a bigger one to see the difference. But note that map_async will first convert the iterable you pass to it into a list in order to calculate its length if it doesn't have a __len__ method. If you have an iterator with a huge number of elements, you can use itertools.islice to process them in smaller chunks, as in the sketch below.
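
For instance, chunking a huge iterator with itertools.islice could look like this; the chunks helper, the worker function, and the chunk size are illustrative choices, not part of the original answer:

from itertools import islice
from multiprocessing import Pool

def worker(x):
    # hypothetical worker
    return x * x

def chunks(iterator, size):
    # yield successive lists of at most `size` items from the iterator
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk

if __name__ == "__main__":
    huge_iterator = iter(range(10**7))  # stands in for a huge lazy iterator
    with Pool(4) as pool:
        for chunk in chunks(huge_iterator, 10000):
            # only one chunk is ever materialized as a list at a time
            partial = pool.map_async(worker, chunk).get()
            # consume `partial` here before moving on to the next chunk

This keeps peak memory bounded by the chunk size instead of by the length of the full iterator.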

I had a memory problem in a real-life program with much more data and finally found the culprit was apply_async.

P.S. With respect to memory usage, your two examples show no obvious difference.