How to use Python multiprocessing Pool.map to fill numpy array in a for loop How to use Python multiprocessing Pool.map to fill numpy array in a for loop numpy numpy

How to use Python multiprocessing Pool.map to fill numpy array in a for loop


The following works. First it is a good idea to protect the main part of your code inside a main block in order to avoid weird side effects. The result of poo.map() is a list containing the evaluations for each value in the iterator list_start_vals, such that you don't have to create array_2D before.

import numpy as npfrom multiprocessing import Pooldef fill_array(start_val):    return list(range(start_val, start_val+10))if __name__=='__main__':    pool = Pool(processes=4)    list_start_vals = range(40, 60)    array_2D = np.array(pool.map(fill_array, list_start_vals))    pool.close() # ATTENTION HERE    print array_2D

perhaps you will have trouble using pool.close(), from the comments of @hpaulj you can just remove this line in case you have problems...


If you still want to use the array fill, you can use pool.apply_async instead of pool.map. Working from Saullo's answer:

import numpy as npfrom multiprocessing import Pooldef fill_array(start_val):    return range(start_val, start_val+10)if __name__=='__main__':    pool = Pool(processes=4)    list_start_vals = range(40, 60)    array_2D = np.zeros((20,10))    for line, val in enumerate(list_start_vals):        result = pool.apply_async(fill_array, [val])        array_2D[line,:] = result.get()    pool.close()    print array_2D

This runs a bit slower than the map. But it does not produce a runtime error like my test of the map version: Exception RuntimeError: RuntimeError('cannot join current thread',) in <Finalize object, dead> ignored


The problem is due to running the pool.map in for loop , The result of the map() method is functionally equivalent to the built-in map(), except that individual tasks are run parallel.so in your case the pool.map(fill_array,list_start_vals) will be called 20 times and start running parallel for each iteration of for loop , Below code should work

Code:

#!/usr/bin/pythonimport numpyfrom multiprocessing import Pooldef fill_array(start_val):    return range(start_val,start_val+10)if __name__ == "__main__":    array_2D = numpy.zeros((20,10))    pool = Pool(processes = 4)        list_start_vals = range(40,60)    # running the pool.map in a for loop is wrong    #for line in xrange(20):    #    array_2D[line,:] = pool.map(fill_array,list_start_vals)    # get the result of pool.map (list of values returned by fill_array)    # in a pool_result list     pool_result = pool.map(fill_array,list_start_vals)    # the pool is processing its inputs in parallel, close() and join()     #can be used to synchronize the main process     #with the task processes to ensure proper cleanup.    pool.close()    pool.join()    # Now assign the pool_result to your numpy    for line,result in enumerate(pool_result):        array_2D[line,:] = result    print array_2D