
Is shared readonly data copied to different processes for multiprocessing?


You can use the shared memory stuff from multiprocessing together with Numpy fairly easily:

import multiprocessing
import ctypes
import numpy as np

shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
shared_array = shared_array.reshape(10, 10)

#-- edited 2015-05-01: the assert check below checks the wrong thing
#   with recent versions of Numpy/multiprocessing. That no copy is made
#   is indicated by the fact that the program prints the output shown below.
#
## No copy was made
##assert shared_array.base.base is shared_array_base.get_obj()

# Parallel processing: each worker fills one row of the shared array.
def my_func(i, def_param=shared_array):
    shared_array[i, :] = i

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    pool.map(my_func, range(10))
    print(shared_array)

which prints

[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]
 [ 3.  3.  3.  3.  3.  3.  3.  3.  3.  3.]
 [ 4.  4.  4.  4.  4.  4.  4.  4.  4.  4.]
 [ 5.  5.  5.  5.  5.  5.  5.  5.  5.  5.]
 [ 6.  6.  6.  6.  6.  6.  6.  6.  6.  6.]
 [ 7.  7.  7.  7.  7.  7.  7.  7.  7.  7.]
 [ 8.  8.  8.  8.  8.  8.  8.  8.  8.  8.]
 [ 9.  9.  9.  9.  9.  9.  9.  9.  9.  9.]]
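If you want to convince yourself that the NumPy view really wraps the shared buffer without copying (the commented-out assert above no longer checks the right thing), here is a quick sketch of my own comparing the underlying buffer addresses; it is not from the original answer:

import multiprocessing
import ctypes
import numpy as np

shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj()).reshape(10, 10)

# Both checks should print True: the NumPy view and the ctypes buffer
# occupy the same memory, so no copy was made.
print(np.shares_memory(shared_array,
                       np.ctypeslib.as_array(shared_array_base.get_obj())))
print(shared_array.__array_interface__['data'][0]
      == ctypes.addressof(shared_array_base.get_obj()))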

However, Linux has copy-on-write semantics on fork(), so even without using multiprocessing.Array, the data will not be copied unless it is written to.
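A minimal sketch of that fork-based approach (the names here are mine, and it assumes a platform that supports fork(), hence the explicit get_context('fork')):

import multiprocessing
import numpy as np

# A plain module-level array: fork()ed children inherit the parent's
# memory pages copy-on-write, so merely reading it never copies the data.
big_readonly = np.arange(1_000_000, dtype=np.float64)

def partial_sum(i):
    # Read-only access; writing here would trigger private page copies.
    return big_readonly[i::4].sum()

if __name__ == '__main__':
    ctx = multiprocessing.get_context('fork')   # fork is what makes this work
    with ctx.Pool(processes=4) as pool:
        print(pool.map(partial_sum, range(4)))

Bear in mind that only the pages a child actually writes get copied; CPython's reference counting does write to the array's object header, but for a large array that dirties a single page, not the data buffer.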


The following code works on Win7 and Mac (it may also work on Linux, but that is untested).

import multiprocessing
import ctypes
import numpy as np

shared_array = None

def init(shared_array_base):
    # Runs once in each worker process: wrap the shared buffer in a
    # NumPy view and store it in a module-level global.
    global shared_array
    shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
    shared_array = shared_array.reshape(10, 10)

# Parallel processing
def my_func(i):
    shared_array[i, :] = i

if __name__ == '__main__':
    shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)
    pool = multiprocessing.Pool(processes=4, initializer=init,
                                initargs=(shared_array_base,))
    pool.map(my_func, range(10))
    shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
    shared_array = shared_array.reshape(10, 10)
    print(shared_array)
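On Python 3.8+ there is also multiprocessing.shared_memory, which avoids the ctypes detour and works across start methods (spawn included). A minimal sketch of the same idea, with names of my own choosing rather than from the answer above:

import numpy as np
from multiprocessing import Pool, shared_memory

def init(shm_name, shape, dtype):
    # Attach to the existing shared block; keep a reference so the
    # buffer stays alive for the worker's lifetime.
    global shared_array, _shm
    _shm = shared_memory.SharedMemory(name=shm_name)
    shared_array = np.ndarray(shape, dtype=dtype, buffer=_shm.buf)

def my_func(i):
    shared_array[i, :] = i

if __name__ == '__main__':
    shm = shared_memory.SharedMemory(create=True, size=10 * 10 * 8)
    arr = np.ndarray((10, 10), dtype=np.float64, buffer=shm.buf)
    with Pool(processes=4, initializer=init,
              initargs=(shm.name, (10, 10), np.float64)) as pool:
        pool.map(my_func, range(10))
    print(arr)
    del arr          # drop the view before releasing the buffer
    shm.close()
    shm.unlink()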


For those stuck using Windows, which does not support fork() (unless using Cygwin), pv's answer does not work: globals are not made available to child processes.

Instead, you must pass the shared memory explicitly when the child process is created, either through the Pool initializer (as above) or as an argument to Process, like so:

#! /usr/bin/python
import time
from multiprocessing import Process, Queue, Array

def f(q, a):
    m = q.get()
    print(m)
    print(a[0], a[1], a[2])
    m = q.get()
    print(m)
    print(a[0], a[1], a[2])

if __name__ == '__main__':
    a = Array('B', (1, 2, 3), lock=False)
    q = Queue()
    p = Process(target=f, args=(q, a))
    p.start()
    q.put([1, 2, 3])
    time.sleep(1)
    a[0:3] = (4, 5, 6)
    q.put([4, 5, 6])
    p.join()

(it's not numpy and it's not good code, but it illustrates the point ;-)
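If you do want NumPy on top of that pattern, the child can re-wrap the inherited buffer itself. A sketch of my own, assuming lock=False is acceptable because parent and child never write at the same time (np.frombuffer needs the raw, unsynchronized array):

import numpy as np
from multiprocessing import Process, Array

def f(a):
    # Re-wrap the shared buffer as a NumPy array inside the child.
    arr = np.frombuffer(a, dtype=np.uint8)
    print(arr)

if __name__ == '__main__':
    a = Array('B', (1, 2, 3), lock=False)
    p = Process(target=f, args=(a,))
    p.start()
    p.join()
    # The parent's view writes through to the same shared buffer.
    np.frombuffer(a, dtype=np.uint8)[:] = (4, 5, 6)
    print(a[:])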