Is shared readonly data copied to different processes for multiprocessing?
You can use the shared-memory support from multiprocessing
together with NumPy fairly easily:
import multiprocessing
import ctypes
import numpy as np

shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
shared_array = shared_array.reshape(10, 10)

# -- edited 2015-05-01: the assert check below checks the wrong thing
# with recent versions of Numpy/multiprocessing. That no copy is made
# is indicated by the fact that the program prints the output shown below.
#
# No copy was made
# assert shared_array.base.base is shared_array_base.get_obj()

# Parallel processing
def my_func(i, def_param=shared_array):
    shared_array[i, :] = i

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    pool.map(my_func, range(10))
    print(shared_array)
which prints
[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]
 [ 2.  2.  2.  2.  2.  2.  2.  2.  2.  2.]
 [ 3.  3.  3.  3.  3.  3.  3.  3.  3.  3.]
 [ 4.  4.  4.  4.  4.  4.  4.  4.  4.  4.]
 [ 5.  5.  5.  5.  5.  5.  5.  5.  5.  5.]
 [ 6.  6.  6.  6.  6.  6.  6.  6.  6.  6.]
 [ 7.  7.  7.  7.  7.  7.  7.  7.  7.  7.]
 [ 8.  8.  8.  8.  8.  8.  8.  8.  8.  8.]
 [ 9.  9.  9.  9.  9.  9.  9.  9.  9.  9.]]
However, Linux has copy-on-write semantics on fork()
, so even without using multiprocessing.Array
, the data will not be copied unless it is written to.
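A minimal sketch of that point (assuming a platform where Pool workers are created via fork(), e.g. Linux; on Windows, where workers are spawned, each child simply re-creates the array on import, so nothing is shared). A plain module-level NumPy array is readable from the workers without any explicit sharing machinery, and as long as nobody writes to it, copy-on-write means no copy is made:

```python
import multiprocessing
import numpy as np

# Module-level array: on fork-based platforms the children inherit
# this memory copy-on-write, so merely reading it makes no copy.
big_readonly = np.arange(10_000, dtype=np.float64)

def read_sum(i):
    # Each worker reads the inherited array; nothing is pickled or copied.
    return float(big_readonly[i] + big_readonly.sum())

if __name__ == '__main__':
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(read_sum, range(4))
    print(results)  # [49995000.0, 49995001.0, 49995002.0, 49995003.0]
```

Note that writing to `big_readonly` from a worker triggers the copy for that worker only, so the parent would never see the change; that is why the answers above use multiprocessing.Array for writable data.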
The following code works on Win7 and Mac (it may also work on Linux, but that has not been tested).
import multiprocessing
import ctypes
import numpy as np

# -- edited 2015-05-01: the assert check below checks the wrong thing
# with recent versions of Numpy/multiprocessing. That no copy is made
# is indicated by the fact that the program prints the output shown below.
#
# No copy was made
# assert shared_array.base.base is shared_array_base.get_obj()

shared_array = None

def init(shared_array_base):
    global shared_array
    shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
    shared_array = shared_array.reshape(10, 10)

# Parallel processing
def my_func(i):
    shared_array[i, :] = i

if __name__ == '__main__':
    shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)
    pool = multiprocessing.Pool(processes=4, initializer=init,
                                initargs=(shared_array_base,))
    pool.map(my_func, range(10))

    shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
    shared_array = shared_array.reshape(10, 10)
    print(shared_array)
For those stuck using Windows, which does not support fork()
(unless using Cygwin), pv's answer does not work: globals are not made available to child processes.
Instead, you must pass the shared memory explicitly, via the initializer of the Pool
or the arguments of the Process
, like so:
#! /usr/bin/python
import time
from multiprocessing import Process, Queue, Array

def f(q, a):
    m = q.get()
    print(m)
    print(a[0], a[1], a[2])
    m = q.get()
    print(m)
    print(a[0], a[1], a[2])

if __name__ == '__main__':
    a = Array('B', (1, 2, 3), lock=False)
    q = Queue()
    p = Process(target=f, args=(q, a))
    p.start()
    q.put([1, 2, 3])
    time.sleep(1)
    a[0:3] = (4, 5, 6)
    q.put([4, 5, 6])
    p.join()
(it's not numpy and it's not good code but it illustrates the point ;-)
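As a side note, on Python 3.8+ the multiprocessing.shared_memory module offers a more direct route to the same goal: back a NumPy array with a named shared-memory block that other processes can attach to by name. A minimal single-process sketch (the variable names here are mine, not from the answers above):

```python
from multiprocessing import shared_memory
import numpy as np

# Create a shared-memory block big enough for a 10x10 float64 array
# (100 elements * 8 bytes each).
shm = shared_memory.SharedMemory(create=True, size=10 * 10 * 8)
try:
    # View the raw buffer as a NumPy array; writes land directly
    # in the shared-memory block, no copies involved.
    arr = np.ndarray((10, 10), dtype=np.float64, buffer=shm.buf)
    arr[:] = 0.0
    arr[3, :] = 3.0

    # Another process would attach by name instead of creating:
    #   existing = shared_memory.SharedMemory(name=shm.name)
    #   view = np.ndarray((10, 10), dtype=np.float64, buffer=existing.buf)
    val = float(arr[3, 0])  # read back before releasing the block
    print(val)
finally:
    shm.close()   # detach this process's mapping
    shm.unlink()  # free the block (call once, from the owner)
```

This avoids both the ctypes round-trip and the pickling restrictions of the initializer approach, at the cost of managing the block's lifetime (close/unlink) yourself.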