Perform a for-loop in parallel in Python 3.2 [duplicate]
Joblib is designed specifically to wrap around multiprocessing for the purposes of simple parallel looping. I suggest using that instead of grappling with multiprocessing directly.
The simple case looks something like this:
    from joblib import Parallel, delayed

    # n_jobs = number of processes
    Parallel(n_jobs=2)(delayed(foo)(i**2) for i in range(10))
The syntax is simple once you understand it. We are using generator syntax in which delayed is used to call the function foo with its arguments contained in the parentheses that follow.
In your case, you should either rewrite your for loop with generator syntax, or define another function (i.e. 'worker' function) to perform the operations of a single loop iteration and place that into the generator syntax of a call to Parallel.
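For instance, a minimal runnable sketch of the generator-syntax approach (assuming joblib is installed; foo here is a stand-in for whatever your loop body computes):

```python
from joblib import Parallel, delayed

def foo(x):
    # stand-in for the body of your for loop
    return x + 1

# equivalent to: results = [foo(i**2) for i in range(10)], but run
# across 2 worker processes; results come back in input order
results = Parallel(n_jobs=2)(delayed(foo)(i**2) for i in range(10))
print(results)
```

Note that delayed(foo)(i**2) does not call foo immediately; it packages the function together with its arguments so that Parallel can dispatch the calls to worker processes.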
In the latter case, you would do something like:
    Parallel(n_jobs=2)(delayed(foo)(parameters) for x in range(i, j))
where foo is a function you define to handle the body of your for loop. Note that you do not want to append to a list, since Parallel returns a list anyway.

In this case, you probably want to define a simple function to perform the calculation and get localResult.
    def getLocalResult(args):
        """
        Do whatever you want in this func. The point is that
        it takes x, i, j and returns localResult.
        """
        x, i, j = args  # unpack args
        return doSomething(x, i, j)
Now in your computation function, you just create a pool of workers and map the local results:
    import multiprocessing

    def computation(np=4):
        """ np is the number of processes to fork """
        p = multiprocessing.Pool(np)
        # i and j are assumed to be defined in the enclosing scope
        output = p.map(getLocalResult, [(x, i, j) for x in range(i, j)])
        return output
I've removed the global here because it's unnecessary (globals are usually unnecessary). In your calling routine you should just do output.extend(computation(np=4)) or something similar.
EDIT
Here's a "working" example of your code:
    from multiprocessing import Pool

    def computation(args):
        length, startPosition, npoints = args
        print(args)

    length = 100
    np = 4
    p = Pool(processes=np)
    p.map(computation, [(startPosition, startPosition + length // np, length // np)
                        for startPosition in range(0, length, length // np)])
Note that what you had didn't work because you were using an instance method as your function. multiprocessing starts new processes and sends information between them via pickle, so only objects that can be pickled can be used. Note that it really doesn't make sense to use an instance method anyway: each process is a copy of the parent, so any changes to state which happen in the processes do not propagate back to the parent.
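To see the pickling constraint concretely, here is a small illustration: a module-level function pickles fine (it is stored by its importable name), while a lambda, which has no importable name, does not. (In modern Python 3 a bound method can sometimes be pickled if its instance can, but the general constraint still stands.)

```python
import pickle

def square(x):
    # module-level function: pickled by reference to its name
    return x * x

restored = pickle.loads(pickle.dumps(square))
print(restored(3))  # 9

try:
    pickle.dumps(lambda x: x * x)  # no importable name -> fails
except Exception as e:
    print("cannot pickle:", e)
```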