
Parallelise python loop with numpy arrays and shared-memory


With Cython parallel support:

# asd.pyx
from cython.parallel cimport prange
import numpy as np

def foo():
    cdef int i, j, n
    x = np.zeros((200, 2000), float)
    n = x.shape[0]
    for i in prange(n, nogil=True):
        with gil:
            for j in range(100):
                x[i,:] = np.cos(x[i,:])
    return x

On a 2-core machine:

$ cython asd.pyx
$ gcc -fPIC -fopenmp -shared -o asd.so asd.c -I/usr/include/python2.7
$ export OMP_NUM_THREADS=1
$ time python -c 'import asd; asd.foo()'

real    0m1.548s
user    0m1.442s
sys     0m0.061s

$ export OMP_NUM_THREADS=2
$ time python -c 'import asd; asd.foo()'

real    0m0.602s
user    0m0.826s
sys     0m0.075s

This runs fine in parallel, since np.cos (like other ufuncs) releases the GIL.

If you want to use this interactively:

# asd.pyxbld
def make_ext(modname, pyxfilename):
    from distutils.extension import Extension
    return Extension(name=modname,
                     sources=[pyxfilename],
                     extra_link_args=['-fopenmp'],
                     extra_compile_args=['-fopenmp'])

and (remove asd.so and asd.c first):

>>> import pyximport
>>> pyximport.install(reload_support=True)
>>> import asd
>>> q1 = asd.foo()
# Go to an editor and change asd.pyx
>>> reload(asd)
>>> q2 = asd.foo()

So yes, in some cases you can parallelize just by using threads. OpenMP is just a fancy wrapper for threading, so Cython is only needed here for the nicer syntax. Without Cython, you can use the threading module --- it works similarly to multiprocessing (and probably more robustly), but you don't need to do anything special to declare arrays as shared memory.

However, not all operations release the GIL, so YMMV for the performance.
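For illustration, here is a minimal sketch of the pure-threading version of the same row-wise loop (the worker and parallel_cos helpers are names I made up, not from any library). Each thread updates its own slice of rows in place, and since threads share the process's memory, the array needs no special treatment:

import threading
import numpy as np

def worker(x, start, stop):
    # Each thread works on its own, non-overlapping slice of rows;
    # the array itself is shared, so nothing is copied or pickled.
    for i in range(start, stop):
        for _ in range(100):
            x[i, :] = np.cos(x[i, :])  # the ufunc call releases the GIL

def parallel_cos(x, num_threads=2):
    n = x.shape[0]
    chunk = (n + num_threads - 1) // num_threads
    threads = [threading.Thread(target=worker,
                                args=(x, t * chunk, min((t + 1) * chunk, n)))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x

x = parallel_cos(np.zeros((200, 2000), float))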

***

And another possibly useful link scraped from other Stack Overflow answers --- another interface to multiprocessing: http://packages.python.org/joblib/parallel.html
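For a feel of that interface, here is a hedged sketch using joblib's Parallel and delayed on the same kind of row-wise loop (process_row is an illustrative helper, not part of joblib):

from joblib import Parallel, delayed
import numpy as np

def process_row(row):
    for _ in range(100):
        row = np.cos(row)
    return row

x = np.zeros((200, 2000), float)
# n_jobs=2 runs two worker processes; each row is pickled to a worker
# and the result pickled back, so there is serialization overhead.
rows = Parallel(n_jobs=2)(delayed(process_row)(x[i, :])
                          for i in range(x.shape[0]))
x = np.vstack(rows)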


Using a mapping operation (in this case multiprocessing.Pool.map()) is more or less the canonical way to parallelize a loop on a single machine. That is, unless and until the built-in map() is ever parallelized.
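A minimal sketch of that pattern (process_row is an illustrative helper I'm assuming here; each row is pickled to a worker and the result pickled back):

import multiprocessing
import numpy as np

def process_row(row):
    for _ in range(100):
        row = np.cos(row)
    return row

if __name__ == '__main__':
    x = np.zeros((200, 2000), float)
    pool = multiprocessing.Pool(processes=2)
    # Iterating a 2D array yields its rows; map() distributes them
    # over the worker processes and preserves order in the result.
    x = np.vstack(pool.map(process_row, list(x)))
    pool.close()
    pool.join()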

An overview of the different possibilities can be found here.

You can use OpenMP with Python (or rather Cython), but it doesn't look exactly easy.

IIRC, the point of only running multiprocessing stuff from __main__ is a necessity because of compatibility with Windows. Since Windows lacks fork(), it starts a new Python interpreter and has to import the code in it.
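Concretely, the spawning code has to sit behind the usual guard, roughly like this (work is just an illustrative function):

import multiprocessing

def work(i):
    return i * i

if __name__ == '__main__':
    # On Windows there is no fork(); each child re-imports this module,
    # so without this guard every child would spawn more children.
    pool = multiprocessing.Pool(2)
    print(pool.map(work, range(10)))
    pool.close()
    pool.join()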

Edit

Numpy can parallelize some operations like dot(), vdot() and innerproduct(), when configured with a good multithreaded BLAS library, e.g. OpenBLAS. (See also this question.)
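A quick way to see this, assuming your numpy is actually linked against a threaded BLAS (np.show_config() reports which one):

import numpy as np

np.show_config()  # shows which BLAS/LAPACK numpy was built against

a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)
# A single dot() call; with a multithreaded BLAS this already uses all
# cores, with no Python-level parallelism. The thread count is set by
# the BLAS itself, e.g. via OMP_NUM_THREADS or OPENBLAS_NUM_THREADS.
c = np.dot(a, b)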

Since numpy array operations are mostly element-by-element, it seems possible to parallelize them. But this would involve setting up either a shared memory segment for Python objects, or dividing the arrays up into pieces and feeding them to the different processes, not unlike what multiprocessing.Pool does. No matter what approach is taken, it would incur memory and processing overhead to manage all that. One would have to run extensive tests to see for which sizes of arrays this would actually be worth the effort. The outcome of those tests would probably vary considerably per hardware architecture, operating system and amount of RAM.
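As a sketch of the divide-into-chunks approach, here is one way to do it with multiprocessing's shared memory (init_worker and process_chunk are names I'm inventing; RawArray and the Pool initializer are the standard mechanisms):

import multiprocessing
import numpy as np

def init_worker(shared_arr, shape):
    # Re-wrap the shared buffer as a numpy array in each worker.
    global x
    x = np.frombuffer(shared_arr, dtype=np.float64).reshape(shape)

def process_chunk(bounds):
    start, stop = bounds
    for i in range(start, stop):
        for _ in range(100):
            x[i, :] = np.cos(x[i, :])

if __name__ == '__main__':
    shape = (200, 2000)
    # A raw shared-memory segment; there is no locking, so the chunks
    # handed to the workers must not overlap.
    shared_arr = multiprocessing.RawArray('d', shape[0] * shape[1])
    init_worker(shared_arr, shape)  # parent's view of the same memory
    chunks = [(i, min(i + 50, shape[0])) for i in range(0, shape[0], 50)]
    pool = multiprocessing.Pool(4, initializer=init_worker,
                                initargs=(shared_arr, shape))
    pool.map(process_chunk, chunks)
    pool.close()
    pool.join()
    # x in the parent now reflects the workers' in-place updates.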


The .map() method of the mathDict() class in ParallelRegression does exactly what you are looking for, in two lines of code that are very easy to use at an interactive prompt. It uses true multiprocessing, so the requirement that the function to be run in parallel be pickle-able is unavoidable, but this does provide an easy way to loop over a matrix in shared memory from multiple processes.

Say you have a pickle-able function:

def sum_row( matrix, row ):
    return( sum( matrix[row,:] ) )

Then you just need to create a mathDict() object representing it, and use mathDict().map():

matrix = np.array( [i for i in range( 24 )] ).reshape( (6, 4) )
RA, MD = mathDictMaker.fromMatrix( matrix, integer=True )
res = MD.map( [(i,) for i in range( 6 )], sum_row, ordered=True )
print( res )
# [6, 22, 38, 54, 70, 86]

The documentation (link above) explains how to pass a combination of positional and keyword arguments into your function, including the matrix itself at any position or as a keyword argument. This should enable you to use pretty much any function you've already written without modifying it.