Parallel in-place sort for numpy arrays Parallel in-place sort for numpy arrays numpy numpy

Parallel in-place sort for numpy arrays


I ended up wrapping GCC parallel sort. Here is the code:

parallelSort.pyx

# cython: wraparound = False# cython: boundscheck = Falseimport numpy as npcimport numpy as npimport cythoncimport cython ctypedef fused real:    cython.char    cython.uchar    cython.short    cython.ushort    cython.int    cython.uint    cython.long    cython.ulong    cython.longlong    cython.ulonglong    cython.float    cython.doublecdef extern from "<parallel/algorithm>" namespace "__gnu_parallel":    cdef void sort[T](T first, T last) nogil def numpyParallelSort(real[:] a):    "In-place parallel sort for numpy types"    sort(&a[0], &a[a.shape[0]])

Extra compiler args: -fopenmp (compile) and -lgomp (linking)

This makefile will do it:

all:    cython --cplus parallelSort.pyx      g++  -g -march=native -Ofast -fpic -c    parallelSort.cpp -o parallelSort.o -fopenmp `python-config --includes`    g++  -g -march=native -Ofast -shared  -o parallelSort.so parallelSort.o `python-config --libs` -lgomp clean:    rm -f parallelSort.cpp *.o *.so

And this shows that it works:

from parallelSort import numpyParallelSortimport numpy as np a = np.random.random(100000000)numpyParallelSort(a) print a[:10]

edit: fixed bug noticed in the comment below


Mergesort parallelizes quite naturally. Just have each worker pre-sort an arbitrary chunk, and then run a single merge pass on it. The final merging should require only O(N) operations, and its trivial to write a function for doing so in numba or somesuch.

Wikipedia agrees