Optimization and speedup of a mathematical function in python


If you're willing to use http://pythran.readthedocs.io/, you can leverage the numpy implementation and get better performance than Cython in this case:

#pythran export np_cos_norm(float[], float[])
import numpy as np

def np_cos_norm(a, b):
    val = np.sum(1. - np.cos(a-b))
    return np.sqrt(val / 2. / a.shape[0])
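For reference, the quantity being computed is sqrt(sum_i(1 - cos(a_i - b_i)) / (2n)). A pure-Python sketch of the same formula (the name cos_norm_ref is mine, and it is far too slow for real use, but handy as a correctness check):

```python
import math

def cos_norm_ref(a, b):
    # sqrt( sum(1 - cos(a_i - b_i)) / (2 * n) ), computed element by element
    n = len(a)
    total = sum(1. - math.cos(x - y) for x, y in zip(a, b))
    return math.sqrt(total / (2. * n))
```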

And compile it with:

pythran fast.py

That gives, on average, a 2x speedup over the Cython version.

If you compile with:

pythran fast.py -march=native -DUSE_BOOST_SIMD -fopenmp

You'll get a vectorized, parallel version that runs slightly faster:

100000 loops, best of 3: 2.54 µs per loop
1000000 loops, best of 3: 674 ns per loop

100000 loops, best of 3: 16.9 µs per loop
100000 loops, best of 3: 4.31 µs per loop

10000 loops, best of 3: 176 µs per loop
10000 loops, best of 3: 42.9 µs per loop

(using the same testbed as ev-br)


Here's a quick-and-dirty attempt with Cython, for just a pair of 1D arrays:

(in an IPython notebook)

%%cython
cimport cython
cimport numpy as np

cdef extern from "math.h":
    double cos(double x) nogil
    double sqrt(double x) nogil

def cos_norm(a, b):
    return cos_norm_impl(a, b)

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
cdef double cos_norm_impl(double[::1] a, double[::1] b) nogil:
    cdef double res = 0., val
    cdef int m = a.shape[0]
    # XXX: shape of b not checked
    cdef int j
    for j in range(m):
        val = a[j] - b[j]
        res += 1. - cos(val)
    res /= 2.*m
    return sqrt(res)

Comparing with a straightforward numpy implementation,

def np_cos_norm(a, b):
    val = np.add.reduce(1. - np.cos(a-b))
    return np.sqrt(val / 2. / a.shape[0])
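(For what it's worth, np.add.reduce(x) and np.sum(x) compute the same thing for a 1-D float array, since np.sum is implemented on top of add.reduce, so this variant should agree with the plain np.sum version:)

```python
import numpy as np

np.random.seed(1234)
a = np.random.random(1000)
b = np.random.random(1000)
val = 1. - np.cos(a - b)
# np.sum dispatches to np.add.reduce for ndarrays, so the results agree
assert np.allclose(np.add.reduce(val), np.sum(val))
```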

I get

np.random.seed(1234)
for n in [100, 1000, 10000]:
    x = np.random.random(n)
    y = np.random.random(n)
    %timeit cos_norm(x, y)
    %timeit np_cos_norm(x, y)
    print '\n'

100000 loops, best of 3: 3.04 µs per loop
100000 loops, best of 3: 12.4 µs per loop

100000 loops, best of 3: 18.8 µs per loop
10000 loops, best of 3: 30.8 µs per loop

1000 loops, best of 3: 196 µs per loop
1000 loops, best of 3: 223 µs per loop

So, depending on the dimensionality of your vectors, the speedup ranges from about 4x for small arrays down to essentially nothing for large ones.

For computing pairwise distances, you can probably do much better, as shown in this blog post, but of course YMMV.
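One vectorized way to do the all-pairs case with broadcasting (a sketch only; pairwise_cos_norm is my name, and the intermediate (m, n, d) array can get memory-hungry for large inputs, which is where the tricks in that post come in):

```python
import numpy as np

def np_cos_norm(a, b):
    val = np.sum(1. - np.cos(a - b))
    return np.sqrt(val / 2. / a.shape[0])

def pairwise_cos_norm(X, Y):
    """All-pairs cos_norm between rows of X (m, d) and rows of Y (n, d)."""
    diff = X[:, None, :] - Y[None, :, :]       # shape (m, n, d) via broadcasting
    val = np.sum(1. - np.cos(diff), axis=-1)   # shape (m, n)
    return np.sqrt(val / (2. * X.shape[1]))
```

Each entry [i, j] of the result matches np_cos_norm(X[i], Y[j]), but the loop over pairs happens inside numpy.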