Optimization and speedup of a mathematical function in Python
If you're willing to use http://pythran.readthedocs.io/, you can leverage the numpy implementation and get better performance than Cython in this case:
    #pythran export np_cos_norm(float[], float[])
    import numpy as np

    def np_cos_norm(a, b):
        val = np.sum(1. - np.cos(a-b))
        return np.sqrt(val / 2. / a.shape[0])
And compile it with:
pythran fast.py
This gives, on average, a 2x speedup over the Cython version.
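For reference, the quantity being computed is sqrt(sum(1 - cos(a - b)) / (2n)), and the function can be cross-checked in plain Python without compiling anything. A minimal sketch (the helper name `cos_norm_naive` is mine, not from the answer):

```python
import math
import numpy as np

def np_cos_norm(a, b):
    # Vectorized: sqrt( sum(1 - cos(a - b)) / (2 * n) )
    val = np.sum(1. - np.cos(a - b))
    return np.sqrt(val / 2. / a.shape[0])

def cos_norm_naive(a, b):
    # Straightforward loop, used only to sanity-check the vectorized version
    total = sum(1. - math.cos(x - y) for x, y in zip(a, b))
    return math.sqrt(total / (2. * len(a)))

rng = np.random.RandomState(0)
a = rng.random_sample(1000)
b = rng.random_sample(1000)
assert abs(np_cos_norm(a, b) - cos_norm_naive(a, b)) < 1e-12
```

The same source file works both as ordinary Python and as Pythran input; the `#pythran export` comment only matters at compile time.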
If using:
pythran fast.py -march=native -DUSE_BOOST_SIMD -fopenmp
you'll get a vectorized, parallel version that runs even faster:
    100000 loops, best of 3: 2.54 µs per loop
    1000000 loops, best of 3: 674 ns per loop
    100000 loops, best of 3: 16.9 µs per loop
    100000 loops, best of 3: 4.31 µs per loop
    10000 loops, best of 3: 176 µs per loop
    10000 loops, best of 3: 42.9 µs per loop
(using the same testbed as ev-br)
Here's a quick-and-dirty try with cython, for just a pair of 1D arrays:
(in an IPython notebook)
    %%cython
    cimport cython
    cimport numpy as np

    cdef extern from "math.h":
        double cos(double x) nogil
        double sqrt(double x) nogil

    def cos_norm(a, b):
        return cos_norm_impl(a, b)

    @cython.boundscheck(False)
    @cython.wraparound(False)
    @cython.cdivision(True)
    cdef double cos_norm_impl(double[::1] a, double[::1] b) nogil:
        cdef double res = 0., val
        cdef int m = a.shape[0]
        # XXX: shape of b not checked
        cdef int j
        for j in range(m):
            val = a[j] - b[j]
            res += 1. - cos(val)
        res /= 2.*m
        return sqrt(res)
Comparing with a straightforward numpy implementation,
    def np_cos_norm(a, b):
        val = np.add.reduce(1. - np.cos(a-b))
        return np.sqrt(val / 2. / a.shape[0])
I get
    np.random.seed(1234)
    for n in [100, 1000, 10000]:
        x = np.random.random(n)
        y = np.random.random(n)
        %timeit cos_norm(x, y)
        %timeit np_cos_norm(x, y)
        print '\n'

    100000 loops, best of 3: 3.04 µs per loop
    100000 loops, best of 3: 12.4 µs per loop

    100000 loops, best of 3: 18.8 µs per loop
    10000 loops, best of 3: 30.8 µs per loop

    1000 loops, best of 3: 196 µs per loop
    1000 loops, best of 3: 223 µs per loop
So, depending on the dimensionality of your vectors, the speedup ranges from a factor of 4 down to essentially nothing.
For computing pairwise distances, you can probably do much better, as shown in this blog post, but of course YMMV.
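As a rough illustration of the pairwise case, the same formula can be broadcast over all row pairs at once. This is a sketch, not the optimized approach from the blog post: the name `pairwise_cos_norm` is mine, and the intermediate array makes it O(m²·n) in memory, trading space for simplicity.

```python
import numpy as np

def pairwise_cos_norm(X):
    # X: (m, n) array of m vectors; returns an (m, m) matrix D with
    # D[i, j] = sqrt( sum(1 - cos(X[i] - X[j])) / (2 * n) )
    m, n = X.shape
    diff = X[:, None, :] - X[None, :, :]      # broadcast to shape (m, m, n)
    val = np.sum(1. - np.cos(diff), axis=-1)  # reduce over the vector axis
    return np.sqrt(val / (2. * n))

X = np.random.rand(50, 10)
D = pairwise_cos_norm(X)   # symmetric, zero diagonal
```

For large m, chunking the outer loop (or a compiled kernel, as in the blog post) avoids materializing the (m, m, n) intermediate.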