Optimization and speedup of a mathematical function in Python
If you're willing to use http://pythran.readthedocs.io/, you can leverage the numpy implementation and get better performance than Cython in this case:
    #pythran export np_cos_norm(float[], float[])
    import numpy as np

    def np_cos_norm(a, b):
        val = np.sum(1. - np.cos(a-b))
        return np.sqrt(val / 2. / a.shape[0])
And compile it with:
pythran fast.py
This gives, on average, a 2x speedup over the Cython version.
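For reference, the quantity being computed is sqrt(sum(1 - cos(a - b)) / (2n)), and the function can be cross-checked in plain Python without compiling anything. A minimal sketch (the helper name `cos_norm_naive` is mine, not from the answer):

```python
import math
import numpy as np

def np_cos_norm(a, b):
    # Vectorized: sqrt( sum(1 - cos(a - b)) / (2 * n) )
    val = np.sum(1. - np.cos(a - b))
    return np.sqrt(val / 2. / a.shape[0])

def cos_norm_naive(a, b):
    # Straightforward loop, used only to sanity-check the vectorized version
    total = sum(1. - math.cos(x - y) for x, y in zip(a, b))
    return math.sqrt(total / (2. * len(a)))

rng = np.random.RandomState(0)
a = rng.random_sample(1000)
b = rng.random_sample(1000)
assert abs(np_cos_norm(a, b) - cos_norm_naive(a, b)) < 1e-12
```

The same source file works both as ordinary Python and as Pythran input; the `#pythran export` comment only matters at compile time.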
If using:
pythran fast.py -march=native -DUSE_BOOST_SIMD -fopenmp
you'll get a vectorized, parallel version that runs even faster:
    100000 loops, best of 3: 2.54 µs per loop
    1000000 loops, best of 3: 674 ns per loop
    100000 loops, best of 3: 16.9 µs per loop
    100000 loops, best of 3: 4.31 µs per loop
    10000 loops, best of 3: 176 µs per loop
    10000 loops, best of 3: 42.9 µs per loop
(using the same testbed as ev-br)
Here's a quick-and-dirty try with cython, for just a pair of 1D arrays:
(in an IPython notebook)
    %%cython
    cimport cython
    cimport numpy as np

    cdef extern from "math.h":
        double cos(double x) nogil
        double sqrt(double x) nogil

    def cos_norm(a, b):
        return cos_norm_impl(a, b)

    @cython.boundscheck(False)
    @cython.wraparound(False)
    @cython.cdivision(True)
    cdef double cos_norm_impl(double[::1] a, double[::1] b) nogil:
        cdef double res = 0., val
        cdef int m = a.shape[0]
        # XXX: shape of b not checked
        cdef int j
        for j in range(m):
            val = a[j] - b[j]
            res += 1. - cos(val)
        res /= 2.*m
        return sqrt(res)
Comparing with a straightforward numpy implementation,
    def np_cos_norm(a, b):
        val = np.add.reduce(1. - np.cos(a-b))
        return np.sqrt(val / 2. / a.shape[0])
I get
    np.random.seed(1234)
    for n in [100, 1000, 10000]:
        x = np.random.random(n)
        y = np.random.random(n)
        %timeit cos_norm(x, y)
        %timeit np_cos_norm(x, y)
        print '\n'

    100000 loops, best of 3: 3.04 µs per loop
    100000 loops, best of 3: 12.4 µs per loop

    100000 loops, best of 3: 18.8 µs per loop
    10000 loops, best of 3: 30.8 µs per loop

    1000 loops, best of 3: 196 µs per loop
    1000 loops, best of 3: 223 µs per loop
So, depending on the dimensionality of your vectors, the speedup ranges from a factor of 4 down to essentially nothing.
For computing pairwise distances, you can probably do much better, as shown in this blog post, but of course YMMV.
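As a rough illustration of the pairwise case, the same formula can be broadcast over all row pairs at once. This is a sketch, not the optimized approach from the blog post: the name `pairwise_cos_norm` is mine, and the intermediate array makes it O(m²·n) in memory, trading space for simplicity.

```python
import numpy as np

def pairwise_cos_norm(X):
    # X: (m, n) array of m vectors; returns an (m, m) matrix D with
    # D[i, j] = sqrt( sum(1 - cos(X[i] - X[j])) / (2 * n) )
    m, n = X.shape
    diff = X[:, None, :] - X[None, :, :]      # broadcast to shape (m, m, n)
    val = np.sum(1. - np.cos(diff), axis=-1)  # reduce over the vector axis
    return np.sqrt(val / (2. * n))

X = np.random.rand(50, 10)
D = pairwise_cos_norm(X)   # symmetric, zero diagonal
```

For large m, chunking the outer loop (or a compiled kernel, as in the blog post) avoids materializing the (m, m, n) intermediate.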