Optimizing numpy.dot with Cython

As a general note, if you are calling NumPy functions from within Cython and doing little else, you will usually see only marginal gains, if any at all. Massive speed-ups typically come only from statically typing code that runs an explicit for loop at the Python level, not from code that is already calling the NumPy C-API.

You could try writing out the dot product yourself: statically type the loop counter, the input NumPy arrays, and so on; set wraparound and boundscheck to False; import the C library version of the sqrt function (from libc.math); and then try the parallel for loop (prange) to make use of OpenMP.
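As a starting point, here is a rough plain-Python sketch of the loop I have in mind. The function name cosine_similarity_loop and the sample inputs are mine, not from your code, and this is only the untyped reference version. It makes a single pass over both vectors and accumulates all three dot products at once. In Cython you would then declare i as a C integer, the accumulators as double, type the input arrays, and swap math.sqrt for the libc one.

```python
import math


def cosine_similarity_loop(v1, v2):
    """Cosine similarity using one explicit pass over both vectors.

    Hypothetical helper for illustration; raises ZeroDivisionError
    if either vector is all zeros.
    """
    if len(v1) != len(v2):
        raise ValueError("v1 and v2 must have the same length")

    dot12 = 0.0  # running sum for v1 . v2
    dot11 = 0.0  # running sum for v1 . v1
    dot22 = 0.0  # running sum for v2 . v2

    # This is the loop that static typing would turn into a plain C loop.
    for i in range(len(v1)):
        a = v1[i]
        b = v2[i]
        dot12 += a * b
        dot11 += a * a
        dot22 += b * b

    return dot12 / math.sqrt(dot11 * dot22)


# Made-up sample vectors.
sim = cosine_similarity_loop([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

Whether the typed version actually beats numpy.dot depends on the vector length and on how numpy was built, so time both before committing to it.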

You can change the expression

sim = numpy.dot(v1, v2) / (sqrt(numpy.dot(v1, v1)) * sqrt(numpy.dot(v2, v2)))

to

sim = numpy.dot(v1, v2) / sqrt(numpy.dot(v1, v1) * numpy.dot(v2, v2))

which takes one square root instead of two.
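If you want to convince yourself that the two forms agree, a quick check along these lines should do. The vectors are made-up sample data, and I am assuming sqrt comes from the math module here; the results match up to floating-point rounding, not necessarily bit for bit.

```python
import math
from math import sqrt

import numpy

# Made-up sample vectors, only for the comparison.
v1 = numpy.array([1.0, 2.0, 3.0])
v2 = numpy.array([4.0, 5.0, 6.0])

# Original form: two square roots.
sim_two_sqrt = numpy.dot(v1, v2) / (sqrt(numpy.dot(v1, v1)) * sqrt(numpy.dot(v2, v2)))

# Rewritten form: one square root of the product.
sim_one_sqrt = numpy.dot(v1, v2) / sqrt(numpy.dot(v1, v1) * numpy.dot(v2, v2))

# Equal up to rounding error.
assert math.isclose(sim_two_sqrt, sim_one_sqrt, rel_tol=1e-12)
```

One thing to keep in mind: for vectors with very large entries the product of the two squared norms can overflow before the square root is taken, so the single-sqrt form is slightly less robust in extreme cases.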