Cython's prange not improving performance


I think the parallelization is working, but its extra overhead is eating up the time it would have saved. If I try differently sized arrays, I do begin to see a speedup in the parallel version:

XA = np.random.random((900, 2100))
XB = np.random.random((100, 2100, 90))

Here the parallel version takes ~2/3 of the time of the serial version for me, which certainly isn't the 1/4 you'd expect, but does at least show some benefit.


One improvement I can offer is to replace the code that fixes contiguity:

XB = np.asanyarray([np.ascontiguousarray(XB[:,:,i]) for i in range(n)]) 

with

XB = np.ascontiguousarray(np.transpose(XB,[2,0,1]))

This speeds up both the parallel and non-parallel functions fairly significantly (a factor of 2 with the arrays you originally gave). It also makes it slightly more obvious that you're being slowed down by overhead in the prange: the serial version is actually faster for the arrays in your example.
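
If you want to convince yourself that the two contiguity fixes are interchangeable, here is a minimal sketch. The small array shape is just a stand-in, and I'm assuming `n` in your code is `XB.shape[2]`; neither comes from your original function.

```python
import numpy as np

# Small stand-in array; the shape is an assumption chosen to keep the check fast.
rng = np.random.default_rng(0)
XB = rng.random((10, 21, 9))
n = XB.shape[2]  # assumption: n is the size of the last axis

# Original approach: copy each slice along the last axis, then stack the copies.
stacked = np.asanyarray([np.ascontiguousarray(XB[:, :, i]) for i in range(n)])

# Suggested approach: move the last axis to the front, then make one contiguous copy.
transposed = np.ascontiguousarray(np.transpose(XB, [2, 0, 1]))

# Both give the same values in the same (n, rows, cols) layout.
same_shape = stacked.shape == transposed.shape
same_values = np.array_equal(stacked, transposed)
is_contiguous = transposed.flags["C_CONTIGUOUS"]
print(same_shape, same_values, is_contiguous)
```

The transpose version does a single copy instead of building a Python list of `n` separate arrays, which is likely where the saving comes from.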