Numba and Cython aren't improving performance significantly compared to CPython. Maybe I am using them incorrectly?
Here's what I think is happening with Numba:
Numba is built to speed up code that works on NumPy arrays and simple numeric scalars. Generic Python objects get little or no benefit from it.
zip returns an iterator of arbitrary items. If Numba cannot infer their types, it cannot compile the loop to fast native code.
Looping over the indexes with for i in range(...) is likely to produce a much better result and allow much stronger type inference.
Using the builtin sum() could be causing problems.
Here's linear regression code that will run faster in Numba:

    import numba

    @numba.jit
    def ols(x, y):
        """Simple OLS for two data sets."""
        M = x.size

        # Running sums, initialised as floats so the types stay stable.
        x_sum = 0.
        y_sum = 0.
        x_sq_sum = 0.
        x_y_sum = 0.

        # Explicit index loop over the arrays.
        for i in range(M):
            x_sum += x[i]
            y_sum += y[i]
            x_sq_sum += x[i] ** 2
            x_y_sum += x[i] * y[i]

        slope = (M * x_y_sum - x_sum * y_sum) / (M * x_sq_sum - x_sum**2)
        intercept = (y_sum - slope * x_sum) / M

        return slope, intercept
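To convince yourself the loop version gives the right answer, you can compare it against NumPy's own degree-1 fit. This is only a minimal sketch: the synthetic data, the seed, and the fallback decorator used when Numba isn't installed are my assumptions, and I've swapped in numba.njit (strict nopython mode) for numba.jit so that a failed compilation raises an error instead of going unnoticed. The ols body is repeated here only so the snippet runs on its own.

```python
import numpy as np

try:
    import numba
    jit = numba.njit  # nopython mode: errors out if it cannot compile
except ImportError:
    # Assumption: without Numba, run the plain Python function instead.
    def jit(func):
        return func


@jit
def ols(x, y):
    """Simple OLS for two data sets (same algorithm as above)."""
    M = x.size
    x_sum = 0.
    y_sum = 0.
    x_sq_sum = 0.
    x_y_sum = 0.
    for i in range(M):
        x_sum += x[i]
        y_sum += y[i]
        x_sq_sum += x[i] ** 2
        x_y_sum += x[i] * y[i]
    slope = (M * x_y_sum - x_sum * y_sum) / (M * x_sq_sum - x_sum**2)
    intercept = (y_sum - slope * x_sum) / M
    return slope, intercept


# Hypothetical test data: y = 2x + 1 plus a little Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 1000)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

slope, intercept = ols(x, y)

# np.polyfit returns coefficients highest power first: [slope, intercept].
ref_slope, ref_intercept = np.polyfit(x, y, 1)

assert np.isclose(slope, ref_slope)
assert np.isclose(intercept, ref_intercept)
```

If you benchmark this, keep in mind that the first call to a jitted function includes compilation time, so time a second call.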