Numpy optimization with Numba
Unlike list.append, you should never call numpy.append in a loop! This is because even for appending a single element the whole array needs to be copied. Because you're only interested in the unique obj, you could use a Boolean array to flag the matches found so far.
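To make the difference concrete, here is a minimal NumPy-only sketch of both approaches. The function names and the `hits` list are hypothetical and only stand in for "indices of obj that matched something"; they are not from the original code.

```python
import numpy as np

def matches_with_append(hits):
    """Anti-pattern: np.append returns a new, fully copied array on every call."""
    out = np.empty(0, dtype=np.int64)
    for i in hits:
        out = np.append(out, i)  # copies all of `out` each iteration
    return np.unique(out)        # duplicates have to be removed afterwards

def matches_with_flags(hits, n):
    """Preallocate one Boolean flag per obj and set it when a match is found."""
    found = np.zeros(n, dtype=np.bool_)
    for i in hits:
        found[i] = True          # setting a flag twice is harmless
    return np.flatnonzero(found) # sorted indices of the flagged entries

# np.append never modifies its input; it builds a new array
base = np.arange(3)
grown = np.append(base, 99)

hits = [3, 1, 3, 7, 1]           # hypothetical matched indices, with repeats
a = matches_with_append(hits)
b = matches_with_flags(hits, n=10)
```

Both functions return the same unique indices, but the flag version allocates once and never needs a separate deduplication step.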
As for Numba, it works best if you write out all the loops. So for example:
    import numpy as np
    from numba import jit

    @jit(nopython=True)
    def numba2(vec_obj, vec_ps, cos_maxsep):
        nps = vec_ps.shape[0]
        nobj = vec_obj.shape[0]
        dim = vec_obj.shape[1]
        found = np.zeros(nobj, np.bool_)  # found[i] becomes True once obj i has a match
        for i in range(nobj):
            for j in range(nps):
                cos = 0.0
                for k in range(dim):  # explicit dot product
                    cos += vec_obj[i,k] * vec_ps[j,k]
                if cos > cos_maxsep:
                    found[i] = True
                    break  # one match is enough for this obj
        return found.nonzero()
The added benefit is that we can break out of the loop over the ps array as soon as we find a match to the current obj.
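If you want to check a loop-based version like this on small inputs, a brute-force NumPy reference can help. The following is only a sketch: `reference_matches` and `random_unit_vectors` are hypothetical helper names, and the test data assumes unit vectors in three dimensions with a 10 degree separation, none of which comes from the original post.

```python
import numpy as np

def reference_matches(vec_obj, vec_ps, cos_maxsep):
    """Indices of rows in vec_obj whose dot product with at least one row
    of vec_ps exceeds cos_maxsep (same criterion as the explicit loops)."""
    cos = vec_obj @ vec_ps.T  # shape (nobj, nps); memory-heavy for large inputs
    return np.flatnonzero((cos > cos_maxsep).any(axis=1))

def random_unit_vectors(n, rng):
    """Hypothetical test data: n random directions on the unit sphere."""
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

rng = np.random.default_rng(0)
vec_obj = random_unit_vectors(200, rng)
vec_ps = random_unit_vectors(20, rng)
cos_maxsep = np.cos(np.radians(10.0))  # assumed maximum separation of 10 degrees

expected = reference_matches(vec_obj, vec_ps, cos_maxsep)
```

Note that found.nonzero() returns a tuple of arrays, so a comparison would look like np.array_equal(numba2(vec_obj, vec_ps, cos_maxsep)[0], expected). The reference builds the full nobj by nps matrix, so it is meant for verification on small inputs rather than as a faster alternative.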
You can gain some more speed by specializing the function for three-dimensional space. Also, for some reason, passing all arrays and relevant dimensions into a helper function results in another speedup:
    def numba3(vec_obj, vec_ps, cos_maxsep):
        nps = len(vec_ps)
        nobj = len(vec_obj)
        out = np.zeros(nobj, bool)
        numba3_helper(vec_obj, vec_ps, cos_maxsep, out, nps, nobj)
        return np.flatnonzero(out)

    @jit(nopython=True)
    def numba3_helper(vec_obj, vec_ps, cos_maxsep, out, nps, nobj):
        for i in range(nobj):
            for j in range(nps):
                cos = (vec_obj[i,0]*vec_ps[j,0] +
                       vec_obj[i,1]*vec_ps[j,1] +
                       vec_obj[i,2]*vec_ps[j,2])
                if cos > cos_maxsep:
                    out[i] = True
                    break
        return out
Timings I get for 20,000 obj and 2,000 ps:
    %timeit angdist_threshold_numba(vec_obj,vec_ps,cos_maxsep)
    1 loop, best of 3: 2.99 s per loop
    %timeit numba2(vec_obj, vec_ps, cos_maxsep)
    1 loop, best of 3: 444 ms per loop
    %timeit numba3(vec_obj, vec_ps, cos_maxsep)
    10 loops, best of 3: 134 ms per loop