Efficient elementwise argmin of matrix-vector difference
Inspired by this post
, we can leverage np.searchsorted
-
def find_closest(a, v): sidx = v.argsort() v_s = v[sidx] idx = np.searchsorted(v_s, a) idx[idx==len(v)] = len(v)-1 idx0 = (idx-1).clip(min=0) m = np.abs(a-v_s[idx]) >= np.abs(v_s[idx0]-a) m[idx==0] = 0 idx[m] -= 1 out = sidx[idx] return out
Some more perf. boost with numexpr
on large datasets :
import numexpr as nedef find_closest_v2(a, v): sidx = v.argsort() v_s = v[sidx] idx = np.searchsorted(v_s, a) idx[idx==len(v)] = len(v)-1 idx0 = (idx-1).clip(min=0) p1 = v_s[idx] p2 = v_s[idx0] m = ne.evaluate('(idx!=0) & (abs(a-p1) >= abs(p2-a))', {'p1':p1, 'p2':p2, 'idx':idx}) idx[m] -= 1 out = sidx[idx] return out
Timings
Setup :
N,M = 500,100000a = np.random.rand(N,M)v = np.random.rand(N)In [22]: %timeit find_closest_v2(a, v)4.35 s ± 21.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)In [23]: %timeit find_closest(a, v)4.69 s ± 173 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)