
Find indices of common values in two arrays


np.unique and np.searchsorted could be used together to solve it -

import numpy as np

def unq_searchsorted(A,B):

    # Get unique elements of A and B and the indices based on the uniqueness
    unqA,idx1 = np.unique(A,return_inverse=True)
    unqB,idx2 = np.unique(B,return_inverse=True)

    # Create mask equivalent to np.in1d(A,B) and np.in1d(B,A) for unique elements
    mask1 = (np.searchsorted(unqB,unqA,'right') - np.searchsorted(unqB,unqA,'left'))==1
    mask2 = (np.searchsorted(unqA,unqB,'right') - np.searchsorted(unqA,unqB,'left'))==1

    # Map back to all non-unique indices to get equivalent of np.in1d(A,B),
    # np.in1d(B,A) results for non-unique elements
    return mask1[idx1],mask2[idx2]
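
Since the question asks for indices rather than boolean masks, the two masks returned above can be passed through np.flatnonzero. A minimal usage sketch, with sample arrays made up purely for illustration -

A = np.array([1, 4, 4, 7, 9])
B = np.array([4, 9, 11])

maskA, maskB = unq_searchsorted(A, B)
idxA = np.flatnonzero(maskA)   # indices into A whose values also appear in B -> [1 2 4]
idxB = np.flatnonzero(maskB)   # indices into B whose values also appear in A -> [0 1]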

Runtime tests and verify results -

In [233]: def org_app(A,B):
     ...:     return np.in1d(A,B), np.in1d(B,A)
     ...:

In [234]: A = np.random.randint(0,10000,(10000))
     ...: B = np.random.randint(0,10000,(10000))
     ...:

In [235]: np.allclose(org_app(A,B)[0],unq_searchsorted(A,B)[0])
Out[235]: True

In [236]: np.allclose(org_app(A,B)[1],unq_searchsorted(A,B)[1])
Out[236]: True

In [237]: %timeit org_app(A,B)
100 loops, best of 3: 7.69 ms per loop

In [238]: %timeit unq_searchsorted(A,B)
100 loops, best of 3: 5.56 ms per loop

If the two input arrays are already sorted and unique, the performance boost would be substantial. Thus, the solution function would simplify to -

def unq_searchsorted_v1(A,B):
    out1 = (np.searchsorted(B,A,'right') - np.searchsorted(B,A,'left'))==1
    out2 = (np.searchsorted(A,B,'right') - np.searchsorted(A,B,'left'))==1
    return out1,out2
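
The 'right' minus 'left' searchsorted difference counts how many times each query value occurs in the sorted array, so for a sorted array of unique values a count of 1 is exactly a membership test. A small illustration with made-up values -

B = np.array([2, 5, 8])      # sorted, unique
A = np.array([1, 2, 5, 9])   # query values
counts = np.searchsorted(B, A, 'right') - np.searchsorted(B, A, 'left')
# counts -> array([0, 1, 1, 0]); counts == 1 matches np.in1d(A, B)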

Subsequent runtime tests -

In [275]: A = np.random.randint(0,100000,(20000))
     ...: B = np.random.randint(0,100000,(20000))
     ...: A = np.unique(A)
     ...: B = np.unique(B)
     ...:

In [276]: np.allclose(org_app(A,B)[0],unq_searchsorted_v1(A,B)[0])
Out[276]: True

In [277]: np.allclose(org_app(A,B)[1],unq_searchsorted_v1(A,B)[1])
Out[277]: True

In [278]: %timeit org_app(A,B)
100 loops, best of 3: 8.83 ms per loop

In [279]: %timeit unq_searchsorted_v1(A,B)
100 loops, best of 3: 4.94 ms per loop


A simple multiprocessing implementation will get you a little more speed:

import time
import numpy as np
from multiprocessing import Process, Queue

a = np.random.randint(0, 20, 1000000)
b = np.random.randint(0, 20, 1000000)

def original(a, b, q):
    q.put( np.in1d(a, b) )

if __name__ == '__main__':
    t0 = time.time()
    q = Queue()
    q2 = Queue()
    p = Process(target=original, args=(a, b, q,))
    p2 = Process(target=original, args=(b, a, q2))
    p.start()
    p2.start()
    res = q.get()
    res2 = q2.get()
    print(time.time() - t0)

>>> 0.21398806572

Divakar's unq_searchsorted(A,B) method took 0.271834135056 seconds on my machine.
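
If you prefer not to manage the Process and Queue objects by hand, the same two np.in1d calls could be dispatched through multiprocessing.Pool. The following is only a sketch of that variant (not benchmarked here); the worker function in1d_pair is my own naming -

import numpy as np
from multiprocessing import Pool

a = np.random.randint(0, 20, 1000000)
b = np.random.randint(0, 20, 1000000)

def in1d_pair(pair):
    x, y = pair
    return np.in1d(x, y)

if __name__ == '__main__':
    with Pool(processes=2) as pool:
        # run np.in1d(a, b) and np.in1d(b, a) in two worker processes
        res, res2 = pool.map(in1d_pair, [(a, b), (b, a)])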