numpy search array for multiple values, and returns their indices


A classic way of checking one array against another is to adjust the shape and use '==':

In [250]: arr==query[:,None]
Out[250]: 
array([[False, False, False, False, False,  True],
       [False,  True, False, False, False, False],
       [ True, False, False, False, False, False]], dtype=bool)

In [251]: np.where(arr==query[:,None])
Out[251]: (array([0, 1, 2]), array([5, 1, 0]))

If an element of query isn't found in arr, its 'row' will be missing from the first returned array, e.g. [0,2] instead of [0,1,2]:

In [261]: np.where(arr==np.array(['a','x','v'],dtype='S')[:,None])
Out[261]: (array([0, 2]), array([5, 1]))
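A minimal sketch of the broadcasting comparison, with an explicit check for query elements that were not found. The array contents and the names `mask`, `rows`, `cols` and `found` are assumptions chosen for illustration, not the data from the original question.

```python
import numpy as np

# Hypothetical data, chosen so that every value in arr is unique.
arr = np.array(['c', 'v', 'd', 's', 'f', 'a'])
query = np.array(['a', 'x', 'v'])   # 'x' does not occur in arr

# Shape (len(query), len(arr)): row r is True where arr == query[r].
mask = arr == query[:, None]

# rows: which query element matched; cols: where that match sits in arr.
rows, cols = np.where(mask)

# A query element is missing exactly when its row has no True entry.
found = mask.any(axis=1)

print(rows, cols)   # rows skips 1 because 'x' was not found
print(found)
```

Checking `found` before using `cols` avoids silently misaligning the results with `query` when something is missing.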

For this small example, it is considerably faster than a list comprehension equivalent:

np.hstack([(arr==i).nonzero()[0] for i in query])

It's a little slower than the searchsorted solution. (In that solution, i can be out of bounds, or point at the wrong element, if a query element is not found.)


Stefano suggested fromiter. It saves some time compared to hstack of a list:

In [313]: timeit np.hstack([(arr==i).nonzero()[0] for i in query])
10000 loops, best of 3: 49.5 us per loop

In [314]: timeit np.fromiter(((arr==i).nonzero()[0] for i in query), dtype=int, count=len(query))
10000 loops, best of 3: 35.3 us per loop

But it raises an error if an element is missing, or if there are multiple occurrences. hstack can handle variable-length entries; fromiter cannot.
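To illustrate the variable-length point, here is a small sketch with made-up data in which one query element occurs twice and another not at all. The names `per_query`, `flat` and `all_single` are hypothetical helpers, not part of either answer.

```python
import numpy as np

# Hypothetical data: 'a' occurs twice and 'x' does not occur at all.
arr = np.array(['a', 'b', 'a', 'c'])
query = ['a', 'x', 'c']

# One index array per query element; the lengths here are 2, 0 and 1.
per_query = [(arr == q).nonzero()[0] for q in query]

# hstack simply concatenates them, whatever their lengths.
flat = np.hstack(per_query)

# fromiter with count=len(query) expects exactly one value per query
# element, so it is only an option when every length is 1.
all_single = all(len(idx) == 1 for idx in per_query)

print(flat)
print(all_single)
```

Keeping `per_query` around also preserves which indices belong to which query element, which the flattened result loses.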

np.flatnonzero(arr==i) is slower than (arr==i).nonzero()[0], but I haven't looked into why.


You can use np.searchsorted on the sorted array, then revert the returned indices to the original array. For that you may use np.argsort; as in:

>>> indx = a.argsort()  # indices that would sort the array
>>> i = np.searchsorted(a[indx], query)  # indices in the sorted array
>>> indx[i]  # indices with respect to the original array
array([5, 1, 0])

If a is of size n and query is of size k, this will be O(n log n + k log n), which would be faster than the O(n k) of a linear search if log n < k.
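The searchsorted approach can be wrapped so that missing query elements are reported instead of producing an out-of-bounds or wrong index. This is only a sketch under stated assumptions: `find_indices` and its `missing` sentinel are hypothetical names, the data is made up, and `a` is assumed to be non-empty with unique values. It uses the `sorter` argument of np.searchsorted rather than building `a[indx]` explicitly.

```python
import numpy as np

def find_indices(a, query, missing=-1):
    """Return the index in `a` of each element of `query`, or `missing`."""
    a = np.asarray(a)
    query = np.asarray(query)

    order = a.argsort()                        # indices that would sort a
    pos = np.searchsorted(a, query, sorter=order)

    # searchsorted returns len(a) for values larger than everything in a,
    # so clip before using the positions as indices.
    pos = np.clip(pos, 0, len(a) - 1)
    candidates = order[pos]                    # indices into the original a

    # An insertion point is only a real match if the value there is equal.
    hit = a[candidates] == query
    return np.where(hit, candidates, missing)

a = np.array(['c', 'v', 'd', 's', 'f', 'a'])   # hypothetical data
print(find_indices(a, ['a', 'v', 'c']))
print(find_indices(a, ['a', 'x', 'v']))        # 'x' is reported as -1
```

The equality check costs one extra pass over the k query elements, so the overall complexity stays O(n log n + k log n).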