Pythonic and efficient way to do an elementwise "in" using numpy

python arrays numpy boolean

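To take advantage of NumPy's broadcasting rules, first make b rectangular by padding its sub-lists to a common length. This can be done with itertools.izip_longest (zip_longest in Python 3), which also transposes b; the missing entries become nan after the cast to float: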

from itertools import izip_longest
c = np.array(list(izip_longest(*b))).astype(float)

resulting in:

array([[  1.,   2.,   5.,   7.],
       [  2.,   8.,   6.,  nan],
       [ 13.,   9.,  nan,  nan]])

Then np.isclose(c, a) returns a 2D Boolean array in which entry [j, i] is True when c[j, i] is close to a[i]. According to the broadcasting rules, a is compared against every row of c, giving:

array([[ True,  True, False, False],
       [False, False, False, False],
       [False, False, False, False]], dtype=bool)

Reducing this with np.any along axis 0 gives your answer:

np.any(np.isclose(c, a), axis=0)
# array([ True,  True, False, False], dtype=bool)
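The steps above can be collected into one function. This is a minimal sketch for Python 3, where izip_longest is named zip_longest; the function name elementwise_in is only illustrative, and fillvalue=np.nan is passed explicitly here rather than relying on the cast to float to turn the padding into nan.

```python
from itertools import zip_longest  # Python 3 name for izip_longest

import numpy as np


def elementwise_in(a, b):
    """Return a boolean array whose i-th entry is True when a[i] is in b[i]."""
    # Transpose and pad: column i of c holds b[i], padded with NaN.
    c = np.array(list(zip_longest(*b, fillvalue=np.nan)), dtype=float)
    # c has shape (longest sub-list, len(a)); a broadcasts along the rows.
    return np.any(np.isclose(c, a), axis=0)


a = np.array([1, 2, 3, 4])
b = [[1, 2, 13], [2, 8, 9], [5, 6], [7]]
result = elementwise_in(a, b)
print(result)  # [ True  True False False]
```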


Is there an upper limit to the length of the small lists in b? If so, maybe you could make b a matrix of say 1000x5, and use nan to fill the gaps for the sub-arrays that are too short. You can then use numpy.any to get the answer you want, something like this:

In [42]: a = np.array([1, 2, 3, 4])
    ...: b = np.array([[1, 2, 13], [2, 8, 9], [5, 6], [7]])

In [43]: bb = np.full((len(b), max(len(i) for i in b)), np.nan)

In [44]: for irow, row in enumerate(b):
    ...:     bb[irow, :len(row)] = row

In [45]: bb
Out[45]:
array([[  1.,   2.,  13.],
       [  2.,   8.,   9.],
       [  5.,   6.,  nan],
       [  7.,  nan,  nan]])

In [46]: a[:,np.newaxis] == bb
Out[46]:
array([[ True, False, False],
       [ True, False, False],
       [False, False, False],
       [False, False, False]], dtype=bool)

In [47]: np.any(a[:,np.newaxis] == bb, axis=1)
Out[47]: array([ True,  True, False, False], dtype=bool)

No idea if this is faster for your data.
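The nan padding is safe because nan never compares equal to anything, including itself, so a padded slot cannot produce a false match. A small sketch of the same idea as a reusable function, written for Python 3; pad_rows is a hypothetical helper name, not something from the answer above.

```python
import numpy as np


def pad_rows(b):
    """Copy the ragged lists in b into a NaN-filled 2D float array."""
    width = max(len(row) for row in b)
    padded = np.full((len(b), width), np.nan)
    for i, row in enumerate(b):
        padded[i, :len(row)] = row
    return padded


a = np.array([1, 2, 3, 4])
b = [[1, 2, 13], [2, 8, 9], [5, 6], [7]]
bb = pad_rows(b)

# NaN != NaN, so padding never matches, even if a itself contained NaN.
assert not (np.nan == np.nan)

# a[:, np.newaxis] has shape (4, 1) and broadcasts against bb's (4, 3).
matches = np.any(a[:, np.newaxis] == bb, axis=1)
print(matches)  # [ True  True False False]
```

Because the padded array is float, very large integers (above 2**53) would no longer be represented exactly.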


Summary

The approach from Sauldo Castro (magic_function5) is the fastest of those posted so far, taking roughly half the time of the expression in the original post (magic_function1), which is second fastest.

Code to generate test data:

import numpy
import random

alength = 100
a = numpy.array([random.randint(1, 6) for i in range(alength)])
b = []
for i in range(alength):
    length = random.randint(1, 5)
    element = []
    for i in range(length):
        element.append(random.randint(1, 6))
    b.append(element)
b = numpy.array(b)
print a, b

The options:

from itertools import izip_longest

def magic_function1(a, b): # From OP Martin Fixman
    return [x in y for x, y in zip(a, b)]

def magic_function2(a, b): # What I thought might be better.
    bools = []
    for x, y in zip(a, b):
        found = False
        for j in y:
            if x == j:
                found = True
                break
        bools.append(found)
    return bools

def magic_function3(a, b): # What I tried first
    bools = []
    for i in range(len(a)):
        found = False
        for j in range(len(b[i])):
            if a[i] == b[i][j]:
                found = True
                break
        bools.append(found)
    return bools

def magic_function4(a, b): # From Bas Swinkels
    bb = numpy.full((len(b), max(len(i) for i in b)), numpy.nan)
    for irow, row in enumerate(b):
        bb[irow, :len(row)] = row
    a[:,numpy.newaxis] == bb  # redundant: this result is discarded
    return numpy.any(a[:,numpy.newaxis] == bb, axis=1)

def magic_function5(a, b): # From Sauldo Castro, revised version
    c = numpy.array(list(izip_longest(*b))).astype(float)
    return numpy.any(numpy.isclose(c, a), axis=0)

Timing, using n_executions runs of each function:

import timeit

n_executions = 100
clock = timeit.Timer(stmt="magic_function1(a, b)", setup="from __main__ import magic_function1, a, b")
print clock.timeit(n_executions), "seconds"
# Repeat with each candidate function
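The "repeat with each candidate function" step can be automated with a loop. The following is only a sketch in Python 3, not the harness used for the numbers reported here: it includes just two of the candidates under illustrative names (list_comprehension, padded_rows), uses a fixed random seed as an assumption for repeatability, and passes a callable to timeit.timeit instead of a setup string.

```python
import random
import timeit

import numpy as np


def list_comprehension(a, b):
    # Same idea as magic_function1.
    return [x in y for x, y in zip(a, b)]


def padded_rows(a, b):
    # Same idea as magic_function4, without the redundant comparison.
    bb = np.full((len(b), max(len(row) for row in b)), np.nan)
    for i, row in enumerate(b):
        bb[i, :len(row)] = row
    return np.any(a[:, np.newaxis] == bb, axis=1)


random.seed(0)  # fixed seed (an assumption) so repeated runs use the same data
n = 100
a = np.array([random.randint(1, 6) for _ in range(n)])
b = [[random.randint(1, 6) for _ in range(random.randint(1, 5))] for _ in range(n)]

candidates = {"list_comprehension": list_comprehension, "padded_rows": padded_rows}
n_executions = 100
timings = {}
for name, func in candidates.items():
    # A callable avoids the "from __main__ import ..." setup string.
    timings[name] = timeit.timeit(lambda: func(a, b), number=n_executions)
    print(name, timings[name], "seconds")
```

Absolute timings depend on the machine, the NumPy version, and the size of the data, so only the relative order within one run is meaningful.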

The results:

0.158078225475 seconds for magic_function1
0.181080926835 seconds for magic_function2
0.259621047822 seconds for magic_function3
0.287054750224 seconds for magic_function4
0.0839162196207 seconds for magic_function5
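To compare the candidates at a glance, the timings can be expressed relative to magic_function1. This short Python 3 sketch only does arithmetic on the numbers listed above; the choice of magic_function1 as the baseline is an assumption made for illustration.

```python
# Timings copied from the results above (seconds for 100 executions).
timings = {
    "magic_function1": 0.158078225475,
    "magic_function2": 0.181080926835,
    "magic_function3": 0.259621047822,
    "magic_function4": 0.287054750224,
    "magic_function5": 0.0839162196207,
}

baseline = timings["magic_function1"]
# Sort from fastest to slowest and report speed relative to the baseline.
ranked = sorted(timings.items(), key=lambda item: item[1])
for name, seconds in ranked:
    print(f"{name}: {baseline / seconds:.2f}x the speed of magic_function1")

fastest = ranked[0][0]
```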
