
Removing elements from an array that are in another array


There is an easy solution with a list comprehension:

A = [i for i in A if i not in B]

Result

[[1, 1, 2], [1, 1, 3]]

Note that the list comprehension does not remove elements from the original list; it builds a new list and rebinds the name A to it.

If you want to remove the elements from the original list in place, use this instead:

for i in B:
    if i in A:
        A.remove(i)
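
For reference, here is a minimal runnable sketch (with sample lists assumed from the result shown above, and a shortened B for brevity) contrasting the two: the comprehension rebinds the name, while the loop mutates the original object, which matters if other references to it exist:

A = [[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 1, 4]]
B = [[0, 0, 0], [1, 1, 1], [1, 1, 4]]

alias = A                         # second reference to the same list object
A = [i for i in A if i not in B]
print(A)                          # [[1, 1, 2], [1, 1, 3]]
print(alias)                      # unchanged: still has all four rows

A = alias                         # restore, then remove in place
for i in B:
    if i in A:
        A.remove(i)
print(alias)                      # [[1, 1, 2], [1, 1, 3]] -- same object, mutated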


Based on this solution to Find the row indexes of several values in a numpy array, here's a NumPy-based solution with a smaller memory footprint, which can be beneficial when working with large arrays -

dims = np.maximum(B.max(0), A.max(0)) + 1
out = A[~np.in1d(np.ravel_multi_index(A.T, dims), np.ravel_multi_index(B.T, dims))]

Sample run -

In [38]: A
Out[38]: 
array([[1, 1, 1],
       [1, 1, 2],
       [1, 1, 3],
       [1, 1, 4]])

In [39]: B
Out[39]: 
array([[0, 0, 0],
       [1, 0, 2],
       [1, 0, 3],
       [1, 0, 4],
       [1, 1, 0],
       [1, 1, 1],
       [1, 1, 4]])

In [40]: out
Out[40]: 
array([[1, 1, 2],
       [1, 1, 3]])
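
The idea is that np.ravel_multi_index treats each row as a multi-dimensional index into a virtual grid of shape dims and maps it to a unique scalar, so row membership reduces to 1D membership via np.in1d. A small sketch of the encoding, using the sample arrays above:

import numpy as np

A = np.array([[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 1, 4]])
B = np.array([[0, 0, 0], [1, 0, 2], [1, 0, 3], [1, 0, 4],
              [1, 1, 0], [1, 1, 1], [1, 1, 4]])

dims = np.maximum(B.max(0), A.max(0)) + 1   # [2, 2, 5]: per-column value ranges
keys_A = np.ravel_multi_index(A.T, dims)    # row (i, j, k) -> (i*2 + j)*5 + k
keys_B = np.ravel_multi_index(B.T, dims)
print(keys_A)                               # [16 17 18 19]
print(A[~np.in1d(keys_A, keys_B)])          # [[1 1 2]
                                            #  [1 1 3]]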

Runtime test on large arrays -

In [107]: def in1d_approach(A,B):
     ...:     dims = np.maximum(B.max(0),A.max(0))+1
     ...:     return A[~np.in1d(np.ravel_multi_index(A.T,dims),\
     ...:                     np.ravel_multi_index(B.T,dims))]
     ...: 

In [108]: # Setup arrays with B as large array and A contains some of B's rows
     ...: B = np.random.randint(0,9,(1000,3))
     ...: A = np.random.randint(0,9,(100,3))
     ...: A_idx = np.random.choice(np.arange(A.shape[0]),size=10,replace=0)
     ...: B_idx = np.random.choice(np.arange(B.shape[0]),size=10,replace=0)
     ...: A[A_idx] = B[B_idx]
     ...: 

Timings with the broadcasting-based solutions -

In [109]: %timeit A[np.all(np.any((A-B[:, None]), axis=2), axis=0)] # @Kasramvd's soln
100 loops, best of 3: 4.64 ms per loop

In [110]: %timeit A[~((A[:,None,:] == B).all(-1)).any(1)]
100 loops, best of 3: 3.66 ms per loop

Timing with the lower-memory-footprint solution -

In [111]: %timeit in1d_approach(A,B)
1000 loops, best of 3: 231 µs per loop

Further performance boost

in1d_approach reduces each row to a scalar by treating it as an indexing tuple. We can do the same a bit more efficiently with matrix multiplication via np.dot, like so -

def in1d_dot_approach(A,B):
    cumdims = (np.maximum(A.max(),B.max())+1)**np.arange(B.shape[1])
    return A[~np.in1d(A.dot(cumdims),B.dot(cumdims))]
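
Here cumdims holds the place values of a positional number system in base max+1 (with column 0 least significant, the reverse of ravel_multi_index's convention), so A.dot(cumdims) again encodes each row as a unique integer. A quick sanity check, assuming in1d_approach from above is in scope, that both encodings select the same rows:

import numpy as np

A = np.random.randint(0, 9, (100, 3))
B = np.random.randint(0, 9, (1000, 3))

# The scalar keys differ between the two encodings, but both are injective
# on rows, so the filtered outputs must be identical.
assert np.array_equal(in1d_approach(A, B), in1d_dot_approach(A, B))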

Let's test it against the previous one on much larger arrays -

In [251]: # Setup arrays with B as large array and A contains some of B's rows
     ...: B = np.random.randint(0,9,(10000,3))
     ...: A = np.random.randint(0,9,(1000,3))
     ...: A_idx = np.random.choice(np.arange(A.shape[0]),size=10,replace=0)
     ...: B_idx = np.random.choice(np.arange(B.shape[0]),size=10,replace=0)
     ...: A[A_idx] = B[B_idx]
     ...: 

In [252]: %timeit in1d_approach(A,B)
1000 loops, best of 3: 1.28 ms per loop

In [253]: %timeit in1d_dot_approach(A, B)
1000 loops, best of 3: 1.2 ms per loop


Here is a Numpythonic approach with broadcasting:

In [83]: A[np.all(np.any((A-B[:, None]), axis=2), axis=0)]
Out[83]: 
array([[1, 1, 2],
       [1, 1, 3]])
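
To unpack the broadcasting: B[:, None] inserts an axis so the subtraction compares every row of B against every row of A at once. A commented sketch of the intermediates, using the sample arrays from above:

import numpy as np

A = np.array([[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 1, 4]])
B = np.array([[0, 0, 0], [1, 0, 2], [1, 0, 3], [1, 0, 4],
              [1, 1, 0], [1, 1, 1], [1, 1, 4]])

diff = A - B[:, None]               # (7, 4, 3): every B row minus every A row
row_differs = np.any(diff, axis=2)  # (7, 4): True where an A row != a B row
keep = np.all(row_differs, axis=0)  # (4,): A rows that differ from every B row
print(A[keep])                      # [[1 1 2]
                                    #  [1 1 3]]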

Here is a timeit comparison with another answer:

In [90]: def cal_diff(A, B):
   ....:     A_rows = A.view([('', A.dtype)] * A.shape[1])
   ....:     B_rows = B.view([('', B.dtype)] * B.shape[1])
   ....:     return np.setdiff1d(A_rows, B_rows).view(A.dtype).reshape(-1, A.shape[1])
   ....: 

In [93]: %timeit cal_diff(A, B)
10000 loops, best of 3: 54.1 µs per loop

In [94]: %timeit A[np.all(np.any((A-B[:, None]), axis=2), axis=0)]
100000 loops, best of 3: 9.41 µs per loop

# Even better with Divakar's suggestion
In [97]: %timeit A[~((A[:,None,:] == B).all(-1)).any(1)]
100000 loops, best of 3: 7.41 µs per loop

Well, if you are looking for a faster way, you should look for ways that reduce the number of comparisons. In this case (ignoring the order of elements within a row) you can generate a single number from each row and compare those numbers instead, which can be done by summing the squares of the items.
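
A sketch of the idea as a function (note that this hashes each row order-insensitively by design, and unrelated rows can also collide on the same sum of squares, e.g. (0, 0, 3) and (1, 2, 2), so it is only safe when the data rules such collisions out):

import numpy as np

def sum_of_squares_approach(A, B):
    # One number per row: the sum of its squared entries.
    # Permutations of a row hash identically; beware other collisions too.
    return A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))]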

Here is the benchmark with Divakar's in1d approach:

In [144]: def in1d_approach(A,B):
   .....:         dims = np.maximum(B.max(0),A.max(0))+1
   .....:         return A[~np.in1d(np.ravel_multi_index(A.T,dims),\
   .....:                          np.ravel_multi_index(B.T,dims))]
   .....: 

In [146]: %timeit in1d_approach(A, B)
10000 loops, best of 3: 23.8 µs per loop

In [145]: %timeit A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))]
10000 loops, best of 3: 20.2 µs per loop

You can use np.diff to get an order-independent result:

In [194]: B = np.array([[0, 0, 0], [1, 0, 2], [1, 0, 3], [1, 0, 4],
                        [1, 1, 0], [1, 1, 1], [1, 1, 4], [4, 1, 1]])

In [195]: A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
Out[195]: 
array([[1, 1, 2],
       [1, 1, 3]])

In [196]: %timeit A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
10000 loops, best of 3: 30.7 µs per loop

Benchmark with Divakar's setup:

In [198]: B = np.random.randint(0,9,(1000,3))

In [199]: A = np.random.randint(0,9,(100,3))

In [200]: A_idx = np.random.choice(np.arange(A.shape[0]),size=10,replace=0)

In [201]: B_idx = np.random.choice(np.arange(B.shape[0]),size=10,replace=0)

In [202]: A[A_idx] = B[B_idx]

In [203]: %timeit A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
10000 loops, best of 3: 137 µs per loop

In [204]: %timeit A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))]
10000 loops, best of 3: 112 µs per loop

In [205]: %timeit in1d_approach(A, B)
10000 loops, best of 3: 115 µs per loop

Timing with larger arrays (Divakar's solution is slightly faster):

In [231]: %timeit A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
1000 loops, best of 3: 1.01 ms per loop

In [232]: %timeit A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))]
1000 loops, best of 3: 880 µs per loop

In [233]: %timeit in1d_approach(A, B)
1000 loops, best of 3: 807 µs per loop