Filtering multiple NumPy arrays based on the intersection of one column

python arrays numpy rows intersection

Your indices in your example are sorted and unique. Assuming this is no coincidence (and this situation often arises, or can easily be enforced), the following works:

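    from functools import reduce

    import numpy as np

    A = np.array([[1, 1], [2, 2], [3, 3]])
    B = np.array([[2, 1], [3, 2], [4, 3], [5, 4]])
    C = np.array([[2, 2], [3, 1], [5, 2]])

    # Intersect the first columns; assume_unique=True skips the deduplication step
    I = reduce(
        lambda l, r: np.intersect1d(l, r, assume_unique=True),
        (i[:, 0] for i in (A, B, C)))

    # The first columns are sorted, so searchsorted returns the matching row positions
    print(A[np.searchsorted(A[:, 0], I)])
    print(B[np.searchsorted(B[:, 0], I)])
    print(C[np.searchsorted(C[:, 0], I)])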

and in case the first column is not in sorted order (but is still unique):

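    C = np.array([[9, 2], [1, 6], [5, 1], [2, 5], [3, 2]])

    def index_by_first_column_entry(M, keys):
        colkeys = M[:, 0]
        # Order in which the first column would be sorted
        sorter = np.argsort(colkeys)
        # Positions of the keys within that sorted order
        index = np.searchsorted(colkeys, keys, sorter=sorter)
        # Map back to row positions in the original array
        return M[sorter[index]]

    print(index_by_first_column_entry(C, I))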

and make sure to change the True to False (the third argument of np.intersect1d, assume_unique) in

    I = reduce(
        lambda l, r: np.intersect1d(l, r, assume_unique=False),
        (i[:, 0] for i in (A, B, C)))

A generalization to duplicate values can be made using np.unique.
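As a rough sketch of what that generalization could look like (the function name filter_by_common_keys and the sample arrays are made up for illustration, and the final row selection uses np.isin instead of searchsorted so that repeated keys are all kept):

```python
from functools import reduce

import numpy as np

def filter_by_common_keys(*arrays):
    """Keep the rows whose first-column value occurs in every array."""
    # np.unique returns the sorted distinct keys of each array
    unique_keys = [np.unique(a[:, 0]) for a in arrays]
    # The inputs are unique now, so assume_unique=True is safe
    common = reduce(
        lambda l, r: np.intersect1d(l, r, assume_unique=True),
        unique_keys)
    # np.isin marks every row whose key is common, repeated keys included
    return [a[np.isin(a[:, 0], common)] for a in arrays]

# Hypothetical inputs with duplicate and unsorted keys
A = np.array([[1, 1], [2, 2], [2, 7], [3, 3]])
B = np.array([[3, 2], [2, 1], [4, 3], [5, 4]])
C = np.array([[5, 2], [3, 1], [2, 2], [3, 9]])

subA, subB, subC = filter_by_common_keys(A, B, C)
print(subA)
print(subB)
print(subC)
```

Rows come back in their original order, so nothing has to be sorted beforehand.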


One way to do this is to build an indicator array, or a hash table if you like, to indicate which integers are in all your input arrays. Then you can use boolean indexing based on this indicator array to get the subarrays. Something like this:

    import numpy as np

    # Setup
    A = np.array([[1, 1], [2, 2], [3, 3]])
    B = np.array([[2, 1], [3, 2], [4, 3], [5, 4]])
    C = np.array([[2, 2], [3, 1], [5, 2]])

    def take_overlap(*arrays):
        n = len(arrays)
        maxIndex = max(array[:, 0].max() for array in arrays)
        indicator = np.zeros(maxIndex + 1, dtype=int)
        for array in arrays:
            indicator[array[:, 0]] += 1
        indicator = indicator == n

        result = []
        for array in arrays:
            # Look up each integer in the indicator array
            mask = indicator[array[:, 0]]
            # Use boolean indexing to get the sub array
            result.append(array[mask])
        return result

    subA, subB, subC = take_overlap(A, B, C)

This should be quite fast, and this method does not assume the elements of the input arrays are unique or sorted. However, it could take a lot of memory, and might be a bit slower, if the indexing integers are sparse, e.g. [1, 10, 10000], because the indicator array needs one slot for every integer up to the largest key. It should be close to optimal if the integers are more or less dense.
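If the keys are sparse, one possible workaround (not part of the method above, just a sketch) is to remap them to dense codes with np.unique first, so the counting array only needs one slot per distinct key. The name take_overlap_sparse and the sample arrays are made up for illustration, and the extra sort inside np.unique costs some time:

```python
import numpy as np

def take_overlap_sparse(*arrays):
    n = len(arrays)
    # Map every first-column value to a dense code 0..k-1
    all_keys = np.concatenate([a[:, 0] for a in arrays])
    uniq, codes = np.unique(all_keys, return_inverse=True)
    # Split the codes back into one chunk per input array
    splits = np.cumsum([len(a) for a in arrays])[:-1]
    per_array_codes = np.split(codes, splits)

    # Count in how many arrays each distinct key appears
    count = np.zeros(len(uniq), dtype=int)
    for c in per_array_codes:
        # np.unique(c) makes sure a repeated key counts once per array
        count[np.unique(c)] += 1
    in_all = count == n

    # Boolean indexing keeps the rows whose key is present everywhere
    return [a[in_all[c]] for a, c in zip(arrays, per_array_codes)]

# Hypothetical inputs with sparse keys
A = np.array([[1, 1], [10, 2], [10000, 3]])
B = np.array([[10, 5], [10000, 6], [7, 7]])
C = np.array([[10000, 1], [10, 2], [10, 3]])

subA, subB, subC = take_overlap_sparse(A, B, C)
```

Here the counting array has 4 entries (one per distinct key) instead of 10001.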


This works but I'm not sure if it is faster than any of the other answers:

    import numpy as np

    A = np.array([[1, 1], [2, 2], [3, 3]])
    B = np.array([[2, 1], [3, 2], [4, 3], [5, 4]])
    C = np.array([[2, 2], [3, 1], [5, 2]])

    a = A[:, 0]
    b = B[:, 0]
    c = C[:, 0]

    # Pairwise comparison via broadcasting: index pairs where the keys match
    ab = np.where(a[:, np.newaxis] == b[np.newaxis, :])
    bc = np.where(b[:, np.newaxis] == c[np.newaxis, :])

    # Keep only the matches whose row of B appears in both pairings
    # (np.isin replaces the deprecated np.in1d for 1-D input)
    ab_in_bc = np.isin(ab[1], bc[0])
    bc_in_ab = np.isin(bc[0], ab[1])

    arows = ab[0][ab_in_bc]
    brows = ab[1][ab_in_bc]
    crows = bc[1][bc_in_ab]

    anew = A[arows, :]
    bnew = B[brows, :]
    cnew = C[crows, :]

    print(anew)
    print(bnew)
    print(cnew)

gives:

    [[2 2]
     [3 3]]
    [[2 1]
     [3 2]]
    [[2 2]
     [3 1]]
