Create an intersection of two or more 2D NumPy arrays based on a common value in one column
Here is one approach; I believe it should be reasonably fast. The first thing you want to do is count the number of occurrences of each position. This function handles that:
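import numpy as np

def count_positions(positions):
    # Sort so that equal positions sit next to each other.
    positions = np.sort(positions)
    # diff[i] is True where positions[i] is the last element of a run of equal values.
    diff = np.ones(len(positions), 'bool')
    diff[:-1] = positions[1:] != positions[:-1]
    # Indices of the run ends; differences between them are the run lengths.
    count = diff.nonzero()[0]
    count[1:] = count[1:] - count[:-1]
    count[0] += 1
    uniqPositions = positions[diff]
    return uniqPositions, count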
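As a quick check of what count_positions returns, here is a small sketch on made-up data. The function is repeated so the snippet runs on its own, and the sample values are purely illustrative. It also compares the result against np.unique with return_counts=True, which (in NumPy versions that support that argument) gives the same information directly.

```python
import numpy as np

def count_positions(positions):
    # Same function as above, repeated so this snippet is self-contained.
    positions = np.sort(positions)
    diff = np.ones(len(positions), 'bool')
    diff[:-1] = positions[1:] != positions[:-1]
    count = diff.nonzero()[0]
    count[1:] = count[1:] - count[:-1]
    count[0] += 1
    return positions[diff], count

# Made-up positions: 1 occurs three times, 3 once, 5 twice.
sample = np.array([5, 1, 3, 1, 5, 1])

uniq, count = count_positions(sample)
print(uniq)   # unique positions, sorted
print(count)  # how many times each one occurs

# np.unique can return the same pair in one call.
uniq_np, count_np = np.unique(sample, return_counts=True)
print(np.array_equal(uniq, uniq_np) and np.array_equal(count, count_np))
```

If np.unique with return_counts is available to you, it could stand in for count_positions in the steps that follow.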
Now, using the function from above, keep only the positions that occur 3 times, i.e. in all three arrays (this assumes a position appears at most once within each array):
positions = np.concatenate((a['position'], b['position'], c['position']))
uinqPos, count = count_positions(positions)
uinqPos = uinqPos[count == 3]
We will be using searchsorted, which needs sorted input, so we sort a, b and c by position:
a.sort(order='position')
b.sort(order='position')
c.sort(order='position')
Now we can use searchsorted to find where each of our uinqPos values sits in each array:
new_array = np.empty((len(uinqPos), 4))
new_array[:, 0] = uinqPos
index = a['position'].searchsorted(uinqPos)
new_array[:, 1] = a['score'][index]
index = b['position'].searchsorted(uinqPos)
new_array[:, 2] = b['score'][index]
index = c['position'].searchsorted(uinqPos)
new_array[:, 3] = c['score'][index]
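To see the steps fit together, here is a minimal end-to-end sketch on made-up data. The field names position and score come from the code above; the dtype and the sample values are assumptions for illustration only. To keep the snippet short, np.unique with return_counts=True stands in for count_positions, and a loop replaces the three repeated searchsorted blocks.

```python
import numpy as np

# Assumed layout: structured arrays with an integer position and a float score.
dt = [('position', int), ('score', float)]
a = np.array([(1, 0.1), (4, 0.4), (2, 0.2), (7, 0.7)], dtype=dt)
b = np.array([(2, 1.2), (7, 1.7), (9, 1.9)], dtype=dt)
c = np.array([(7, 2.7), (2, 2.2), (5, 2.5), (1, 2.1)], dtype=dt)

# Count occurrences across all arrays and keep positions seen 3 times.
# This relies on each position appearing at most once per array.
positions = np.concatenate((a['position'], b['position'], c['position']))
uniq, count = np.unique(positions, return_counts=True)
common = uniq[count == 3]

new_array = np.empty((len(common), 4))
new_array[:, 0] = common
for col, arr in enumerate((a, b, c), start=1):
    arr.sort(order='position')                        # searchsorted needs sorted input
    index = arr['position'].searchsorted(common)      # row of each common position
    new_array[:, col] = arr['score'][index]

print(new_array)
```

With this sample data only positions 2 and 7 appear in all three arrays, so the result has two rows, each holding the position followed by its score from a, b and c.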
There might be a more elegant solution using dictionaries, but I thought of this one first so I'll leave that to someone else.
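For what it's worth, one possible reading of the dictionary idea is sketched below. This is an assumption about what such a solution might look like, not something from the approach above: each array becomes a position-to-score dict, and the common positions are the intersection of the key sets. The dtype and sample data are made up, and the sketch again assumes positions are unique within each array.

```python
import numpy as np

# Assumed layout and made-up data, for illustration only.
dt = [('position', int), ('score', float)]
a = np.array([(1, 0.1), (4, 0.4), (2, 0.2), (7, 0.7)], dtype=dt)
b = np.array([(2, 1.2), (7, 1.7), (9, 1.9)], dtype=dt)
c = np.array([(7, 2.7), (2, 2.2), (5, 2.5), (1, 2.1)], dtype=dt)

# One dict per array, mapping position -> score.
lookups = [dict(zip(arr['position'].tolist(), arr['score'].tolist()))
           for arr in (a, b, c)]

# Positions that are keys in every dict, in ascending order.
common = sorted(set(lookups[0]).intersection(*lookups[1:]))

# One row per common position: the position, then its score in each array.
new_array = np.array([[p] + [d[p] for d in lookups] for p in common])

print(new_array)
```

This version needs no sorting and extends to any number of arrays, but it goes through Python objects, so it will likely be slower than the searchsorted approach on large inputs.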