Getting the row index for a 2D numPy array when multiple column values are known


In [80]: a = np.array([ [1, 2, 3], [4, 5, 6], [7, 8, 9] ])

In [81]: a
Out[81]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

a==2 returns a boolean numpy array, showing where the condition is True:

In [82]: a==2
Out[82]:
array([[False,  True, False],
       [False, False, False],
       [False, False, False]], dtype=bool)
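If you want the (row, column) coordinates of every True entry rather than the whole mask, np.argwhere returns them, one row per match. A minimal sketch on the same array:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

mask = a == 2                 # boolean array with the same shape as a
coords = np.argwhere(mask)    # one (row, col) pair per True entry
print(coords)                 # [[0 1]]
```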

You can find the columns that contain at least one True by using np.any(..., axis=0):

In [83]: np.any(a==2,axis=0)
Out[83]: array([False,  True, False], dtype=bool)

In [84]: np.any(a==5,axis=0)
Out[84]: array([False,  True, False], dtype=bool)
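The axis argument decides which direction gets collapsed: axis=0 reduces down each column (one result per column), while axis=1 reduces across each row (one result per row). A short check on the same array:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# axis=0: "is there a 2 anywhere in this column?" -> one value per column
print(np.any(a == 2, axis=0))   # [False  True False]
# axis=1: "is there a 2 anywhere in this row?"    -> one value per row
print(np.any(a == 2, axis=1))   # [ True False False]
```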

You can find where both conditions are simultaneously true by using &:

In [85]: np.any(a==2,axis=0) & np.any(a==5,axis=0)
Out[85]: array([False,  True, False], dtype=bool)

Finally, you can find the index of the columns where the conditions are simultaneously True using np.where:

In [86]: np.where(np.any(a==2,axis=0) & np.any(a==5,axis=0))
Out[86]: (array([1]),)
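Called with a single argument, np.where returns a tuple holding one index array per dimension, which is why the result prints as (array([1]),). For a 1-D mask, taking element [0] or calling np.flatnonzero gives the plain index array:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

mask = np.any(a == 2, axis=0) & np.any(a == 5, axis=0)

print(np.where(mask))         # a tuple with one index array per dimension
print(np.where(mask)[0])      # [1]
print(np.flatnonzero(mask))   # [1]
```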


Here are ways to handle conditions on columns or rows, inspired by the Zen of Python.

In []: import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
...

So, following the second aphorism, "Explicit is better than implicit":
a) conditions on column(s), applied to row(s):

In []: a = np.arange(12).reshape(3, 4)

In []: a
Out[]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In []: a[2, np.logical_and(1 == a[0, :], 5 == a[1, :])] += 12

In []: a
Out[]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8, 21, 10, 11]])
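For boolean arrays, np.logical_and and the & operator give the same result; with &, each comparison needs its own parentheses because & binds more tightly than ==. A sketch of case a) written both ways, using the np. prefix rather than relying on a star import:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Conditions on rows 0 and 1 produce a boolean mask over the columns.
mask_logical = np.logical_and(a[0, :] == 1, a[1, :] == 5)
mask_amp = (a[0, :] == 1) & (a[1, :] == 5)   # parentheses required: & binds tighter than ==

# Both spellings give the same mask: only column 1 qualifies.
assert np.array_equal(mask_logical, mask_amp)
print(mask_amp)          # [False  True False False]

# Apply the mask to row 2 only; the row and the column mask are both explicit.
a[2, mask_amp] += 12
print(a[2])              # [ 8 21 10 11]
```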

b) conditions on row(s), applied to column(s):

In []: a = a.T

In []: a
Out[]:
array([[ 0,  4,  8],
       [ 1,  5, 21],
       [ 2,  6, 10],
       [ 3,  7, 11]])

In []: a[np.logical_and(1 == a[:, 0], 5 == a[:, 1]), 2] += 12

In []: a
Out[]:
array([[ 0,  4,  8],
       [ 1,  5, 33],
       [ 2,  6, 10],
       [ 3,  7, 11]])

So I hope this shows why it pays to always be explicit when accessing columns and rows: code is typically read by people with various backgrounds.
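The question in the title, finding the row whose entries in several known columns match given values, is the row-wise form b) with any number of conditions. A minimal sketch, assuming a hypothetical helper rows_matching (not a NumPy function) that compares a[:, cols] against vals by broadcasting and keeps the rows where every comparison holds:

```python
import numpy as np

def rows_matching(a, cols, vals):
    """Hypothetical helper: indices of rows where a[row, cols[k]] == vals[k] for every k."""
    cols = np.asarray(cols)
    vals = np.asarray(vals)
    # a[:, cols] has shape (n_rows, len(cols)); vals broadcasts across the rows.
    hits = (a[:, cols] == vals).all(axis=1)
    return np.flatnonzero(hits)

a = np.array([[0, 4, 8],
              [1, 5, 33],
              [2, 6, 10],
              [3, 7, 11]])

print(rows_matching(a, [0, 1], [1, 5]))   # [1]
print(rows_matching(a, [0, 1], [1, 6]))   # []
```

Adding a column is just one more entry in cols and vals, instead of another nested np.logical_and.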


Doing

np.where(np.any(a==2,axis=0) & np.any(a==5,axis=0))

as unutbu suggested does not use the information that the 2 must appear in a[0] and the 5 in a[1]. So, for a = np.array([[5, 2, 3], [2, 5, 6], [7, 8, 9]]), it will mistakenly return (array([0, 1]),).

Instead, you can use

np.where((a[0]==2) & (a[1]==5))

to get the correct result (array([1]),).
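The counterexample is easy to check directly. A small script comparing both expressions on the array from this answer:

```python
import numpy as np

a = np.array([[5, 2, 3],
              [2, 5, 6],
              [7, 8, 9]])

# "a 2 somewhere in the column" and "a 5 somewhere in the column":
# ignores which row each value sits in, so columns 0 and 1 both pass.
loose = np.where(np.any(a == 2, axis=0) & np.any(a == 5, axis=0))[0]

# "a[0] equals 2" and "a[1] equals 5" at the same position.
strict = np.where((a[0] == 2) & (a[1] == 5))[0]

print(loose)    # [0 1]
print(strict)   # [1]
```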

Furthermore, if you want to edit the value in a[2] at that position, you can skip np.where and index it directly with the boolean mask: a[2][(a[0]==2) & (a[1]==5)]. This also works for assignment, for example a[2][(a[0]==2) & (a[1]==5)] = 11.
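The chained assignment a[2][mask] = 11 works because a[2] is basic indexing and returns a view into a, so the masked assignment writes through to the original array. Chaining in the other order does not: a[:, mask] is boolean (advanced) indexing and returns a copy, so a[:, mask][2] = 11 silently changes only a temporary. The single-subscript form a[2, mask] avoids the question entirely. A sketch comparing the three on copies of the example array:

```python
import numpy as np

a = np.array([[5, 2, 3],
              [2, 5, 6],
              [7, 8, 9]])
mask = (a[0] == 2) & (a[1] == 5)      # column mask: [False, True, False]

b = a.copy()
b[2][mask] = 11       # b[2] is a view, so this writes into b
c = a.copy()
c[2, mask] = 11       # same effect with one explicit (row, column-mask) subscript
d = a.copy()
d[:, mask][2] = 11    # d[:, mask] is a copy: the assignment is lost

print(b[2], c[2], d[2])   # [ 7 11  9] [ 7 11  9] [7 8 9]
```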