Python Pandas: Get index of rows which column matches certain value


df.iloc[i] returns the ith row of df. Here i does not refer to the index label; it is a 0-based row position.

In contrast, the attribute index returns actual index labels, not numeric row-indices:

df.index[df['BoolCol'] == True].tolist()

or equivalently,

df.index[df['BoolCol']].tolist()

You can see the difference quite clearly by playing with a DataFrame that has a non-default index whose labels do not equal the rows' numerical positions:

df = pd.DataFrame({'BoolCol': [True, False, False, True, True]},
       index=[10,20,30,40,50])

In [53]: df
Out[53]:
   BoolCol
10    True
20   False
30   False
40    True
50    True

[5 rows x 1 columns]

In [54]: df.index[df['BoolCol']].tolist()
Out[54]: [10, 40, 50]

If you want to use the index,

In [56]: idx = df.index[df['BoolCol']]

In [57]: idx
Out[57]: Int64Index([10, 40, 50], dtype='int64')

then you can select the rows using loc instead of iloc:

In [58]: df.loc[idx]
Out[58]:
   BoolCol
10    True
40    True
50    True

[3 rows x 1 columns]

Note that loc can also accept boolean arrays:

In [55]: df.loc[df['BoolCol']]
Out[55]:
   BoolCol
10    True
40    True
50    True

[3 rows x 1 columns]

If you have a boolean array, mask, and need ordinal index values, you can compute them using np.flatnonzero:

In [110]: np.flatnonzero(df['BoolCol'])
Out[112]: array([0, 3, 4])

Use df.iloc to select rows by ordinal index:

In [113]: df.iloc[np.flatnonzero(df['BoolCol'])]
Out[113]:
   BoolCol
10    True
40    True
50    True
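The label-versus-position distinction described above can be condensed into one runnable script. This is only a minimal sketch that reuses the example DataFrame from this answer; the variable names mask, labels and positions are illustrative and not part of the original session.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'BoolCol': [True, False, False, True, True]},
                  index=[10, 20, 30, 40, 50])
mask = df['BoolCol']

# Index labels of the matching rows (suitable for df.loc)
labels = df.index[mask].tolist()

# 0-based ordinal positions of the matching rows (suitable for df.iloc)
positions = np.flatnonzero(mask.to_numpy()).tolist()

print(labels)     # [10, 40, 50]
print(positions)  # [0, 3, 4]

# Selecting by label and selecting by position give the same rows
assert df.loc[labels].equals(df.iloc[positions])
```

With the default RangeIndex the two lists would be identical, which is why the difference is easy to overlook.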


This can also be done using NumPy's where() function:

import pandas as pd
import numpy as np

In [716]: df = pd.DataFrame({"gene_name": ['SLC45A1', 'NECAP2', 'CLIC4', 'ADC', 'AGBL4'] , "BoolCol": [False, True, False, True, True] },
       index=list("abcde"))

In [717]: df
Out[717]:
  BoolCol gene_name
a   False   SLC45A1
b    True    NECAP2
c   False     CLIC4
d    True       ADC
e    True     AGBL4

In [718]: np.where(df["BoolCol"] == True)
Out[718]: (array([1, 3, 4]),)

In [719]: select_indices = list(np.where(df["BoolCol"] == True)[0])

In [720]: df.iloc[select_indices]
Out[720]:
  BoolCol gene_name
b    True    NECAP2
d    True       ADC
e    True     AGBL4

You don't always need the index for a match, but in case you do:

In [796]: df.iloc[select_indices].index
Out[796]: Index([u'b', u'd', u'e'], dtype='object')

In [797]: df.iloc[select_indices].index.tolist()
Out[797]: ['b', 'd', 'e']
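As a possible shortcut, the positions returned by np.where can be used to index df.index directly, without first selecting the rows with iloc. The sketch below assumes the same example DataFrame; positions and labels are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"gene_name": ['SLC45A1', 'NECAP2', 'CLIC4', 'ADC', 'AGBL4'],
     "BoolCol": [False, True, False, True, True]},
    index=list("abcde"))

# With a single argument, np.where returns a tuple holding one array
# of positions per dimension; [0] takes the row positions.
positions = np.where(df["BoolCol"].to_numpy())[0]

# Indexing the Index object by position yields the labels directly
labels = df.index[positions].tolist()
print(labels)  # ['b', 'd', 'e']

# Same result as going through df.iloc first
assert labels == df.iloc[positions].index.tolist()
```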


If you want to use your dataframe object only once, use:

df['BoolCol'].loc[lambda x: x==True].index
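The callable form is mainly useful inside a method chain, where the intermediate DataFrame has no name to refer to. A minimal sketch, assuming the same example data as the first answer (the name labels is illustrative):

```python
import pandas as pd

labels = (
    pd.DataFrame({'BoolCol': [True, False, False, True, True]},
                 index=[10, 20, 30, 40, 50])
    .loc[lambda d: d['BoolCol']]  # the callable receives the DataFrame itself
    .index
    .tolist()
)
print(labels)  # [10, 40, 50]
```

Here the lambda returns a boolean Series, which loc then uses as a mask, so the DataFrame never has to be bound to a variable.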