How to select rows with one or more nulls from a pandas DataFrame without listing columns explicitly?
[Updated to adapt to modern pandas, which has isnull as a method of DataFrames.]
You can use isnull and any to build a boolean Series and use that to index into your frame:
>>> df = pd.DataFrame([range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)])
>>> df.isnull()
       0      1      2
0  False  False  False
1  False   True  False
2  False  False   True
3  False  False  False
4  False  False  False
>>> df.isnull().any(axis=1)
0    False
1     True
2     True
3    False
4    False
dtype: bool
>>> df[df.isnull().any(axis=1)]
   0   1   2
1  0 NaN   0
2  0   0 NaN
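The session above can also be written as a small self-contained script. This is only a sketch: the helper name rows_with_any_null is made up for illustration, and it uses isna, which pandas documents as an alias of isnull.

```python
import numpy as np
import pandas as pd


def rows_with_any_null(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `frame` that contain at least one null value.

    Hypothetical helper, not part of pandas.
    """
    # isna() marks every null cell; any(axis=1) reduces across the columns,
    # giving one boolean per row without naming any column explicitly.
    mask = frame.isna().any(axis=1)
    return frame[mask]


df = pd.DataFrame([range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)])
selected = rows_with_any_null(df)
print(selected.index.tolist())  # [1, 2]
```

Because the mask is built from the whole frame, the same function should work unchanged if columns are added or renamed.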
[For older pandas:]
You could use the function isnull instead of the method:
In [56]: df = pd.DataFrame([range(3), [0, np.NaN, 0], [0, 0, np.NaN], range(3), range(3)])

In [57]: df
Out[57]:
   0   1   2
0  0   1   2
1  0 NaN   0
2  0   0 NaN
3  0   1   2
4  0   1   2

In [58]: pd.isnull(df)
Out[58]:
       0      1      2
0  False  False  False
1  False   True  False
2  False  False   True
3  False  False  False
4  False  False  False

In [59]: pd.isnull(df).any(axis=1)
Out[59]:
0    False
1     True
2     True
3    False
4    False
leading to the rather compact:
In [60]: df[pd.isnull(df).any(axis=1)]
Out[60]:
   0   1   2
1  0 NaN   0
2  0   0 NaN
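As a quick sanity check, the function form and the method form should build the same row mask. The sketch below assumes a pandas version that offers both; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)])

# Top-level function applied to the frame.
function_mask = pd.isnull(df).any(axis=1)
# Method called on the frame.
method_mask = df.isnull().any(axis=1)

# Series.equals compares both values and index.
same = function_mask.equals(method_mask)
print(same)  # True
```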
If you want to select rows that have at least a given number of null values, you can use this:
df.iloc[df[(df.isnull().sum(axis=1) >= qty_of_nuls)].index]
Here is an example. Start with this DataFrame:
>>> df = pd.DataFrame([range(4), [0, np.nan, 0, np.nan], [0, 0, np.nan, 0], range(4), [np.nan, 0, np.nan, np.nan]])
>>> df
     0    1    2    3
0  0.0  1.0  2.0  3.0
1  0.0  NaN  0.0  NaN
2  0.0  0.0  NaN  0.0
3  0.0  1.0  2.0  3.0
4  NaN  0.0  NaN  NaN
To select the rows that have null values in two or more columns, run the following:
>>> qty_of_nuls = 2
>>> df.iloc[df[(df.isnull().sum(axis=1) >= qty_of_nuls)].index]
     0    1    2    3
1  0.0  NaN  0.0  NaN
4  NaN  0.0  NaN  NaN
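One caveat with the expression above: it passes index labels to the positional indexer iloc, which lines up here only because the frame has the default 0..n-1 index. A minimal sketch of an alternative is to index with the boolean mask directly, which does not depend on the index type. The string index and the names min_nulls and null_counts below are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Same kind of data as above, but with a non-default (string) index.
df = pd.DataFrame(
    [
        [0, 1, 2, 3],
        [0, np.nan, 0, np.nan],
        [0, 0, np.nan, 0],
        [np.nan, 0, np.nan, np.nan],
    ],
    index=["a", "b", "c", "d"],
)

min_nulls = 2
# Count the null cells in each row.
null_counts = df.isnull().sum(axis=1)
# A boolean Series aligned on the index selects rows by label, not position.
result = df[null_counts >= min_nulls]
print(result.index.tolist())  # ['b', 'd']
```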