Find empty or NaN entry in Pandas Dataframe
np.where(pd.isnull(df)) returns the row and column indices where the value is NaN:
In [152]: import numpy as np
In [153]: import pandas as pd
In [154]: np.where(pd.isnull(df))
Out[154]: (array([2, 5, 6, 6, 7, 7]), array([7, 7, 6, 7, 6, 7]))
In [155]: df.iloc[2,7]
Out[155]: nan
In [160]: [df.iloc[i,j] for i,j in zip(*np.where(pd.isnull(df)))]
Out[160]: [nan, nan, nan, nan, nan, nan]
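Since the df from the session above is not shown, here is a minimal self-contained sketch of the same idea. The DataFrame contents and the name nan_locations are made up for illustration; only the np.where(pd.isnull(df)) call comes from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the df used in the session above is not shown.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": ["x", "y", np.nan],
})

# Positional (row, column) indices of every NaN cell.
rows, cols = np.where(pd.isnull(df))

# Translate the positions into index and column labels.
nan_locations = [(df.index[i], df.columns[j]) for i, j in zip(rows, cols)]
print(nan_locations)
```

The two arrays returned by np.where are positional, so they pair with df.iloc; mapping them through df.index and df.columns gives labels that work with df.loc instead.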
Finding values which are empty strings could be done with applymap:
In [182]: np.where(df.applymap(lambda x: x == ''))
Out[182]: (array([5]), array([7]))
Note that using applymap requires calling a Python function once for each cell of the DataFrame. That could be slow for a large DataFrame, so it would be better if you could arrange for all the blank cells to contain NaN instead, so that you could use pd.isnull.
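One possible way to avoid the per-cell function call is sketched below, on made-up data. It assumes the blanks are exactly the empty string '' (no whitespace-only cells) and uses DataFrame.eq for a vectorized comparison and DataFrame.replace to turn blanks into NaN. As a side note, recent pandas releases (2.1 and later) deprecate applymap in favor of DataFrame.map.

```python
import numpy as np
import pandas as pd

# Hypothetical data mixing blanks and NaN.
df = pd.DataFrame({
    "a": [1.0, "", 3.0],
    "b": ["x", np.nan, ""],
})

# Vectorized comparison: no Python function is called per cell.
blank_rows, blank_cols = np.where(df.eq(""))

# Convert blanks to NaN so a single isnull check covers both cases.
cleaned = df.replace("", np.nan)
missing_rows, missing_cols = np.where(pd.isnull(cleaned))
```

If the blanks come from reading a file, it may be simpler to convert them at load time, but that depends on how the data was read and is outside what the answer covers.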
Try this:
df[df['column_name'] == ''].index
and for NaNs you can try:
pd.isna(df['column_name'])
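The two checks can also be combined into one boolean mask. This is only a sketch: 'column_name' is the placeholder from the snippet above, and the sample values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column frame; 'column_name' is a placeholder.
df = pd.DataFrame({"column_name": ["a", "", np.nan, "d"]})

col = df["column_name"]

# True where the cell is NaN or an empty string.
mask = col.isna() | col.eq("")

# Index labels of the rows with a missing or blank value.
missing_index = df[mask].index
print(list(missing_index))
```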
Check if the columns contain NaN using .isnull() and check for empty strings using .eq(''), then join the two together using the bitwise OR operator |.
Sum along axis 0 to find the columns with missing data, then sum along axis 1 to find the index locations of the rows with missing data.
missing_cols, missing_rows = (
    (df2.isnull().sum(x) | df2.eq('').sum(x))
    .loc[lambda x: x.gt(0)].index
    for x in (0, 1)
)
>>> df2.loc[missing_rows, missing_cols]
          A2       A3
2             1.10035
5  -0.508501
6        NaN      NaN
7        NaN      NaN
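For comparison, here is a sketch that follows the prose description more literally: it builds the combined mask first and then reduces along each axis. It swaps the sums for .any(), which answers the same "is anything missing here" question. The df2 below is a made-up stand-in, since the original frame is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2.
df2 = pd.DataFrame({
    "A1": [0.5, 1.2, 0.7],
    "A2": [np.nan, 2.0, ""],
    "A3": [1.0, 2.5, 3.0],
})

# True where a cell is NaN or an empty string.
missing = df2.isnull() | df2.eq("")

# Reduce over rows (axis 0) for columns, over columns (axis 1) for rows.
missing_cols = missing.columns[missing.any(axis=0)]
missing_rows = missing.index[missing.any(axis=1)]

subset = df2.loc[missing_rows, missing_cols]
print(subset)
```

Building the mask once keeps the NaN test and the empty-string test in a single place, which makes it easier to add further conditions later (for example whitespace-only strings).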