Collapsing rows with NaN entries in pandas dataframe
Quick and Dirty
This works and has worked for a long time. However, some consider this behavior a bug that may eventually be fixed. As currently implemented, groupby's first() returns, for each column, the first non-null element in the group if one exists.
    df.groupby('objectID', as_index=False).first()

             objectID grade   OS   method
    0  object_id_0001   AAA  Mac  organic
    1  object_id_0002   ABC  Win      NaN
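The input frame is not shown in this answer, so here is a minimal sketch with a hypothetical df whose values were chosen only to match the output above. It illustrates how first() skips nulls column by column within each group.

```python
import numpy as np
import pandas as pd

# Hypothetical input: the real df is not shown, these rows are an assumption
df = pd.DataFrame({
    "objectID": ["object_id_0001", "object_id_0001",
                 "object_id_0002", "object_id_0002"],
    "grade": ["AAA", np.nan, np.nan, "ABC"],
    "OS": [np.nan, "Mac", "Win", np.nan],
    "method": ["organic", np.nan, np.nan, np.nan],
})

# first() takes the first non-null value per column within each group
collapsed = df.groupby("objectID", as_index=False).first()
print(collapsed)
```

Note that each column is resolved independently, so the collapsed row can combine values that came from different original rows.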
pd.concat
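    # Note: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0,
    # so this version needs an older pandas.
    pd.concat([
        pd.DataFrame([d.lookup(d.notna().idxmax(), d.columns)], columns=d.columns)
        for _, d in df.groupby('objectID')
    ], ignore_index=True)

             objectID grade   OS   method
    0  object_id_0001   AAA  Mac  organic
    1  object_id_0002   ABC  Win      NaN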
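On pandas versions without DataFrame.lookup, the same idea (pick the value at the first non-null row label of each column) could be sketched as below. The helper name first_valid_row and the sample df are assumptions for illustration, not part of the original answer, and the sketch assumes a unique index such as the default RangeIndex.

```python
import numpy as np
import pandas as pd

# Hypothetical input, chosen only for illustration
df = pd.DataFrame({
    "objectID": ["object_id_0001", "object_id_0001",
                 "object_id_0002", "object_id_0002"],
    "grade": ["AAA", np.nan, np.nan, "ABC"],
    "OS": [np.nan, "Mac", "Win", np.nan],
    "method": ["organic", np.nan, np.nan, np.nan],
})

def first_valid_row(d):
    """Hypothetical helper: value at the first non-null row of each column."""
    # idxmax on the boolean mask gives the label of the first True per column;
    # for an all-null column it falls back to the first label, whose value is NaN
    first_idx = d.notna().idxmax()
    return {col: d.at[idx, col] for col, idx in first_idx.items()}

collapsed = pd.DataFrame(
    [first_valid_row(d) for _, d in df.groupby("objectID")],
    columns=df.columns,
)
print(collapsed)
```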
stack
    df.set_index('objectID').stack().groupby(level=[0, 1]).head(1).unstack()

                   grade   OS   method
    objectID
    object_id_0001   AAA  Mac  organic
    object_id_0002   ABC  Win     None
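The stack approach relies on stack() dropping null entries before head(1) picks the first remaining value per (objectID, column) pair. Whether stack() drops nulls by default appears to depend on the pandas version, so the sketch below adds an explicit dropna() as a hedge. The sample df is a hypothetical stand-in for the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical input, chosen only for illustration
df = pd.DataFrame({
    "objectID": ["object_id_0001", "object_id_0001",
                 "object_id_0002", "object_id_0002"],
    "grade": ["AAA", np.nan, np.nan, "ABC"],
    "OS": [np.nan, "Mac", "Win", np.nan],
    "method": ["organic", np.nan, np.nan, np.nan],
})

collapsed = (
    df.set_index("objectID")
      .stack()                  # long format: one entry per (objectID, column)
      .dropna()                 # explicit, in case stack() keeps nulls
      .groupby(level=[0, 1])
      .head(1)                  # first surviving value per (objectID, column)
      .unstack()                # back to wide format, objectID as the index
)
print(collapsed)
```

Unlike the groupby version with as_index=False, this leaves objectID as the index; reset_index() would turn it back into a column.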
If by chance the missing entries are actually the string 'NA' rather than real nulls, mask them first:

    df.mask(df.astype(str).eq('NA')).groupby('objectID', as_index=False).first()
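As a small illustration of the string case, the sketch below uses a hypothetical frame in which missing entries are stored as the literal text 'NA'. The mask step turns them into real nulls so that first() can skip them.

```python
import pandas as pd

# Hypothetical input where missing values are the literal string 'NA'
df = pd.DataFrame({
    "objectID": ["object_id_0001", "object_id_0001",
                 "object_id_0002", "object_id_0002"],
    "grade": ["AAA", "NA", "NA", "ABC"],
    "OS": ["NA", "Mac", "Win", "NA"],
    "method": ["organic", "NA", "NA", "NA"],
})

# Replace every cell whose string form is exactly 'NA' with a real null
cleaned = df.mask(df.astype(str).eq("NA"))

collapsed = cleaned.groupby("objectID", as_index=False).first()
print(collapsed)
```

Without the mask, 'NA' counts as an ordinary non-null string, so first() would return it whenever it appears first in a group.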