Collapsing rows with NaN entries in pandas dataframe


Quick and Dirty

This works and has for a long time. However, some consider this behavior a bug that may eventually be fixed. As currently implemented, first returns the first non-null element in each column of each group, if one exists.

    df.groupby('objectID', as_index=False).first()

             objectID grade   OS   method
    0  object_id_0001   AAA  Mac  organic
    1  object_id_0002   ABC  Win      NaN
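The input frame is not shown here, so the following is only a sketch on a hypothetical df whose values are chosen to match the output above: each objectID is spread over several rows, and each row fills in a different column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (an assumption, not the original frame):
# every row carries one known value and NaN elsewhere.
df = pd.DataFrame({
    "objectID": ["object_id_0001"] * 3 + ["object_id_0002"] * 2,
    "grade": ["AAA", np.nan, np.nan, "ABC", np.nan],
    "OS": [np.nan, "Mac", np.nan, np.nan, "Win"],
    "method": [np.nan, np.nan, "organic", np.nan, np.nan],
})

# first() skips nulls, so each column contributes its first non-null
# value per group; a column that is entirely null stays missing.
collapsed = df.groupby("objectID", as_index=False).first()
print(collapsed)
```

Since object_id_0002 has no method value in any row, that cell remains missing after the collapse.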

pd.concat

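    # Note: DataFrame.lookup was deprecated in pandas 1.2 and removed in 2.0,
    # so this variant needs an older pandas.
    pd.concat([
        pd.DataFrame([d.lookup(d.notna().idxmax(), d.columns)], columns=d.columns)
        for _, d in df.groupby('objectID')
    ], ignore_index=True)

             objectID grade   OS   method
    0  object_id_0001   AAA  Mac  organic
    1  object_id_0002   ABC  Win      NaN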
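On pandas 2.0 and later, where DataFrame.lookup is no longer available, the same per-group idea can be sketched with Series.first_valid_index. The helper name first_valid and the sample df are assumptions made for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (an assumption, not the original frame).
df = pd.DataFrame({
    "objectID": ["object_id_0001"] * 3 + ["object_id_0002"] * 2,
    "grade": ["AAA", np.nan, np.nan, "ABC", np.nan],
    "OS": [np.nan, "Mac", np.nan, np.nan, "Win"],
    "method": [np.nan, np.nan, "organic", np.nan, np.nan],
})

def first_valid(col):
    """Return the first non-null value of a column, or NaN if there is none."""
    idx = col.first_valid_index()
    return col.loc[idx] if idx is not None else np.nan

# One collapsed row per group: apply the helper to every column of the
# group's sub-frame, then stack the resulting rows into a new frame.
collapsed = pd.DataFrame(
    [d.apply(first_valid) for _, d in df.groupby("objectID")]
).reset_index(drop=True)
print(collapsed)
```

This does more Python-level work than groupby(...).first(), so it is mainly useful when the per-group logic needs to be customized.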

stack

    df.set_index('objectID').stack().groupby(level=[0, 1]).head(1).unstack()

                   grade   OS   method
    objectID
    object_id_0001   AAA  Mac  organic
    object_id_0002   ABC  Win     None

If by chance the missing values are strings ('NA') rather than real NaN

df.mask(df.astype(str).eq('NA')).groupby('objectID', as_index=False).first()
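A minimal sketch of the string case, assuming a hypothetical df in which missing entries are stored as the literal text 'NA'. The mask call turns those cells into real missing values, after which first() can skip them.

```python
import pandas as pd

# Hypothetical sample data (an assumption): 'NA' is an ordinary string here,
# so first() would not treat it as missing without the mask step.
df = pd.DataFrame({
    "objectID": ["object_id_0001"] * 3 + ["object_id_0002"] * 2,
    "grade": ["AAA", "NA", "NA", "ABC", "NA"],
    "OS": ["NA", "Mac", "NA", "NA", "Win"],
    "method": ["NA", "NA", "organic", "NA", "NA"],
})

# mask() replaces cells where the condition is True with a missing value.
cleaned = df.mask(df.astype(str).eq("NA"))
collapsed = cleaned.groupby("objectID", as_index=False).first()
print(collapsed)
```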


One alternative, more mechanical way

    def aggregate(s):
        u = s[s.notnull()].unique()
        if not u.size: return np.nan
        return u

    df.groupby('objectID').agg(aggregate)

                    grade   OS      method
    objectID
    object_id_0001  AAA     Mac     organic
    object_id_0002  ABC     Win     NaN


This will work: bfill + drop_duplicates

    df.groupby('objectID', as_index=False).bfill().drop_duplicates('objectID')

             objectID grade   OS   method
    0  object_id_0001   AAA  Mac  organic
    3  object_id_0002   ABC  Win      NaN
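In some pandas versions the frame returned by groupby(...).bfill() does not include the grouping column, in which case drop_duplicates('objectID') cannot find it. A hedged variant that does not depend on that behavior back-fills only the value columns and writes them into a copy of the original frame. The sample df and the names value_cols and filled are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (an assumption, not the original frame).
df = pd.DataFrame({
    "objectID": ["object_id_0001"] * 3 + ["object_id_0002"] * 2,
    "grade": ["AAA", np.nan, np.nan, "ABC", np.nan],
    "OS": [np.nan, "Mac", np.nan, np.nan, "Win"],
    "method": [np.nan, np.nan, "organic", np.nan, np.nan],
})

# Back-fill within each group so the first row of every group collects
# the first non-null value found at or below it in each column.
value_cols = [c for c in df.columns if c != "objectID"]
filled = df.copy()
filled[value_cols] = df.groupby("objectID")[value_cols].bfill()

# Keep only the first row per objectID; the original row labels survive.
collapsed = filled.drop_duplicates("objectID")
print(collapsed)
```

The result keeps the index labels of the first row of each group, which is why the output above is labeled 0 and 3 rather than 0 and 1.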