
Keep other columns when doing groupby


Method #1: use idxmin() to get the indices of the rows with the minimum diff in each group, and then select those rows:

>>> df.loc[df.groupby("item")["diff"].idxmin()]
   item  diff  otherstuff
1     1     1           2
6     2    -6           2
7     3     0           0

[3 rows x 3 columns]
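For context, the outputs in these answers appear to come from a DataFrame like the one below, reconstructed from the print(df) shown in a later answer, so treat the exact construction as an assumption:

import pandas as pd

df = pd.DataFrame({
    "item":       [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "diff":       [2, 1, 1, -1, 1, 4, -6, 0, 2],
    "otherstuff": [1, 2, 7, 0, 3, 9, 2, 0, 9],
})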

Method #2: sort by diff, and then take the first element in each item group:

>>> df.sort_values("diff").groupby("item", as_index=False).first()
   item  diff  otherstuff
0     1     1           2
1     2    -6           2
2     3     0           0

[3 rows x 3 columns]

Note that the resulting indices are different even though the row content is the same.
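If you need the two methods to return identical frames, dropping the surviving index on the first result is enough; a small sketch, assuming the df above:

# reset_index(drop=True) replaces the surviving labels (1, 6, 7) with a fresh
# RangeIndex (0, 1, 2), so the result matches Method #2 exactly
aligned = df.loc[df.groupby("item")["diff"].idxmin()].reset_index(drop=True)
print(aligned)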


You can use DataFrame.sort_values with DataFrame.drop_duplicates:

df = df.sort_values(by='diff').drop_duplicates(subset='item')
print (df)

   item  diff  otherstuff
6     2    -6           2
7     3     0           0
1     1     1           2
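If you prefer the kept rows back in their original order rather than sorted by diff, sort the index afterwards; a small sketch continuing from the result above:

df = df.sort_index()
print (df)

   item  diff  otherstuff
1     1     1           2
6     2    -6           2
7     3     0           0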

If there can be multiple minimal values per group and you want all of those rows, use boolean indexing against the per-group minimum computed with transform:

print (df)

   item  diff  otherstuff
0     1     2           1
1     1     1           2 <-multiple min
2     1     1           7 <-multiple min
3     2    -1           0
4     2     1           3
5     2     4           9
6     2    -6           2
7     3     0           0
8     3     2           9

print (df.groupby("item")["diff"].transform('min'))

0    1
1    1
2    1
3   -6
4   -6
5   -6
6   -6
7    0
8    0
Name: diff, dtype: int64

df = df[df.groupby("item")["diff"].transform('min') == df['diff']]
print (df)

   item  diff  otherstuff
1     1     1           2
2     1     1           7
6     2    -6           2
7     3     0           0
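An equivalent formulation, sketched here as an alternative (not from the original answer): compute the per-item minimum as its own frame and inner-merge it back, which keeps every row whose diff equals the group minimum. Note that the merge result gets a fresh RangeIndex, unlike the boolean indexing above, which preserves the original index.

# starting again from the original nine-row df
mins = df.groupby('item', as_index=False)['diff'].min()
out = df.merge(mins, on=['item', 'diff'])
print (out)

   item  diff  otherstuff
0     1     1           2
1     1     1           7
2     2    -6           2
3     3     0           0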


The above answer works great if there is (or you want) a single minimum. In my case there could be multiple minima and I wanted all rows equal to the minimum, which .idxmin() doesn't give you. This worked:

import pandas as pd

def filter_group(dfg, col):
    # keep every row in the group whose value equals the group minimum
    return dfg[dfg[col] == dfg[col].min()]

df = pd.DataFrame({'g': ['a'] * 6 + ['b'] * 6,
                   'v1': (list(range(3)) + list(range(3))) * 2,
                   'v2': range(12)})

df.groupby('g', group_keys=False).apply(lambda x: filter_group(x, 'v1'))

As an aside, .filter() is also relevant to this question but didn't work for me.
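For what it's worth, a minimal sketch of why GroupBy.filter does not fit here: its callable returns one boolean per group, so it keeps or drops whole groups and cannot pick out the minimum rows within each group (df2 below is just an illustrative frame):

import pandas as pd

df2 = pd.DataFrame({'g': ['a'] * 3 + ['b'] * 3, 'v1': [0, 1, 2, 5, 6, 7]})

# the lambda is evaluated once per group; either all of the group's rows
# survive or none of them do
print(df2.groupby('g').filter(lambda grp: grp['v1'].min() < 3))

   g  v1
0  a   0
1  a   1
2  a   2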