Keep other columns when doing groupby
Method #1: use idxmin() to get the indices of the elements of minimum diff, and then select those:
>>> df.loc[df.groupby("item")["diff"].idxmin()]
   item  diff  otherstuff
1     1     1           2
6     2    -6           2
7     3     0           0

[3 rows x 3 columns]
Method #2: sort by diff, and then take the first element in each item group:
>>> df.sort_values("diff").groupby("item", as_index=False).first()
   item  diff  otherstuff
0     1     1           2
1     2    -6           2
2     3     0           0

[3 rows x 3 columns]
Note that the resulting indices are different even though the row content is the same.
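To make the index difference concrete, here is a minimal self-contained sketch of both methods. The sample frame is an assumption, reconstructed from the data printed elsewhere in this thread, and kind="mergesort" is added only so that ties in diff are resolved in original row order; neither is part of the original answer.

```python
import pandas as pd

# Sample data assumed for illustration (reconstructed from the frame printed in this thread).
df = pd.DataFrame({
    "item": [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "diff": [2, 1, 1, -1, 1, 4, -6, 0, 2],
    "otherstuff": [1, 2, 7, 0, 3, 9, 2, 0, 9],
})

# Method #1: idxmin() returns the original row label of each group's minimum,
# so the selected rows keep their labels from df.
by_idxmin = df.loc[df.groupby("item")["diff"].idxmin()]

# Method #2: sort, then take the first row of each group. With as_index=False
# the result gets a fresh 0..n-1 index. A stable sort (mergesort) keeps tied
# rows in their original order.
by_first = (
    df.sort_values("diff", kind="mergesort")
      .groupby("item", as_index=False)
      .first()
)

print(list(by_idxmin.index))  # original labels
print(list(by_first.index))   # new positional index
```

If the original labels matter later (for example, to join back to df), Method #1 is probably the more convenient of the two.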
You can use DataFrame.sort_values with DataFrame.drop_duplicates:
df = df.sort_values(by='diff').drop_duplicates(subset='item')
print (df)
   item  diff  otherstuff
6     2    -6           2
7     3     0           0
1     1     1           2
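One detail worth illustrating: drop_duplicates keeps the first row it sees for each item, so when several rows tie on the minimal diff, the row that survives depends on the sort order. The sketch below assumes the same sample frame as printed in this thread and uses kind="mergesort" (a stable sort) as one possible way to make the choice deterministic; that argument is an addition, not part of the answer above.

```python
import pandas as pd

# Sample data assumed for illustration; rows 1 and 2 tie on the minimum for item 1.
df = pd.DataFrame({
    "item": [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "diff": [2, 1, 1, -1, 1, 4, -6, 0, 2],
    "otherstuff": [1, 2, 7, 0, 3, 9, 2, 0, 9],
})

# Stable sort: among equal diff values, rows stay in their original order,
# so drop_duplicates (keep="first" by default) retains row 1 rather than row 2.
result = df.sort_values(by="diff", kind="mergesort").drop_duplicates(subset="item")
print(result)

# The result is ordered by diff; sort_index() restores the original row order.
print(result.sort_index())
```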
If a group can contain several rows with the minimal value and you want all of them, use boolean indexing with transform, which returns each group's minimum aligned to the original rows:
print (df)
   item  diff  otherstuff
0     1     2           1
1     1     1           2 <-multiple min
2     1     1           7 <-multiple min
3     2    -1           0
4     2     1           3
5     2     4           9
6     2    -6           2
7     3     0           0
8     3     2           9

print (df.groupby("item")["diff"].transform('min'))
0    1
1    1
2    1
3   -6
4   -6
5   -6
6   -6
7    0
8    0
Name: diff, dtype: int64

df = df[df.groupby("item")["diff"].transform('min') == df['diff']]
print (df)
   item  diff  otherstuff
1     1     1           2
2     1     1           7
6     2    -6           2
7     3     0           0
The above answer works great if there is, or you only want, one min per group. In my case there could be multiple mins and I wanted all rows equal to the min, which .idxmin() doesn't give you. This worked:
import pandas as pd

def filter_group(dfg, col):
    # Keep every row whose value in col equals the group's minimum.
    return dfg[dfg[col] == dfg[col].min()]

df = pd.DataFrame({'g': ['a'] * 6 + ['b'] * 6,
                   'v1': (list(range(3)) + list(range(3))) * 2,
                   'v2': range(12)})
df.groupby('g', group_keys=False).apply(lambda x: filter_group(x, 'v1'))
As an aside, .filter() is also relevant to this question but didn't work for me.
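The post does not say what was tried with .filter(), so the following is only a guess at the likely reason: GroupBy.filter evaluates its function once per group and keeps or drops the whole group, so it cannot select individual rows inside a group. A minimal sketch of the difference, reusing the frame from the snippet above and a transform-based mask for the row-level selection:

```python
import pandas as pd

df = pd.DataFrame({
    "g": ["a"] * 6 + ["b"] * 6,
    "v1": (list(range(3)) + list(range(3))) * 2,
    "v2": range(12),
})

# filter(): the lambda returns one boolean per group, and every row of a
# passing group is kept. Both groups have a minimum of 0, so nothing is dropped.
whole_groups = df.groupby("g").filter(lambda x: x["v1"].min() == 0)

# Row-level selection: compare each row with its own group's minimum.
min_rows = df[df["v1"] == df.groupby("g")["v1"].transform("min")]

print(len(whole_groups))
print(min_rows)
```

In other words, .filter() answers "which groups do I keep?", while the question here is "which rows within each group do I keep?".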