Group by pandas DataFrame and select latest in each group
You can also use tail with groupby to get the last n values of each group:

    df.sort_values('date').groupby('id').tail(1)

        id  product        date
    2  220     6647  2014-10-16
    8  901     4555  2014-11-01
    5  826     3380  2015-05-19
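Since tail takes n, the same pattern also gives the latest few rows per group. Here is a minimal sketch with n=2; the sample frame is made up for illustration and is not the asker's data:

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "id": [220, 220, 220, 826, 826, 901],
    "product": [6647, 3899, 6618, 3286, 3380, 4555],
    "date": pd.to_datetime([
        "2014-10-16", "2014-09-01", "2014-08-08",
        "2014-11-11", "2015-05-19", "2014-11-01",
    ]),
})

# Sort oldest to newest, then keep the last two rows of each id group.
# Groups with fewer than two rows simply return what they have.
latest_two = df.sort_values("date").groupby("id").tail(2)
print(latest_two)
```

Note that the result keeps the sorted row order and the original index labels; it is not regrouped by id.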
Use idxmax in groupby and slice df with loc:

    df.loc[df.groupby('id').date.idxmax()]

        id  product        date
    2  220     6647  2014-10-16
    5  826     3380  2015-05-19
    8  901     4555  2014-11-01
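One thing to watch with this approach: idxmax returns index labels, so it relies on the index being unique. A small sketch of what happens otherwise, using a made-up frame with repeated labels (the kind you can get after pd.concat), and assuming a reset_index first is acceptable for your data:

```python
import pandas as pd

# Hypothetical frame whose index labels repeat.
df = pd.DataFrame(
    {
        "id": [220, 220, 826, 826],
        "product": [6647, 3899, 3286, 3380],
        "date": pd.to_datetime(
            ["2014-10-16", "2014-09-01", "2014-11-11", "2015-05-19"]
        ),
    },
    index=[0, 1, 0, 1],
)

# With repeated labels, .loc returns every row that carries a matching label,
# so this can return more than one row per group.
naive = df.loc[df.groupby("id")["date"].idxmax()]

# Giving each row a unique label first keeps exactly one row per group.
unique = df.reset_index(drop=True)
latest = unique.loc[unique.groupby("id")["date"].idxmax()]
print(latest)
```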
I had a similar problem and ended up using drop_duplicates rather than groupby. It seems to run significantly faster on large datasets than the other methods suggested above.

    df.sort_values(by="date").drop_duplicates(subset=["id"], keep="last")

        id  product        date
    2  220     6647  2014-10-16
    8  901     4555  2014-11-01
    5  826     3380  2015-05-19
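If you want to check the speed difference on your own data, something like the following rough sketch may help. The frame is synthetic (random ids, unique dates so there are no ties), the sizes are arbitrary, and timings will vary by machine and pandas version, so treat it as a starting point rather than a benchmark:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data, invented for this sketch: 10,000 rows, up to 500 ids,
# and one distinct timestamp per row in shuffled order.
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({
    "id": rng.integers(0, 500, size=n),
    "product": rng.integers(1000, 9999, size=n),
    "date": pd.Timestamp("2014-01-01")
    + pd.to_timedelta(rng.permutation(n), unit="min"),
})

def by_tail(frame):
    return frame.sort_values("date").groupby("id").tail(1)

def by_idxmax(frame):
    return frame.loc[frame.groupby("id")["date"].idxmax()]

def by_dedup(frame):
    return frame.sort_values(by="date").drop_duplicates(subset=["id"], keep="last")

# With unique dates, all three approaches should select the same rows.
same_rows = (
    set(by_tail(df).index) == set(by_idxmax(df).index) == set(by_dedup(df).index)
)
print("same rows selected:", same_rows)

# Time each approach a few times; no particular ranking is assumed here.
for func in (by_tail, by_idxmax, by_dedup):
    seconds = timeit.timeit(lambda: func(df), number=3)
    print(f"{func.__name__}: {seconds:.4f}s for 3 runs")
```

The three methods can differ when two rows in a group share the same latest date, so it is worth checking how your data handles ties before swapping one for another.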