group by pandas dataframe and select latest in each group


You can also use tail with groupby to get the last n values of the group:

df.sort_values('date').groupby('id').tail(1)

    id  product        date
2  220     6647  2014-10-16
8  901     4555  2014-11-01
5  826     3380  2015-05-19
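Since the question's original frame isn't shown here, this is a minimal runnable sketch on made-up data (the rows below are an assumption, chosen only so the latest row per id lines up with the output above). It also shows tail(2), to illustrate the "last n" part:

```python
import pandas as pd

# Hypothetical sample data; not the original poster's frame.
df = pd.DataFrame({
    "id": [220, 220, 220, 826, 826, 826, 901, 901, 901],
    "product": [6647, 6647, 6647, 3380, 3380, 3380, 4555, 4555, 4555],
    "date": pd.to_datetime([
        "2014-09-01", "2014-09-03", "2014-10-16",
        "2014-11-11", "2014-12-29", "2015-05-19",
        "2014-09-02", "2014-10-05", "2014-11-01",
    ]),
})

# Sort chronologically, then keep the last row of each id group.
latest = df.sort_values("date").groupby("id").tail(1)

# The same pattern with n=2 keeps the two most recent rows per id.
last_two = df.sort_values("date").groupby("id").tail(2)

print(latest)
print(last_two)
```

Note that the result keeps the row order of the sorted frame, so it comes back ordered by date rather than by id.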


Use idxmax in groupby and slice df with loc:

df.loc[df.groupby('id').date.idxmax()]

    id  product        date
2  220     6647  2014-10-16
5  826     3380  2015-05-19
8  901     4555  2014-11-01
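A small sketch of the idxmax approach, again on made-up data (the frame below is hypothetical). Two assumptions worth stating: the date column is converted to datetime first so that the maximum is chronological, and the index is unique, because idxmax returns index labels that loc then looks up:

```python
import pandas as pd

# Hypothetical sample data with dates stored as strings.
df = pd.DataFrame({
    "id": [220, 826, 901, 220, 826, 901],
    "product": [6647, 3380, 4555, 6647, 3380, 4555],
    "date": [
        "2014-09-01", "2015-05-19", "2014-09-02",
        "2014-10-16", "2014-11-11", "2014-11-01",
    ],
})

# Convert to datetime so idxmax compares dates, not strings.
df["date"] = pd.to_datetime(df["date"])

# idxmax gives, for each id, the index label of the row with the latest date.
latest_labels = df.groupby("id")["date"].idxmax()

# loc selects those rows; the result is ordered by id (groupby sorts keys).
latest = df.loc[latest_labels]

print(latest)
```

No sort of the whole frame is needed here, and the result comes back ordered by id.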


I had a similar problem and ended up using drop_duplicates rather than groupby.

It seems to run significantly faster on large datasets than the other methods suggested above.

df.sort_values(by="date").drop_duplicates(subset=["id"], keep="last")

    id  product        date
2  220     6647  2014-10-16
8  901     4555  2014-11-01
5  826     3380  2015-05-19
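To check that drop_duplicates really picks the same rows as the groupby version, here is a quick sketch on hypothetical data (the frame is made up for illustration):

```python
import pandas as pd

# Hypothetical sample data; not the original poster's frame.
df = pd.DataFrame({
    "id": [220, 220, 826, 826, 901, 901],
    "product": [6647, 6647, 3380, 3380, 4555, 4555],
    "date": pd.to_datetime([
        "2014-09-01", "2014-10-16", "2014-11-11",
        "2015-05-19", "2014-09-02", "2014-11-01",
    ]),
})

sorted_df = df.sort_values(by="date")

# Keep only the last (most recent) occurrence of each id.
via_drop = sorted_df.drop_duplicates(subset=["id"], keep="last")

# The groupby/tail version, for comparison.
via_tail = sorted_df.groupby("id").tail(1)

# True when both approaches return identical rows in the same order.
same = via_drop.equals(via_tail)
print(via_drop)
print(same)
```

The speed difference will depend on your data, so it's worth timing both on your own frame (for example with timeit) before settling on one. Also, if two rows in a group share the same latest date, each method still returns one row per id, but which of the tied rows you get may differ.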