group by pandas dataframe and select latest in each group

You can also use tail with groupby to get the last n rows of each group:

    df.sort_values('date').groupby('id').tail(1)

        id  product        date
    2  220     6647  2014-10-16
    8  901     4555  2014-11-01
    5  826     3380  2015-05-19
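For readers who want to try this, here is a minimal sketch. The input frame is hypothetical: only the three rows in the output above come from the answer, and the remaining rows and the datetime conversion are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical input; rows 2, 5 and 8 mirror the output shown in the answer,
# the other rows are made up for illustration.
df = pd.DataFrame({
    "id": [220, 220, 220, 826, 826, 826, 901, 901, 901],
    "product": [6647, 6647, 6647, 3380, 3380, 3380, 4555, 4555, 4555],
    "date": pd.to_datetime([
        "2014-09-01", "2014-09-03", "2014-10-16",
        "2014-11-11", "2014-12-29", "2015-05-19",
        "2014-09-01", "2014-10-05", "2014-11-01",
    ]),
})

# Sort by date first so that "last row in the group" means "latest date".
by_date = df.sort_values("date")

# tail(1) keeps the final row of each id; tail(2) keeps the final two.
latest = by_date.groupby("id").tail(1)
last_two = by_date.groupby("id").tail(2)

print(latest)
print(last_two)
```

Note that tail returns rows in the order of the sorted frame (by date here), not in the order of the group keys.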

python pandas group-by pandas-groupby

Use idxmax in groupby and slice df with loc:

    df.loc[df.groupby('id').date.idxmax()]

        id  product        date
    2  220     6647  2014-10-16
    5  826     3380  2015-05-19
    8  901     4555  2014-11-01
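A small sketch of the idxmax approach, under two assumptions that the answer does not state: the date column is converted to datetime first, and the index labels are unique. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical input with dates stored as strings.
df = pd.DataFrame({
    "id": [220, 220, 826, 826, 901],
    "product": [6647, 6647, 3380, 3380, 4555],
    "date": ["2014-09-03", "2014-10-16", "2015-05-19", "2014-11-11", "2014-11-01"],
})

# Assumption: convert to datetime so the maximum is chronological.
df["date"] = pd.to_datetime(df["date"])

# idxmax returns, for each id, the index label of the row with the latest date.
latest_idx = df.groupby("id")["date"].idxmax()

# loc then selects those rows by label.
latest = df.loc[latest_idx]
print(latest)
```

Because loc selects by label, a frame with duplicate index labels could return extra rows; calling reset_index(drop=True) beforehand is one way to avoid that. The result here is ordered by group key rather than by date.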


I had a similar problem and ended up using drop_duplicates rather than groupby.

It seems to run significantly faster on large datasets than the other methods suggested above.

    df.sort_values(by="date").drop_duplicates(subset=["id"], keep="last")

        id  product        date
    2  220     6647  2014-10-16
    8  901     4555  2014-11-01
    5  826     3380  2015-05-19
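To check the speed claim on your own data, something like the following sketch may help. It builds a hypothetical random frame (sizes, seed, and column values are arbitrary assumptions), confirms that the three approaches from this thread pick the same rows, and times each one with timeit. The timings depend on the data and pandas version, so no result is asserted here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 10,000 rows, about 500 ids, every date unique so no ties.
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({
    "id": rng.integers(0, 500, size=n),
    "product": rng.integers(1000, 9999, size=n),
    "date": pd.Timestamp("2014-01-01") + pd.to_timedelta(rng.permutation(n), unit="min"),
})

def by_tail(frame):
    return frame.sort_values("date").groupby("id").tail(1)

def by_idxmax(frame):
    return frame.loc[frame.groupby("id")["date"].idxmax()]

def by_drop_duplicates(frame):
    return frame.sort_values(by="date").drop_duplicates(subset=["id"], keep="last")

# All three should select the same rows; compare after aligning row order.
reference = by_tail(df).sort_index()
same_as_idxmax = reference.equals(by_idxmax(df).sort_index())
same_as_dedup = reference.equals(by_drop_duplicates(df).sort_index())
print("idxmax matches tail:", same_as_idxmax)
print("drop_duplicates matches tail:", same_as_dedup)

# Rough timing; run it on data shaped like yours before drawing conclusions.
for func in (by_tail, by_idxmax, by_drop_duplicates):
    seconds = timeit.timeit(lambda: func(df), number=5)
    print(f"{func.__name__}: {seconds / 5:.4f} s per call")
```

If the dates within a group can tie, the three approaches may pick different rows among the tied ones, so the equality check above relies on the dates being unique.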
