pandas groupby sort within groups

You could also do it in one go by sorting first and then using head to take the first 3 rows of each group.

In [34]: df.sort_values(['job','count'],ascending=False).groupby('job').head(3)
Out[35]:
   count     job source
4      7   sales      E
2      6   sales      C
1      4   sales      B
5      5  market      A
8      4  market      D
6      3  market      B
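The answer above does not show how df was built, so here is a minimal self-contained sketch of the same sort-then-head approach. The sample data is an assumption, chosen only so that the result is consistent with the output shown above.

```python
import pandas as pd

# Assumed sample data (not given in the answer above); chosen so that the
# result is consistent with the Out[35] table.
df = pd.DataFrame({
    "count": [2, 4, 6, 3, 7, 5, 3, 2, 4, 1],
    "job": ["sales"] * 5 + ["market"] * 5,
    "source": ["A", "B", "C", "D", "E"] * 2,
})

# Sort by job and count, both descending, then keep the first 3 rows per job.
# head() keeps the rows in the order of the sorted frame.
top3 = df.sort_values(["job", "count"], ascending=False).groupby("job").head(3)
print(top3)
```

Because head keeps the original row labels, the index of top3 still points back to the rows of df.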

python sorting pandas group-by

What you want to do is actually again a groupby (on the result of the first groupby): sort and take the first three elements per group.

Starting from the result of the first groupby:

In [60]: df_agg = df.groupby(['job','source']).agg({'count':'sum'})

We group by the first level of the index:

In [63]: g = df_agg['count'].groupby('job', group_keys=False)

Then we want to sort ('order') each group and take the first three elements:

In [64]: res = g.apply(lambda x: x.sort_values(ascending=False).head(3))

However, there is a shortcut function for this, nlargest:

In [65]: g.nlargest(3)
Out[65]:
job     source
market  A         5
        D         4
        B         3
sales   E         7
        C         6
        B         4
dtype: int64

So in one go, this looks like:

df_agg['count'].groupby('job', group_keys=False).nlargest(3)
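Put together as a runnable sketch, the steps above might look like the following. The sample df is an assumption (the answer does not show it), and the grouping is written as level="job" here, which should be equivalent to passing the index level name 'job' as above.

```python
import pandas as pd

# Assumed sample data (not given in the answer above).
df = pd.DataFrame({
    "count": [2, 4, 6, 3, 7, 5, 3, 2, 4, 1],
    "job": ["sales"] * 5 + ["market"] * 5,
    "source": ["A", "B", "C", "D", "E"] * 2,
})

# Total count per (job, source) pair; the result is indexed by both keys.
df_agg = df.groupby(["job", "source"]).agg({"count": "sum"})

# Group the aggregated counts by the 'job' index level.
g = df_agg["count"].groupby(level="job", group_keys=False)

# Long form: sort each group and keep its first three elements.
via_apply = g.apply(lambda s: s.sort_values(ascending=False).head(3))

# Shortcut: nlargest does the sort and the cut in one call.
via_nlargest = g.nlargest(3)
print(via_nlargest)
```

Both variants should select the same three sources per job; nlargest is simply shorter and avoids the Python-level lambda.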


Here's another example of taking the top 3 in sorted order, and of sorting within the groups:

In [43]: import pandas as pd

In [44]: df = pd.DataFrame({"name":["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"], "count_1":[5,10,12,15,20,25,30,35], "count_2" :[100,150,100,25,250,300,400,500]})

In [45]: df
Out[45]:
   count_1  count_2  name
0        5      100   Foo
1       10      150   Foo
2       12      100  Baar
3       15       25   Foo
4       20      250  Baar
5       25      300   Foo
6       30      400  Baar
7       35      500  Baar

### Top 3 on sorted order:
In [46]: df.groupby(["name"])["count_1"].nlargest(3)
Out[46]:
name
Baar  7    35
      6    30
      4    20
Foo   5    25
      3    15
      1    10
dtype: int64

### Sorting within groups based on column "count_1":
In [48]: df.groupby(["name"]).apply(lambda x: x.sort_values(["count_1"], ascending = False)).reset_index(drop=True)
Out[48]:
   count_1  count_2  name
0       35      500  Baar
1       30      400  Baar
2       20      250  Baar
3       12      100  Baar
4       25      300   Foo
5       15       25   Foo
6       10      150   Foo
7        5      100   Foo
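As a side note, the same within-group ordering can probably be reached without groupby and apply at all, by sorting on the group key and the value column together. This is a sketch of an alternative, not the method used in the answer above; it reuses the df from In [44].

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"],
    "count_1": [5, 10, 12, 15, 20, 25, 30, 35],
    "count_2": [100, 150, 100, 25, 250, 300, 400, 500],
})

# Sort by name ascending (so groups stay together) and by count_1
# descending inside each name, then renumber the rows.
sorted_within = (
    df.sort_values(["name", "count_1"], ascending=[True, False])
      .reset_index(drop=True)
)
print(sorted_within)
```

A single multi-column sort avoids calling a Python function once per group, which tends to matter when there are many groups.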
