Sorting the grouped data as per group size in Pandas



For pandas 0.17+, count the rows in each group with size() and sort those counts with sort_values:

df.groupby('col1').size().sort_values(ascending=False)
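A self-contained sketch of the sort_values approach. The sample DataFrame and its values are made up for illustration; only the column name col1 comes from the line above:

```python
import pandas as pd

# Hypothetical sample data: 'x' appears 3 times, 'y' twice, 'z' once
df = pd.DataFrame({'col1': ['y', 'x', 'z', 'x', 'y', 'x'],
                   'val': [1, 2, 3, 4, 5, 6]})

# size() counts the rows in each group; sort_values orders the counts, largest first
sizes = df.groupby('col1').size().sort_values(ascending=False)
print(sizes)
```

The result is a Series indexed by group key, so sizes.index gives the keys in largest-group-first order.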

For versions before 0.17, use order() instead (it was deprecated in 0.17 and has since been removed from pandas):

df.groupby('col1').size().order(ascending=False)


Alternatively, you can use Python's built-in sorted directly on the GroupBy object:

In [11]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], index=['a', 'b', 'c'], columns=['A', 'B'])

In [12]: g = df.groupby('A')

In [13]: sorted(g,  # iterates pairs of (key, corresponding subDataFrame)
    ...:        key=lambda x: len(x[1]),  # sort by number of rows (len of subDataFrame)
    ...:        reverse=True)  # reverse the sort, i.e. largest first
Out[13]:
[(1,    A  B
  a  1  2
  b  1  4),
 (5,    A  B
  c  5  6)]
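The same session as a standalone script, a sketch that adds the pandas import the session assumes and prints only each key with its group size:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], index=['a', 'b', 'c'], columns=['A', 'B'])
g = df.groupby('A')

# Iterating a GroupBy yields (key, sub-DataFrame) pairs;
# sort them by the number of rows in each sub-DataFrame, largest first
ordered = sorted(g, key=lambda pair: len(pair[1]), reverse=True)

for key, sub in ordered:
    print(key, len(sub))
```

Because sorted returns a plain list of (key, sub-DataFrame) tuples, the grouped data itself (not just the counts) comes back in size order.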

Note: iterating over g yields pairs of the group key and the corresponding sub-DataFrame:

In [14]: list(g)  # happens to be the same as the above...
Out[14]:
[(1,    A  B
  a  1  2
  b  1  4),
 (5,    A  B
  c  5  6)]
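The match between list(g) and the sorted result is a coincidence of this small example: groupby sorts the group keys by default (sort=True), so plain iteration follows key order, not group size. A sketch with made-up data where the two orders differ:

```python
import pandas as pd

# Hypothetical data: key 1 has 4 rows, 5 has 3, 9 has 2, 7 has 1
df = pd.DataFrame({'A': [5, 9, 1, 1, 7, 9, 5, 1, 1, 5]})
g = df.groupby('A')

# Plain iteration follows the sorted group keys
by_key = [int(key) for key, _ in g]

# Sorting by len(sub) gives largest-group-first order instead
by_size = [int(key) for key, _ in sorted(g, key=lambda pair: len(pair[1]), reverse=True)]

print(by_key)
print(by_size)
```

Here key 7 (1 row) comes before key 9 (2 rows) in plain iteration, but after it once sorted by size.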


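To keep only the largest groups, count the rows per group with size() and then select the n biggest with nlargest(); here, the two largest:

import pandas as pd

df = pd.DataFrame([[5, 5], [9, 7], [1, 8], [1, 7], [7, 8], [9, 5], [5, 6], [1, 2], [1, 4], [5, 6]], columns=['A', 'B'])
df
   A  B
0  5  5
1  9  7
2  1  8
3  1  7
4  7  8
5  9  5
6  5  6
7  1  2
8  1  4
9  5  6

group = df.groupby('A')
count = group.size()
count
A
1    4
5    3
7    1
9    2
dtype: int64

grp_len = count[count.index.isin(count.nlargest(2).index)]
grp_len
A
1    4
5    3
dtype: int64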
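A possibly simpler variant on the same data, offered as a sketch rather than the answer's method: nlargest on its own already returns the top groups ordered by size, whereas the isin() filter keeps the original key order of count (which only happens to match here). value_counts on the grouping column gives the full set of counts sorted in descending order:

```python
import pandas as pd

df = pd.DataFrame([[5, 5], [9, 7], [1, 8], [1, 7], [7, 8], [9, 5],
                   [5, 6], [1, 2], [1, 4], [5, 6]], columns=['A', 'B'])

count = df.groupby('A').size()

# nlargest(n) returns the n biggest groups, largest first
top2 = count.nlargest(2)

# value_counts counts each distinct value of 'A', sorted by count (descending)
vc = df['A'].value_counts()

print(top2)
print(vc)
```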