Sorting grouped data by group size in pandas
For pandas 0.17+, use sort_values:

    df.groupby('col1').size().sort_values(ascending=False)
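A minimal self-contained sketch of the sort_values approach. The column name col1 and the sample data are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: 'col1' is the grouping column.
df = pd.DataFrame({"col1": ["x", "y", "x", "z", "x", "y"],
                   "val": [1, 2, 3, 4, 5, 6]})

# size() counts the rows in each group; sort_values orders those counts, largest first.
sizes = df.groupby("col1").size().sort_values(ascending=False)
print(sizes)
```

When you only need the counts for a single column, df['col1'].value_counts() gives the same numbers, already sorted in descending order.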
For versions before 0.17, use size().order() instead:

    df.groupby('col1').size().order(ascending=False)
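Series.order was deprecated in 0.17 in favor of sort_values and is no longer available in current pandas. If code has to run on both old and new versions, one option is to pick whichever method the Series provides. This is only a sketch, and the helper name sort_desc is hypothetical, not a pandas API.

```python
import pandas as pd

def sort_desc(s):
    """Hypothetical helper: sort a Series in descending order on old or new pandas."""
    if hasattr(s, "sort_values"):        # pandas 0.17 and later
        return s.sort_values(ascending=False)
    return s.order(ascending=False)      # fallback for pre-0.17 pandas

df = pd.DataFrame({"col1": ["a", "b", "b"]})
result = sort_desc(df.groupby("col1").size())
print(result)
```

On any modern install only the sort_values branch runs, so the fallback matters only for legacy environments.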
You can also use Python's built-in sorted:
    In [11]: df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], index=['a', 'b', 'c'], columns=['A', 'B'])

    In [12]: g = df.groupby('A')

    In [13]: sorted(g,  # iterates pairs of (key, corresponding subDataFrame)
                    key=lambda x: len(x[1]),  # sort by number of rows (len of subDataFrame)
                    reverse=True)  # reverse the sort, i.e. largest first
    Out[13]:
    [(1,    A  B
      a  1  2
      b  1  4),
     (5,    A  B
      c  5  6)]
Note: iterating over g yields pairs of the key and the corresponding sub-DataFrame:
    In [14]: list(g)  # happens to be the same as the above...
    Out[14]:
    [(1,    A  B
      a  1  2
      b  1  4),
     (5,    A  B
      c  5  6)]
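If you want a single DataFrame rather than a list of (key, sub-DataFrame) pairs, a different technique from the sorted approach is to attach each row's group size with groupby(...).transform('size') and sort on that. This is a sketch; the helper column name _size is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], index=['a', 'b', 'c'], columns=['A', 'B'])

# transform('size') broadcasts each group's row count back onto that group's rows.
group_size = df.groupby('A')['B'].transform('size')

# Sort by group size (largest first), then by key so tied groups stay contiguous,
# and finally drop the helper column.
ordered = (df.assign(_size=group_size)
             .sort_values(['_size', 'A'], ascending=[False, True])
             .drop(columns='_size'))
print(ordered)
```

The secondary sort on the key only matters when two groups have the same size; without it, rows from tied groups could interleave.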
If you only need the largest groups, you can pick them out of the size counts with nlargest:

    import pandas as pd

    df = pd.DataFrame([[5, 5], [9, 7], [1, 8], [1, 7], [7, 8], [9, 5], [5, 6], [1, 2], [1, 4], [5, 6]], columns=['A', 'B'])
    df
       A  B
    0  5  5
    1  9  7
    2  1  8
    3  1  7
    4  7  8
    5  9  5
    6  5  6
    7  1  2
    8  1  4
    9  5  6

    group = df.groupby('A')
    count = group.size()
    count
    A
    1    4
    5    3
    7    1
    9    2
    dtype: int64

    # keep the two largest groups, in their original index order
    grp_len = count[count.index.isin(count.nlargest(2).index)]
    grp_len
    A
    1    4
    5    3
    dtype: int64
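Since nlargest already returns its result sorted in descending order, the isin filtering step is only needed to keep the groups in index order; if largest-first is what you want, count.nlargest(2) alone is enough. A sketch on the same data that also pulls the matching rows back out of df (the names top2 and top_rows are illustrative):

```python
import pandas as pd

df = pd.DataFrame([[5, 5], [9, 7], [1, 8], [1, 7], [7, 8],
                   [9, 5], [5, 6], [1, 2], [1, 4], [5, 6]], columns=['A', 'B'])

count = df.groupby('A').size()

# nlargest returns the top-N group sizes, largest first.
top2 = count.nlargest(2)
print(top2)

# Keep only the rows of df whose key belongs to one of those groups.
top_rows = df[df['A'].isin(top2.index)]
print(top_rows)
```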