Multi Index Sorting in Pandas Multi Index Sorting in Pandas python python

Multi Index Sorting in Pandas


When sorting by a MultiIndex you need to contain the tuple describing the column inside a list*:

In [11]: df.sort_values([('Group1', 'C')], ascending=False)Out[11]:   Group1       Group2             A  B  C      A  B  C2      5  6  9      1  0  01      1  0  3      2  5  73      7  0  2      0  3  5

* so as not to confuse pandas into thinking you want to sort first by Group1 then by C.


Note: Originally used .sort since deprecated then removed in 0.20, in favor of .sort_values.


A hack would be to change the order of the levels:

In [11]: gOut[11]:                                               SalesManufacturer Product Name Product Launch DateApple        iPad         2010-04-03              30             iPod         2001-10-23              34Samsung      Galaxy       2009-04-27              24             Galaxy Tab   2010-09-02              22In [12]: g.index = g.index.swaplevel(1, 2)

Sortlevel, which (as you've found) sorts the MultiIndex levels in order:

In [13]: g = g.sortlevel()

And swap back:

In [14]: g.index = g.index.swaplevel(1, 2)In [15]: gOut[15]:                                               SalesManufacturer Product Name Product Launch DateApple        iPod         2001-10-23              34             iPad         2010-04-03              30Samsung      Galaxy       2009-04-27              24             Galaxy Tab   2010-09-02              22

I'm of the opinion that sortlevel should not sort the remaining labels in order, so will create a github issue. :) Although it's worth mentioning the docnote about "the need for sortedness".

Note: you could avoid the first swaplevel by reordering the order of the initial groupby:

g = df.groupby(['Manufacturer', 'Product Launch Date', 'Product Name']).sum()


This one liner works for me:

In [1]: grouped.sortlevel(["Manufacturer","Product Launch Date"], sort_remaining=False)                                               SalesManufacturer Product Name Product Launch Date       Apple        iPod         2001-10-23              34             iPad         2010-04-03              30Samsung      Galaxy       2009-04-27              24             Galaxy Tab   2010-09-02              22

Note this works too:

groups.sortlevel([0,2], sort_remaining=False)

This wouldn't have worked when you originally posted over two years ago, because sortlevel by default sorted on ALL indices which mucked up your company hierarchy. sort_remaining which disables that behavior was added last year. Here's the commit link for reference: https://github.com/pydata/pandas/commit/3ad64b11e8e4bef47e3767f1d31cc26e39593277