How to query MultiIndex index columns values in pandas

python pandas indexing slice multi-index

To query the df by the MultiIndex values, for example where (A > 1.7) and (B < 666):

In [536]: result_df = df.loc[(df.index.get_level_values('A') > 1.7) & (df.index.get_level_values('B') < 666)]In [537]: result_dfOut[537]:           CA   B      3.3 222  43    333  595.5 333  56

Hence, to get for example the 'A' index values, if still required:

In [538]: result_df.index.get_level_values('A')Out[538]: Index([3.3, 3.3, 5.5], dtype=object)

The problem is, that in large data frames the performance of by index selection worse by 10% than the sorted regular rows selection. And in repetitive work, looping, the delay accumulated. See example:

In [558]: df = store.select(STORE_EXTENT_BURSTS_DF_KEY)In [559]: len(df)Out[559]: 12857In [560]: df.sort(inplace=True)In [561]: df_without_index = df.reset_index()In [562]: %timeit df.loc[(df.index.get_level_values('END_TIME') > 358200) & (df.index.get_level_values('START_TIME') < 361680)]1000 loops, best of 3: 562 µs per loopIn [563]: %timeit df_without_index[(df_without_index.END_TIME > 358200) & (df_without_index.START_TIME < 361680)]1000 loops, best of 3: 507 µs per loop

python pandas indexing slice multi-index

For better readability, we can simply use the query() Method, to avoid the lengthy df.index.get_level_values() and reset_index/set_index to and fro.

Here is the target DataFrame:

In [12]: df                                                                    Out[12]:                                                                                 C                                                                    A   B                                                                          1.1 111  68                                                                        222  40                                                                    3.3 222  20                                                                        333  11                                                                    5.5 333  80                                                                    6.6 777  51

Answer for Q1 (A in range [3.3, 6.6]):

In [13]: df.query('3.3 <= A <= 6.6') # for closed interval                       Out[13]:                                                                                 C                                                                    A   B                                                                          3.3 222  20                                                                        333  11                                                                    5.5 333  80                                                                    6.6 777  51                                                                    In [14]: df.query('3.3 < A < 6.6') # for open interval                         Out[14]:                                                                                 C                                                                    A   B                                                                          5.5 333  80

and of course one can play around with <, <=, >, >= for any kind of inclusion.

Similarly, answer for Q2 (A in range [2.0, 4.0]):

In [15]: df.query('2.0 <= A <= 4.0')                                        Out[15]:                                                                              C                                                                 A   B                                                                       3.3 222  20                                                                     333  11

Answer for Q3 (B in range [111, 500]):

In [16]: df.query('111 <= B <= 500')                                        Out[16]:                                                                              C                                                                 A   B                                                                       1.1 111  68                                                                     222  40                                                                 3.3 222  20                                                                     333  11                                                                 5.5 333  80

And moreover, you can COMBINE the query for col A and B very naturally!

In [17]: df.query('0 < A < 4 and 150 < B < 400')                            Out[17]:                                                                              C                                                                 A   B                                                                       1.1 222  40                                                                 3.3 222  20                                                                     333  11

python pandas indexing slice multi-index

With a 'float' like index you always want to use it as a column rather than a direct indexing action. These will all work whether the endpoints exist or not.

In [11]: dfOut[11]:           CA   B      1.1 111  81    222  453.3 222  98    333  135.5 333  896.6 777  98In [12]: x = df.reset_index()

In [13]: x.loc[(x.A>=3.3)&(x.A<=6.6)]Out[13]:      A    B   C2  3.3  222  983  3.3  333  134  5.5  333  895  6.6  777  98

In [14]: x.loc[(x.A>=2.0)&(x.A<=4.0)]Out[14]:      A    B   C2  3.3  222  983  3.3  333  13

In [15]: x.loc[(x.B>=111.0)&(x.B<=500.0)]Out[15]:      A    B   C0  1.1  111  811  1.1  222  452  3.3  222  983  3.3  333  134  5.5  333  89

If you want the indices back, just set them. This is a cheap operation.

In [16]: x.loc[(x.B>=111.0)&(x.B<=500.0)].set_index(['A','B'])Out[16]:           CA   B      1.1 111  81    222  453.3 222  98    333  135.5 333  89

If you REALLY want the actual index values

In [5]: x.loc[(x.B>=111.0)&(x.B<=500.0)].set_index(['A','B']).indexOut[5]: MultiIndex[(1.1, 111), (1.1, 222), (3.3, 222), (3.3, 333), (5.5, 333)]

CodeHunter

How to query MultiIndex index columns values in pandas

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last