Pandas - How to flatten a hierarchical index in columns


I think the easiest way to do this would be to set the columns to the top level:

df.columns = df.columns.get_level_values(0)

Note: if the top level has a name, you can pass that name instead of 0.
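As a rough sketch of the two ways to pick the top level (the frame and the level names "field" and "stat" are made up for illustration, they are not from the question):

```python
import pandas as pd

# Hypothetical two-level columns; the level names "field" and "stat" are assumptions.
columns = pd.MultiIndex.from_tuples(
    [("tempf", "amax"), ("tempf", "amin"), ("s_CD", "sum")],
    names=["field", "stat"],
)
df = pd.DataFrame([[90, 60, 3], [85, 55, 1]], columns=columns)

# Selecting the top level by position and by name gives the same labels.
by_position = df.columns.get_level_values(0)
by_name = df.columns.get_level_values("field")
print(list(by_name))  # ['tempf', 'tempf', 's_CD']

# Keeping only the top level can leave duplicate column names.
df.columns = by_name
print(df.columns.is_unique)  # False
```

Be aware that dropping the lower level this way can produce repeated column names, as the last line shows, so it fits best when the top level is already unique.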

If you want to combine/join your MultiIndex into one Index (assuming you have just string entries in your columns) you could:

df.columns = [' '.join(col).strip() for col in df.columns.values]

Note: we must strip the whitespace for when there is no second index.

In [11]: [' '.join(col).strip() for col in df.columns.values]
Out[11]: ['USAF', 'WBAN', 'day', 'month', 's_CD sum', 's_CL sum', 's_CNT sum', 's_PC sum', 'tempf amax', 'tempf amin', 'year']
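Here is a small self-contained version of the same idea, assuming a made-up frame whose columns look like the ones in the session above. The second part is my own addition for the case where the entries are not all strings:

```python
import pandas as pd

# Hypothetical columns shaped like the session above: some have an empty second level.
columns = pd.MultiIndex.from_tuples(
    [("USAF", ""), ("year", ""), ("s_CD", "sum"), ("tempf", "amax"), ("tempf", "amin")]
)
df = pd.DataFrame([[702730, 1993, 3, 90.0, 60.0]], columns=columns)

# Each column label is a tuple; join its parts and strip the trailing space
# left behind when the second level is empty.
flat = [" ".join(col).strip() for col in df.columns.values]
print(flat)  # ['USAF', 'year', 's_CD sum', 'tempf amax', 'tempf amin']

# str.join raises a TypeError on non-string entries (e.g. integer labels),
# so one option is to convert each part to str first.
mixed = pd.MultiIndex.from_tuples([("tempf", 1), ("tempf", 2)])
flat_mixed = [" ".join(str(part) for part in col).strip() for col in mixed.values]
print(flat_mixed)  # ['tempf 1', 'tempf 2']
```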


The other answers on this thread are a bit dated. As of pandas version 0.24.0, the .to_flat_index() method does what you need.

From pandas' own documentation:

MultiIndex.to_flat_index()

Convert a MultiIndex to an Index of Tuples containing the level values.

A simple example from its documentation:

import pandas as pd
print(pd.__version__) # needs to be 0.24.0 or later for to_flat_index()
index = pd.MultiIndex.from_product(
        [['foo', 'bar'], ['baz', 'qux']],
        names=['a', 'b'])
print(index)
# MultiIndex(levels=[['bar', 'foo'], ['baz', 'qux']],
#           codes=[[1, 1, 0, 0], [0, 1, 0, 1]],
#           names=['a', 'b'])

Applying to_flat_index():

index.to_flat_index()
# Index([('foo', 'baz'), ('foo', 'qux'), ('bar', 'baz'), ('bar', 'qux')], dtype='object')

Using it to replace existing pandas columns

An example of how you'd use it on dat, which is a DataFrame with MultiIndex columns:

dat = df.loc[:,['name','workshop_period','class_size']].groupby(['name','workshop_period']).describe()
print(dat.columns)
# MultiIndex(levels=[['class_size'], ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']],
#            codes=[[0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 2, 3, 4, 5, 6, 7]])
dat.columns = dat.columns.to_flat_index()
print(dat.columns)
# Index([('class_size', 'count'),  ('class_size', 'mean'),
#     ('class_size', 'std'),   ('class_size', 'min'),
#     ('class_size', '25%'),   ('class_size', '50%'),
#     ('class_size', '75%'),   ('class_size', 'max')],
#  dtype='object')
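Since df is not defined in the snippet above, here is a runnable sketch with a tiny made-up df (the values are invented, only the column names match the example):

```python
import pandas as pd

# Toy data; the values are assumptions chosen only to make the example runnable.
df = pd.DataFrame({
    "name": ["a", "a", "b", "b"],
    "workshop_period": [1, 1, 2, 2],
    "class_size": [10, 20, 30, 40],
})

dat = (
    df.loc[:, ["name", "workshop_period", "class_size"]]
    .groupby(["name", "workshop_period"])
    .describe()
)
print(isinstance(dat.columns, pd.MultiIndex))  # True

# Replace the MultiIndex with a flat Index whose labels are tuples.
dat.columns = dat.columns.to_flat_index()
print(isinstance(dat.columns, pd.MultiIndex))  # False
print(dat.columns[0])  # ('class_size', 'count')

# A column is now selected with the whole tuple as its label.
print(dat[("class_size", "mean")].tolist())  # [15.0, 35.0]
```

Note that the labels are still tuples at this point, so you select a column with the full tuple rather than a string.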

Flattening and Renaming in-place

It may be worth noting that you can combine this with a simple list comprehension (thanks @Skippy and @mmann1123) to join the elements, so that the resulting column names are plain strings separated by, for example, underscores:

dat.columns = ["_".join(a) for a in dat.columns.to_flat_index()]
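One caveat with the underscore join: a column with an empty second level would end up with a trailing underscore (e.g. "year_"). A possible variation, sketched on a made-up frame, skips empty parts and also converts non-string parts to str. This is my own tweak, not part of the pandas API:

```python
import pandas as pd

# Hypothetical columns; ("year", "") has an empty second level.
columns = pd.MultiIndex.from_tuples(
    [("year", ""), ("tempf", "amax"), ("tempf", "amin")]
)
dat = pd.DataFrame([[1993, 90.0, 60.0]], columns=columns)

# Skip empty parts so ("year", "") becomes "year" rather than "year_".
dat.columns = [
    "_".join(str(part) for part in col if part != "")
    for col in dat.columns.to_flat_index()
]
print(list(dat.columns))  # ['year', 'tempf_amax', 'tempf_amin']
```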


pd.DataFrame(df.to_records()) # the MultiIndex levels become columns and the new index is integers only
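A minimal sketch of what this does, assuming the MultiIndex is on the rows (for instance after a groupby). The data is made up for illustration:

```python
import pandas as pd

# Toy data; values are assumptions for illustration only.
df = pd.DataFrame({
    "name": ["a", "a", "b"],
    "workshop_period": [1, 2, 1],
    "class_size": [10, 20, 30],
})
grouped = df.groupby(["name", "workshop_period"]).sum()  # MultiIndex on the rows
print(grouped.index.nlevels)  # 2

# Round-trip through a record array: the index levels come back as columns.
flat = pd.DataFrame(grouped.to_records())
print(list(flat.columns))  # ['name', 'workshop_period', 'class_size']
print(list(flat.index))    # [0, 1, 2]
```

For this row-index case, grouped.reset_index() is the more common way to get the same layout.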