
Column order in pandas.concat


You are creating DataFrames out of dictionaries. Before Python 3.7, dictionaries were unordered, which means the keys had no guaranteed order. So

d1 = {'key_a': 'val_a', 'key_b': 'val_b'}

and

d2 = {'key_b': 'val_b', 'key_a': 'val_a'}

contain the same data and compare equal.
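A quick way to check this, as a minimal sketch using only built-in dictionaries (the variable names `same_content`, `keys_d1` and `keys_d2` are just for illustration). Note that on Python 3.7 and later a dictionary does remember the order in which its keys were inserted, so the two key lists differ even though the dictionaries compare equal:

```python
d1 = {'key_a': 'val_a', 'key_b': 'val_b'}
d2 = {'key_b': 'val_b', 'key_a': 'val_a'}

# Equality only looks at the key/value pairs, not at their order.
same_content = (d1 == d2)

# On Python 3.7+ iteration follows insertion order,
# so the key lists can differ even though d1 == d2.
keys_d1 = list(d1)
keys_d2 = list(d2)

print(same_content)
print(keys_d1)
print(keys_d2)
```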

In addition, I assume that pandas sorts the dictionary's keys in ascending (alphabetical) order by default, which leads to the behavior you encountered. Unfortunately, I did not find anything in the docs to confirm that assumption.

So the fix is to reorder the columns of your DataFrame after concatenating. You can do this as follows:

import pandas as pd

data1 = pd.DataFrame({'b': [1, 1, 1], 'a': [2, 2, 2]})
data2 = pd.DataFrame({'b': [1, 1, 1], 'a': [2, 2, 2]})
frames = [data1, data2]
data = pd.concat(frames)
print(data)

cols = ['b', 'a']
data = data[cols]
print(data)


Starting from version 0.23.0, you can prevent concat() from sorting the columns of the returned DataFrame. For example:

df1 = pd.DataFrame({'a': [1, 1, 1], 'b': [2, 2, 2]})
df2 = pd.DataFrame({'b': [1, 1, 1], 'a': [2, 2, 2]})
df = pd.concat([df1, df2], sort=False)

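At the time of version 0.23.0, the docs announced that a future version of pandas would change to not sort by default. That change landed in pandas 1.0.0, where sort=False became the default.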
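To see what the sort argument changes, here is a small sketch that assumes pandas 0.23.0 or later on Python 3.6+ (so that dictionary order is kept when building the frames). The frames and the names `unsorted_cols` and `sorted_cols` are made up for illustration. Passing sort explicitly also makes the result independent of the installed version's default:

```python
import pandas as pd

# Two frames whose columns only partly overlap, so concat has to
# build the union of the column labels.
df1 = pd.DataFrame({'b': [1, 1, 1], 'a': [2, 2, 2]})
df2 = pd.DataFrame({'a': [3, 3, 3], 'c': [4, 4, 4]})

# sort=False keeps the columns in order of first appearance.
unsorted_cols = list(pd.concat([df1, df2], sort=False).columns)

# sort=True sorts the column labels.
sorted_cols = list(pd.concat([df1, df2], sort=True).columns)

print(unsorted_cols)
print(sorted_cols)
```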


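import pandas as pd

def concat_ordered_columns(frames):
    # Collect column names in order of first appearance across all frames.
    columns_ordered = []
    for frame in frames:
        columns_ordered.extend(x for x in frame.columns if x not in columns_ordered)
    final_df = pd.concat(frames)
    # Select the columns in the collected order.
    return final_df[columns_ordered]

# Usage (df_a, df_b, df_c are your own DataFrames)
dfs = [df_a, df_b, df_c]
full_df = concat_ordered_columns(dfs)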

This should work.
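For a concrete run of the helper, here is a self-contained sketch. The three frames `df_a`, `df_b` and `df_c` and their column names are hypothetical sample data, not taken from the question, and the example assumes Python 3.6+ with a pandas version that keeps dictionary order when constructing a DataFrame:

```python
import pandas as pd

def concat_ordered_columns(frames):
    # Collect column names in order of first appearance across all frames.
    columns_ordered = []
    for frame in frames:
        columns_ordered.extend(x for x in frame.columns if x not in columns_ordered)
    final_df = pd.concat(frames)
    return final_df[columns_ordered]

# Hypothetical sample frames with partly overlapping columns.
df_a = pd.DataFrame({'z': [1], 'a': [2]})
df_b = pd.DataFrame({'a': [3], 'm': [4]})
df_c = pd.DataFrame({'b': [5], 'z': [6]})

full_df = concat_ordered_columns([df_a, df_b, df_c])
print(list(full_df.columns))
```

Columns that are missing from a given frame are filled with NaN in its rows, as with any outer concat.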