Python Pandas - Concat dataframes with different columns ignoring column names
If the columns are always in the same order, you can mechanically rename
the columns and then concatenate the frames,
like:
Code:
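    # map each UK column name to the German name in the same position
    new_cols = {x: y for x, y in zip(df_uk.columns, df_ger.columns)}
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df_out = pd.concat([df_ger, df_uk.rename(columns=new_cols)])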
Test Code:
    from io import StringIO

    import pandas as pd

    df_ger = pd.read_fwf(StringIO(
        u"""
            index  Datum   Zahl1   Zahl2
            0      1-1-17  1       2
            1      2-1-17  3       4"""),
        header=1).set_index('index')

    df_uk = pd.read_fwf(StringIO(
        u"""
            index  Date    No1     No2
            0      1-1-17  5       6
            1      2-1-17  7       8"""),
        header=1).set_index('index')

    print(df_uk)
    print(df_ger)

    new_cols = {x: y for x, y in zip(df_uk.columns, df_ger.columns)}
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    df_out = pd.concat([df_ger, df_uk.rename(columns=new_cols)])
    print(df_out)
Results:
             Date  No1  No2
    index
    0      1-1-17    5    6
    1      2-1-17    7    8

            Datum  Zahl1  Zahl2
    index
    0      1-1-17      1      2
    1      2-1-17      3      4

            Datum  Zahl1  Zahl2
    index
    0      1-1-17      1      2
    1      2-1-17      3      4
    0      1-1-17      5      6
    1      2-1-17      7      8
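One caveat with the zip-based mapping: zip stops at the shorter input, so if one frame has fewer columns than the other, nothing is raised and the missing column is filled with NaN. The sketch below illustrates this with a made-up one-row frame (df_uk_short) and a hypothetical guard function (check_same_width); neither name comes from the answer above.

```python
import pandas as pd

df_ger = pd.DataFrame({"Datum": ["1-1-17"], "Zahl1": [1], "Zahl2": [2]})
# Hypothetical UK frame that is one column short
df_uk_short = pd.DataFrame({"Date": ["1-1-17"], "No1": [5]})

# zip() truncates to the shorter column list, so only two names are mapped
new_cols = dict(zip(df_uk_short.columns, df_ger.columns))
df_out = pd.concat([df_ger, df_uk_short.rename(columns=new_cols)])

# The UK row has no value for Zahl2, so it shows up as NaN
missing = int(df_out["Zahl2"].isna().sum())


def check_same_width(left, right):
    """Hypothetical guard: fail loudly when the column counts differ."""
    if left.shape[1] != right.shape[1]:
        raise ValueError(
            f"column counts differ: {left.shape[1]} vs {right.shape[1]}"
        )
```

Calling check_same_width(df_ger, df_uk) before building new_cols turns the silent NaN fill into an explicit error.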
Provided you can be sure that the structures of the two dataframes remain the same, I see two options:
Keep the dataframe column names of the chosen default language (I assume en_GB) and just copy them over:
    df_ger.columns = df_uk.columns
    df_combined = pd.concat([df_ger, df_uk], axis=0, ignore_index=True)
This works regardless of what the column names are, although technically it is still a rename.
Pull the data out of the dataframe using numpy.ndarrays, concatenate them in numpy, and make a dataframe out of it again:
    import numpy

    # as_matrix() was removed in pandas 1.0; to_numpy() is the replacement
    np_ger_data = df_ger.to_numpy()
    np_uk_data = df_uk.to_numpy()
    np_combined_data = numpy.concatenate([np_ger_data, np_uk_data], axis=0)
    df_combined = pd.DataFrame(np_combined_data, columns=["Date", "No1", "No2"])
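This solution requires more resources, since the data is copied into NumPy arrays and back, and columns of mixed types are collapsed into a single object dtype along the way, so I would opt for the first one.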
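To make the difference between the two options concrete, here is a small sketch that builds both results from hand-written frames (the sample values mirror the question's data; the variable names via_rename and via_numpy are only illustrative). It uses set_axis for the rename so the inputs are not modified.

```python
import numpy as np
import pandas as pd

df_ger = pd.DataFrame(
    {"Datum": ["1-1-17", "2-1-17"], "Zahl1": [1, 3], "Zahl2": [2, 4]}
)
df_uk = pd.DataFrame(
    {"Date": ["1-1-17", "2-1-17"], "No1": [5, 7], "No2": [6, 8]}
)

# Option 1: relabel the German frame, then concatenate
via_rename = pd.concat(
    [df_ger.set_axis(df_uk.columns, axis=1), df_uk],
    axis=0,
    ignore_index=True,
)

# Option 2: round-trip through NumPy arrays
via_numpy = pd.DataFrame(
    np.concatenate([df_ger.to_numpy(), df_uk.to_numpy()], axis=0),
    columns=df_uk.columns,
)

# The integer columns keep their dtype with option 1 ...
print(via_rename.dtypes)
# ... but come back as generic object columns with option 2,
# because a frame with strings and integers converts to an object array
print(via_numpy.dtypes)

# infer_objects() can recover the numeric dtypes afterwards
recovered = via_numpy.infer_objects()
```

Both results hold the same values; only the column dtypes differ, which matters for later arithmetic and memory use.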
I am not sure if this will be simpler than what you had in mind, but if the main goal is something general, then this should be fine under one assumption: the columns in the two files match by position. For example, if the date is the first column in one file, its translated version is also the first column in the other.
    # number of columns
    n_columns = len(df_ger.columns)

    # save final columns names
    columns = df_uk.columns

    # rename both columns to numbers
    df_ger.columns = range(n_columns)
    df_uk.columns = range(n_columns)

    # concat columns
    df_out = pd.concat([df_ger, df_uk], axis=0, ignore_index=True)

    # rename columns in new dataframe
    df_out.columns = columns
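The same positional idea can be wrapped in a function that accepts any number of frames and leaves the inputs untouched. This is only a sketch: concat_by_position is a hypothetical helper name, and it assumes every frame has the same number of columns in the same order.

```python
import pandas as pd


def concat_by_position(frames, columns=None):
    """Hypothetical helper: stack frames row-wise, matching columns by position.

    `columns` gives the final column names; by default the names of the
    first frame are used.
    """
    frames = list(frames)
    if not frames:
        raise ValueError("need at least one DataFrame")

    widths = {frame.shape[1] for frame in frames}
    if len(widths) != 1:
        raise ValueError(f"column counts differ: {sorted(widths)}")

    names = list(frames[0].columns) if columns is None else list(columns)
    # set_axis returns a relabelled object, so the inputs keep their names
    relabelled = [frame.set_axis(names, axis=1) for frame in frames]
    return pd.concat(relabelled, axis=0, ignore_index=True)


df_ger = pd.DataFrame(
    {"Datum": ["1-1-17", "2-1-17"], "Zahl1": [1, 3], "Zahl2": [2, 4]}
)
df_uk = pd.DataFrame(
    {"Date": ["1-1-17", "2-1-17"], "No1": [5, 7], "No2": [6, 8]}
)

df_out = concat_by_position([df_ger, df_uk], columns=df_uk.columns)
print(df_out)
```

Unlike assigning to df_ger.columns directly, this leaves df_ger with its German column names after the call.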