Python Pandas join dataframes on index


So let's dissect this:

df_train_csv = pd.read_csv('./train.csv',parse_dates=['Date'],index_col='Date')

OK, the first problem is that you have specified index_col='Date'. This makes 'Date' the index of df_train_csv, so you will not have a 'Date' column anymore.
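To see the effect of index_col in isolation, here is a minimal sketch. It reads a small in-memory CSV instead of train.csv; the two rows and the 'Sales' column are made up for illustration and are not taken from your file.

```python
import io
import pandas as pd

# Hypothetical stand-in for ./train.csv (the real file's columns are not known here).
csv_text = "Date,Sales\n2010-02-05,100\n2010-02-12,110\n"

# With index_col='Date' the dates become the index.
with_index = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], index_col='Date')
# Without index_col the dates stay in a regular column.
without_index = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

print('Date' in with_index.columns)     # False: 'Date' is now the index
print(with_index.index.name)            # Date
print('Date' in without_index.columns)  # True: 'Date' is a regular column
```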

start = datetime(2010, 2, 5)
end = datetime(2012, 10, 26)
df_train_fly = pd.date_range(start, end, freq="W-FRI")
df_train_fly = pd.DataFrame(pd.Series(df_train_fly), columns=['Date'])
merged = df_train_csv.join(df_train_fly.set_index(['Date']), on = ['Date'], how = 'right', lsuffix='_x')

So the above join will not work and raises the error you reported, since df_train_csv has no 'Date' column left to join on. In order to fix this:

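# remove the index_col param so 'Date' stays a regular column
df_train_csv = pd.read_csv('./train.csv',parse_dates=['Date'])
# 'on' names a column in df_train_csv that is matched against the index of the other frame,
# so df_train_fly still needs 'Date' as its index
merged = df_train_csv.join(df_train_fly.set_index(['Date']), on = ['Date'], how = 'right', lsuffix='_x')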
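A note on how 'on' behaves: DataFrame.join matches the column named in 'on' against the index of the other frame, not against a column of it. The sketch below shows this with made-up data; the names left and fridays and the 'Sales' values are hypothetical.

```python
import pandas as pd

# Three consecutive Fridays; sales exist for only the first and the last one.
all_fridays = pd.date_range('2010-02-05', '2010-02-19', freq='W-FRI')
left = pd.DataFrame({'Date': all_fridays[[0, 2]], 'Sales': [100, 120]})
fridays = pd.DataFrame({'Date': all_fridays})

# 'on' refers to a column of the caller; the other frame is matched on its index,
# which is why 'Date' is moved into the index of fridays first.
merged = left.join(fridays.set_index('Date'), on='Date', how='right')

print(len(merged))                   # one row per Friday in fridays
print(merged['Sales'].isna().sum())  # Fridays that have no matching sales row
```

With how='right', every date in fridays is kept and dates without a match in left get NaN in 'Sales'.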

Or keep index_col='Date' in read_csv and don't set the 'on' param:

merged = df_train_csv.join(df_train_fly.set_index(['Date']), how = 'right', lsuffix='_x')

Without 'on', the join uses the index of both DataFrames, so both need 'Date' as their index.

You can also achieve the same result by performing a merge instead:

merged = df_train_csv.merge(df_train_fly.set_index(['Date']), left_index=True, right_index=True, how = 'right', suffixes=('_x', '_y'))
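As a quick check that an index-based join and an index-based merge agree, here is a small self-contained sketch. The frames sales and calendar are hypothetical stand-ins for df_train_csv (indexed by 'Date') and df_train_fly.set_index('Date'), and the values are made up.

```python
import pandas as pd

all_fridays = pd.date_range('2010-02-05', '2010-02-19', freq='W-FRI')

# Stand-in for df_train_csv read with index_col='Date'; the middle Friday is missing.
sales = pd.DataFrame({'Sales': [100, 120]}, index=all_fridays[[0, 2]].rename('Date'))
# Stand-in for df_train_fly.set_index('Date'): a 'Date' index and no columns.
calendar = pd.DataFrame(index=all_fridays.rename('Date'))

joined = sales.join(calendar, how='right')
merged = sales.merge(calendar, left_index=True, right_index=True, how='right')

print(joined.equals(merged))  # both calls should produce the same frame
print(joined)
```

Note that merge takes suffixes=(..., ...) for overlapping column names, while join takes lsuffix and rsuffix.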