
Pandas join issue: columns overlap but no suffix specified


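The error on the snippet of data you posted is a little cryptic. join matches the mukey column of df_a against the index of df_b, not against df_b's mukey column, so both mukey columns end up in the result and pandas requires you to supply a suffix for the left- and right-hand sides. Because none of the mukey values match that index, the right-hand columns are all NaN: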

In [173]:
df_a.join(df_b, on='mukey', how='left', lsuffix='_left', rsuffix='_right')

Out[173]:
       mukey_left  DI  PI  mukey_right  niccdcd
index
0          100000  35  14          NaN      NaN
1         1000005  44  14          NaN      NaN
2         1000006  44  14          NaN      NaN
3         1000007  43  13          NaN      NaN
4         1000008  43  13          NaN      NaN

merge works because it doesn't have this restriction: it matches on the mukey columns themselves and keeps a single copy of the key in the result:

In [176]:
df_a.merge(df_b, on='mukey', how='left')

Out[176]:
     mukey  DI  PI  niccdcd
0   100000  35  14      NaN
1  1000005  44  14      NaN
2  1000006  44  14      NaN
3  1000007  43  13      NaN
4  1000008  43  13      NaN
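If you want to reproduce both behaviours yourself, here is a minimal sketch. df_a copies the rows shown in the output above; the contents of df_b were never posted, so its values here are made up purely for illustration.

```python
import pandas as pd

# df_a mirrors the rows shown in the output above.
df_a = pd.DataFrame({
    "mukey": [100000, 1000005, 1000006, 1000007, 1000008],
    "DI": [35, 44, 44, 43, 43],
    "PI": [14, 14, 14, 13, 13],
})

# df_b is hypothetical: the original values were not posted.
df_b = pd.DataFrame({
    "mukey": [190236, 190237, 190238],
    "niccdcd": [4, 6, 4],
})

# join without suffixes: both frames carry a 'mukey' column, so pandas refuses.
try:
    df_a.join(df_b, on="mukey", how="left")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# merge matches on the 'mukey' columns and keeps a single copy of the key.
merged = df_a.merge(df_b, on="mukey", how="left")
print(merged)
```

With no mukey values in common, niccdcd comes back as NaN for every row, just as in the output above.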


The .join() method matches against the index of the DataFrame passed as its argument, so you should either call set_index on that DataFrame first or use .merge instead.

Please find the two examples that should work in your case:

join_df = LS_sgo.join(MSU_pi.set_index('mukey'), on='mukey', how='left')

or

join_df = df_a.merge(df_b, on='mukey', how='left')
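To see the two approaches side by side, here is a small sketch on made-up data. The frame names left and right and all of the values are assumptions for illustration, not the data from the question.

```python
import pandas as pd

# Hypothetical frames where two of the three keys match.
left = pd.DataFrame({"mukey": [100000, 1000005, 1000006], "DI": [35, 44, 44]})
right = pd.DataFrame({"mukey": [1000005, 1000006], "niccdcd": [4, 6]})

# Move 'mukey' into the index of the right frame so join can match on it.
via_join = left.join(right.set_index("mukey"), on="mukey", how="left")

# merge matches on the 'mukey' columns directly.
via_merge = left.merge(right, on="mukey", how="left")

print(via_join)
print(via_merge)
```

In both results the unmatched key gets NaN in niccdcd, and because mukey is no longer a column of the right-hand frame in the join version, no suffix is needed.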


This error indicates that the two tables have one or more column names in common. The error message translates to: "I can see the same column in both tables but you haven't told me to rename either before bringing one of them in"

You either want to delete one of the columns before bringing it in from the other one using del df['column name'], or use lsuffix to rename the original column, or rsuffix to rename the one that is being brought in.

df_a.join(df_b, on='mukey', how='left', lsuffix='_left', rsuffix='_right')
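As a rough sketch of the two options, the snippet below uses small made-up frames (the values are assumptions). It joins on the default index rather than with on='mukey', so rows are paired by position, and it uses drop(columns=...) on a copy instead of del so that df_b itself is left unchanged.

```python
import pandas as pd

# Hypothetical frames that share the column name 'mukey'.
df_a = pd.DataFrame({"mukey": [1, 2], "DI": [35, 44]})
df_b = pd.DataFrame({"mukey": [1, 2], "niccdcd": [4, 6]})

# Option 1: keep both columns and rename them with suffixes.
suffixed = df_a.join(df_b, lsuffix="_left", rsuffix="_right")
print(list(suffixed.columns))

# Option 2: drop the duplicate column before joining (df_b is not modified).
dropped = df_a.join(df_b.drop(columns="mukey"))
print(list(dropped.columns))
```

Note that both calls line rows up by index, so this only gives a meaningful result when the two frames are already in the same row order; to match on the mukey values, use merge or set_index first.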