Pandas concat yields ValueError: Plan shapes are not aligned
In case it helps, I also hit this error when I tried to concatenate two dataframes (and at the time of writing, this is the only related hit I can find on Google other than the source code).
I don't know whether this answer would have solved the OP's problem (since they didn't post enough information), but for me, this was caused when I tried to concat dataframe df1 with columns ['A', 'B', 'B', 'C'] (see the duplicate column headings?) with dataframe df2 with columns ['A', 'B']. Understandably, the duplication caused pandas to throw a wobbly. Change the columns of df1 to ['A', 'B', 'C'] (i.e. drop one of the duplicate columns) and everything works fine.
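As a minimal sketch of the fix described above: the frames and values here are made up for illustration, and `Index.duplicated()` is just one way to spot the repeated heading. It keeps the first occurrence of each column name before concatenating; whether the first or a later duplicate is the one worth keeping depends on your data.

```python
import pandas as pd

# Toy frames mirroring the column layout described above (values are invented).
df1 = pd.DataFrame([[1, 2, 3, 4]], columns=['A', 'B', 'B', 'C'])
df2 = pd.DataFrame([[5, 6]], columns=['A', 'B'])

# True for every column whose name has already appeared further to the left.
dup_mask = df1.columns.duplicated()
print(df1.columns[dup_mask].tolist())

# Keep only the first occurrence of each column name.
df1_unique = df1.loc[:, ~dup_mask]

# With unique column names on both sides, concat can align the columns.
result = pd.concat([df1_unique, df2], ignore_index=True)
print(result.columns.tolist())
```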
I recently got this message too and found, like @jason and @user3805082 above, that I had duplicate columns in several of the hundreds of dataframes I was trying to concat, each with dozens of enigmatic varnames. Manually searching for duplicates was not practical.
In case anyone else has the same problem, I wrote the following function which might help out.
    def duplicated_varnames(df):
        """Return a dict of all variable names that are duplicated in a given dataframe."""
        repeat_dict = {}
        var_list = list(df)  # list of varnames as strings
        for varname in var_list:
            # make a list of all instances of that varname
            test_list = [v for v in var_list if v == varname]
            # if more than one instance, report duplications in repeat_dict
            if len(test_list) > 1:
                repeat_dict[varname] = len(test_list)
        return repeat_dict
Then you can iterate over that dict to report how many duplicates there are, delete the duplicated variables, or rename them in some systematic way.
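For the "rename them in some systematic way" option, here is one possible sketch. The helper name `rename_duplicates` and the `_1`, `_2` suffix scheme are my own assumptions, not anything pandas provides, and the helper does not check whether a generated name such as `B_1` already exists in the dataframe.

```python
import pandas as pd


def rename_duplicates(df, sep='_'):
    """Return a copy of df in which repeated column names get a numeric suffix.

    Hypothetical helper: the first occurrence keeps its name, later ones
    become name_1, name_2, ...
    """
    seen = {}        # how many times each name has been seen so far
    new_names = []
    for name in df.columns:
        count = seen.get(name, 0)
        new_names.append(name if count == 0 else f'{name}{sep}{count}')
        seen[name] = count + 1
    out = df.copy()  # leave the caller's dataframe untouched
    out.columns = new_names
    return out


df = pd.DataFrame([[1, 2, 3, 4]], columns=['A', 'B', 'B', 'C'])
renamed = rename_duplicates(df)
print(renamed.columns.tolist())
```

Renaming rather than dropping keeps all the data, which is handy when you are not yet sure which of the duplicated columns is the one you want.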
I wrote a small function that merges columns sharing the same name by concatenating their values as strings. Note that the output columns are always sorted by name, even if the columns of the original dataframe were not.
    def concat_duplicate_columns(df):
        """Merge same-named columns by concatenating their values as strings."""
        sep = '°°°'
        # count how often each column name occurs
        totals = {}
        for column in df.columns:
            totals[column] = totals.get(column, 0) + 1
        # rename duplicated names with a °°° number suffix, in table order,
        # so that we can access the ambiguous column names individually
        seen = {}
        new_names = []
        for column in df.columns:
            if totals[column] > 1:
                new_names.append(str(column) + sep + str(seen.get(column, 0)))
                seen[column] = seen.get(column, 0) + 1
            else:
                new_names.append(column)
        df = df.copy()  # do not modify the caller's dataframe
        df.columns = new_names
        # for each duplicated column name
        for name, count in totals.items():
            if count > 1:
                first = str(name) + sep + '0'
                # for each further duplicate of that column name
                for k in range(1, count):
                    other = str(name) + sep + str(k)
                    # concatenate values of the duplicated columns
                    df[first] = df[first].astype(str) + df[other].astype(str)
                    # drop the duplicate from which we have acquired the data
                    df = df.drop(columns=other)
                # restore the original column name
                df = df.rename(columns={first: name})
        # sort columns by name
        return df.reindex(columns=sorted(df.columns))
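A shorter alternative sketch of the same idea, which selects each group of same-named columns with a boolean mask instead of renaming them. The name `merge_duplicate_columns` and the sample data are assumptions for illustration; like the function above, it joins duplicated columns as strings and returns the columns sorted by name.

```python
import pandas as pd


def merge_duplicate_columns(df):
    """Return a new dataframe with one column per unique name (hypothetical helper).

    Columns that share a name are joined row by row as strings;
    columns with a unique name are passed through unchanged.
    """
    merged = {}
    for name in sorted(set(df.columns)):
        # all columns carrying this name, selected by boolean mask
        block = df.loc[:, df.columns == name]
        if block.shape[1] == 1:
            merged[name] = block.iloc[:, 0]
        else:
            merged[name] = block.astype(str).apply(''.join, axis=1)
    return pd.DataFrame(merged)


df = pd.DataFrame([[1, 'x', 3, 'y']], columns=['A', 'B', 'C', 'B'])
out = merge_duplicate_columns(df)
print(out)
```

Here the two `B` columns are not adjacent, which is the case where positional renaming tricks are easiest to get wrong; selecting by mask sidesteps the issue.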