Using fillna with two multi-index dataframes throws InvalidIndexError Using fillna with two multi-index dataframes throws InvalidIndexError pandas pandas

Using fillna with two multi-index dataframes throws InvalidIndexError


The problem here is the duplicate index defined in df1:

df1 = pd.DataFrame({    'key1': list('ABAACCA'),    'key2': list('1675987'),    'prop1': list('xyzuynb'),    'prop2': list('mnbbbas')}).set_index(['key1', 'key2'])

Note: Key1=A Key2=7 appears twice, the index for df1 is not unique.

Let's change that second A7 to A9

df1 = pd.DataFrame({    'key1': list('ABAACCA'),    'key2': list('1675989'),    'prop1': list('xyzuynb'),    'prop2': list('mnbbbas')}).set_index(['key1', 'key2'])df2 = pd.DataFrame({    'key1': list('ABCCADD'),    'key2': list('1598787'),    'prop1': [np.nan] * 7,    'prop2': [np.nan] * 7}).set_index(['key1', 'key2'])

Thus creating unique indexing in df1, now try df.fillna:

df2.fillna(df1)

Output:

          prop1 prop2key1 key2            A    1        x     mB    5      NaN   NaNC    9        y     b     8        n     aA    7        z     bD    8      NaN   NaN     7      NaN   NaN

I got hint of this when I tried the reindex_like method, first with unique indexing:

df1 = pd.DataFrame({    'key1': list('ABAACCA'),    'key2': list('1675989'),    'prop1': list('xyzuynb'),    'prop2': list('mnbbbas')}).set_index(['key1', 'key2'])df2 = pd.DataFrame({    'key1': list('ABCCADD'),    'key2': list('1598787'),    'prop1': [np.nan] * 7,    'prop2': [np.nan] * 7}).set_index(['key1', 'key2'])print(df1.reindex_like(df2))

Output:

          prop1 prop2key1 key2            A    1        x     mB    5      NaN   NaNC    9        y     b     8        n     aA    7        z     bD    8      NaN   NaN     7      NaN   NaN

Now, let's revert to the original dataframes in the post:

df1 = pd.DataFrame({    'key1': list('ABAACCA'),    'key2': list('1675987'),    'prop1': list('xyzuynb'),    'prop2': list('mnbbbas')}).set_index(['key1', 'key2'])df2 = pd.DataFrame({    'key1': list('ABCCADD'),    'key2': list('1598787'),    'prop1': [np.nan] * 7,    'prop2': [np.nan] * 7}).set_index(['key1', 'key2'])print(df1.reindex_like(df2))

Output ValueError:

ValueError: cannot handle a non-unique multi-index!

Another work-around it to create unique indexing by adding another index level with cumcount.

df1 = pd.DataFrame({    'key1': list('ABAACCA'),    'key2': list('1675987'),    'prop1': list('xyzuynb'),    'prop2': list('mnbbbas')}).set_index(['key1', 'key2'])df2 = pd.DataFrame({    'key1': list('ABCCADD'),    'key2': list('1598787'),    'prop1': [np.nan] * 7,    'prop2': [np.nan] * 7}).set_index(['key1', 'key2'])df1 = df1.set_index(df1.groupby(df1.index).cumcount(), append=True)df2 = df2.set_index(df2.groupby(df2.index).cumcount(), append=True)df2.fillna(df1)

Output:

            prop1 prop2key1 key2              A    1    0     x     mB    5    0   NaN   NaNC    9    0     y     b     8    0     n     aA    7    0     z     bD    8    0   NaN   NaN     7    0   NaN   NaN

Then you can drop index level 2:

df2.fillna(df1).reset_index(level=2, drop=True)

Output:

          prop1 prop2key1 key2            A    1        x     mB    5      NaN   NaNC    9        y     b     8        n     aA    7        z     bD    8      NaN   NaN     7      NaN   NaN

However, I think pandas should have nicer error messaging for fillna non-unique MultiIndexes like it does for reindex_like.


Here is problem some index values not match, for me working alternative solution with DataFrame.combine_first:

df = df2.combine_first(df1)print (df)          prop1 prop2key1 key2            A    1        x     m     5        u     b     7        z     b     7        b     sB    5      NaN   NaN     6        y     nC    8        n     a     9        y     bD    7      NaN   NaN     8      NaN   NaN