Using fillna with two multi-index dataframes throws InvalidIndexError
The problem here is the duplicate index defined in df1:
df1 = pd.DataFrame({ 'key1': list('ABAACCA'), 'key2': list('1675987'), 'prop1': list('xyzuynb'), 'prop2': list('mnbbbas')}).set_index(['key1', 'key2'])
Note: Key1=A Key2=7 appears twice, the index for df1 is not unique.
Let's change that second A7 to A9
df1 = pd.DataFrame({ 'key1': list('ABAACCA'), 'key2': list('1675989'), 'prop1': list('xyzuynb'), 'prop2': list('mnbbbas')}).set_index(['key1', 'key2'])df2 = pd.DataFrame({ 'key1': list('ABCCADD'), 'key2': list('1598787'), 'prop1': [np.nan] * 7, 'prop2': [np.nan] * 7}).set_index(['key1', 'key2'])
Thus creating unique indexing in df1, now try df.fillna:
df2.fillna(df1)
Output:
prop1 prop2key1 key2 A 1 x mB 5 NaN NaNC 9 y b 8 n aA 7 z bD 8 NaN NaN 7 NaN NaN
I got hint of this when I tried the reindex_like
method, first with unique indexing:
df1 = pd.DataFrame({ 'key1': list('ABAACCA'), 'key2': list('1675989'), 'prop1': list('xyzuynb'), 'prop2': list('mnbbbas')}).set_index(['key1', 'key2'])df2 = pd.DataFrame({ 'key1': list('ABCCADD'), 'key2': list('1598787'), 'prop1': [np.nan] * 7, 'prop2': [np.nan] * 7}).set_index(['key1', 'key2'])print(df1.reindex_like(df2))
Output:
prop1 prop2key1 key2 A 1 x mB 5 NaN NaNC 9 y b 8 n aA 7 z bD 8 NaN NaN 7 NaN NaN
Now, let's revert to the original dataframes in the post:
df1 = pd.DataFrame({ 'key1': list('ABAACCA'), 'key2': list('1675987'), 'prop1': list('xyzuynb'), 'prop2': list('mnbbbas')}).set_index(['key1', 'key2'])df2 = pd.DataFrame({ 'key1': list('ABCCADD'), 'key2': list('1598787'), 'prop1': [np.nan] * 7, 'prop2': [np.nan] * 7}).set_index(['key1', 'key2'])print(df1.reindex_like(df2))
Output ValueError:
ValueError: cannot handle a non-unique multi-index!
Another work-around it to create unique indexing by adding another index level with cumcount.
df1 = pd.DataFrame({ 'key1': list('ABAACCA'), 'key2': list('1675987'), 'prop1': list('xyzuynb'), 'prop2': list('mnbbbas')}).set_index(['key1', 'key2'])df2 = pd.DataFrame({ 'key1': list('ABCCADD'), 'key2': list('1598787'), 'prop1': [np.nan] * 7, 'prop2': [np.nan] * 7}).set_index(['key1', 'key2'])df1 = df1.set_index(df1.groupby(df1.index).cumcount(), append=True)df2 = df2.set_index(df2.groupby(df2.index).cumcount(), append=True)df2.fillna(df1)
Output:
prop1 prop2key1 key2 A 1 0 x mB 5 0 NaN NaNC 9 0 y b 8 0 n aA 7 0 z bD 8 0 NaN NaN 7 0 NaN NaN
Then you can drop index level 2:
df2.fillna(df1).reset_index(level=2, drop=True)
Output:
prop1 prop2key1 key2 A 1 x mB 5 NaN NaNC 9 y b 8 n aA 7 z bD 8 NaN NaN 7 NaN NaN
However, I think pandas should have nicer error messaging for fillna
non-unique MultiIndexes like it does for reindex_like
.
Here is problem some index values not match, for me working alternative solution with DataFrame.combine_first
:
df = df2.combine_first(df1)print (df) prop1 prop2key1 key2 A 1 x m 5 u b 7 z b 7 b sB 5 NaN NaN 6 y nC 8 n a 9 y bD 7 NaN NaN 8 NaN NaN