Copying a column from one DataFrame to another gives NaN values?


The culprit is unalignable indexes

Your DataFrames have different indexes (and so does each of their columns, since a column shares its parent's index). When you assign a column of one DataFrame to another, pandas tries to align the two on their index labels, and inserts NaN wherever a label in the target has no match in the source.

Consider the following examples to understand what this means:

# Setup
import pandas as pd

A = pd.DataFrame(index=['a', 'b', 'c'])
B = pd.DataFrame(index=['b', 'c', 'd', 'f'])
C = pd.DataFrame(index=[1, 2, 3])
# Example of alignable indexes - A & B (complete or partial overlap of indexes)
A.index   B.index
   a
   b         b      (overlap)
   c         c      (overlap)
             d
             f

# Example of unalignable indexes - A & C (no overlap at all)
A.index   C.index
   a
   b
   c
             1
             2
             3

When there are no overlaps, pandas cannot match even a single value between the two DataFrames to put in the result of the assignment, so the output is a column full of NaNs.
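To see both cases in action, here is a minimal sketch. The column names `x`, `y` and `z` and their values are made up for illustration; only the index labels mirror A, B and C above.

```python
import pandas as pd

# Hypothetical frames with the same index labels as A, B and C above,
# each given one made-up column so there is something to copy.
A = pd.DataFrame({'x': [1, 2, 3]}, index=['a', 'b', 'c'])
B = pd.DataFrame({'y': [10, 20, 30, 40]}, index=['b', 'c', 'd', 'f'])
C = pd.DataFrame({'z': [100, 200, 300]}, index=[1, 2, 3])

# Partial overlap: only the labels 'b' and 'c' exist in both indexes,
# so those rows receive values and 'd' and 'f' become NaN.
B['x'] = A['x']
print(B)

# No overlap: no label can be matched, so the whole column is NaN.
C['x'] = A['x']
print(C)
```

Note that the partially filled column is upcast to float, because NaN cannot be stored in an integer column.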

If you're working in an IPython notebook, you can confirm that this is the root cause with:

df1.index.equals(df2.index)
# False
df1.index.intersection(df2.index).empty
# True
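If you want to run that check as a standalone script, a sketch along these lines should work. The contents of `df1` and `df2` here are invented stand-ins for your own data, and `Index.difference` is added only to list which labels would end up as NaN.

```python
import pandas as pd

# Hypothetical data: df1 has a default RangeIndex, df2 has string labels.
df1 = pd.DataFrame({'date': ['2020-01-01', '2020-01-02'], 'hour': [9, 17]})
df2 = pd.DataFrame({'value': [1.5, 2.5]}, index=['r1', 'r2'])

same = df1.index.equals(df2.index)          # are the indexes identical?
shared = df1.index.intersection(df2.index)  # labels present in both
missing = df2.index.difference(df1.index)   # df2 labels with no match in df1

print('identical indexes:', same)
print('no shared labels: ', shared.empty)
print('labels that would get NaN:', list(missing))
```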

You can use any of the following solutions to fix this issue.

Solution 1: Reset both DataFrames' indexes

You may prefer this option if you didn't mean to have different indices in the first place, or if you don't particularly care about preserving the index.

# Optional, if you want a RangeIndex => [0, 1, 2, ...]
# df1.index = pd.RangeIndex(len(df1))

# Homogenize the index values (both DataFrames must have the same length).
df2.index = df1.index

# Assign the columns.
df2[['date', 'hour']] = df1[['date', 'hour']]

If you want to keep the existing index, but as a column, you may use reset_index() instead.
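As a rough sketch of the reset_index() variant, assuming two frames of equal length (the data, the `value` column and the index labels below are hypothetical):

```python
import pandas as pd

# Hypothetical frames with unrelated index labels.
df1 = pd.DataFrame({'date': ['2020-01-01', '2020-01-02'], 'hour': [9, 17]},
                   index=['x', 'y'])
df2 = pd.DataFrame({'value': [1.5, 2.5]}, index=['r1', 'r2'])

# reset_index() moves the old index into a column (named 'index' here,
# because the index has no name) and installs a fresh RangeIndex.
# Passing drop=True would discard the old index instead of keeping it.
df1 = df1.reset_index()
df2 = df2.reset_index()

# Both frames now share the index [0, 1], so alignment succeeds.
df2[['date', 'hour']] = df1[['date', 'hour']]
print(df2)
```

After this, rows are matched by position, so it only makes sense when row i of df1 really corresponds to row i of df2.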


Solution 2: Assign NumPy arrays (bypass index alignment)

This solution will only work if the lengths of the two DataFrames match.

# pandas >= 0.24
df2['date'] = df1['date'].to_numpy()

# pandas < 0.24
df2['date'] = df1['date'].values

To assign multiple columns easily, use,

df2[['date', 'hour']] = df1[['date', 'hour']].to_numpy()
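The length requirement of Solution 2 can be illustrated with a small sketch. The frames below are hypothetical, and the slicing at the end is just one possible way to make the lengths agree, not a general recommendation.

```python
import pandas as pd

# Hypothetical frames of different lengths and unrelated indexes.
df1 = pd.DataFrame({'date': ['2020-01-01', '2020-01-02', '2020-01-03']},
                   index=['x', 'y', 'z'])
df2 = pd.DataFrame({'value': [1.5, 2.5]}, index=['r1', 'r2'])

# A NumPy array carries no index, so pandas cannot align it and
# requires its length to equal the number of rows in the target.
try:
    df2['date'] = df1['date'].to_numpy()   # 3 values into 2 rows
    error = None
except ValueError as exc:
    error = str(exc)
print('assignment failed:', error)

# With matching lengths the values are copied purely by position,
# and df2 keeps its own index.
df2['date'] = df1['date'].to_numpy()[:len(df2)]
print(df2)
```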


Try this:

df2['date'] = df1['date'].values
df2['hour'] = df1['hour'].values