Pandas Dataframe: Replacing NaN with row average
As noted in the comments, the axis argument to fillna is not implemented for this case, so the following does not work:
df.fillna(df.mean(axis=1), axis=1)
Note: the axis argument would be critical here. By default fillna matches a Series against the column labels, and you don't want to fill the nth column with the nth row average.
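To make that pitfall concrete, here is a minimal sketch. The frame `df_int`, with integer column labels 0, 1, 2, is a made-up example (not the frame from the question), chosen so that the column labels coincide with the row labels:

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose column labels (0, 1, 2) coincide with the row labels.
df_int = pd.DataFrame([[1, 4, 7], [2, 5, np.nan], [3, 6, 9]])

# Row averages, indexed by row label: 4.0, 3.5, 6.0
m = df_int.mean(axis=1)

# Without a working axis=1, fillna matches the Series index against the
# *column* labels, so the NaN in column 2 is filled with m[2], which is the
# average of row 2 rather than the average of the row the NaN sits in.
wrong = df_int.fillna(m)
print(wrong.iloc[1, 2])  # 6.0, whereas the average of row 1 is 3.5
```

With string column labels such as c1, c2, c3 there is no label overlap at all, so the same call would simply leave the NaN in place.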
For now you'll need to iterate through:
In [11]: m = df.mean(axis=1)
    ...: for i, col in enumerate(df):
    ...:     # using i allows for duplicate columns
    ...:     # inplace *may* not always work here, so IMO the next line is preferred
    ...:     # df.iloc[:, i].fillna(m, inplace=True)
    ...:     df.iloc[:, i] = df.iloc[:, i].fillna(m)

In [12]: df
Out[12]:
   c1  c2   c3
0   1   4  7.0
1   2   5  3.5
2   3   6  9.0
An alternative is to fillna the transpose and then transpose, which may be more efficient...
df.T.fillna(df.mean(axis=1)).T
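As a rough illustration of why the transpose trick works, the sketch below rebuilds the example frame from the output shown above (the dict-based construction and the variable names are my own). After transposing, the original row labels become column labels, so the default column-wise matching of fillna lines up with the row averages:

```python
import numpy as np
import pandas as pd

# Example frame reconstructed from the output shown above.
df = pd.DataFrame({"c1": [1, 2, 3], "c2": [4, 5, 6], "c3": [7, np.nan, 9]})

m = df.mean(axis=1)   # row averages, indexed by the row labels 0, 1, 2
transposed = df.T     # columns are now 0, 1, 2; the index is c1, c2, c3

# fillna fills column k of the transposed frame with m[k], i.e. each original
# row with its own average; transposing again restores the original layout.
filled = transposed.fillna(m).T
print(filled)
```

One side effect to be aware of: transposing a frame with mixed integer and float columns upcasts everything to float, so c1 and c2 come back as floats.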
As an alternative, you could also use an apply with a lambda expression like this:
df.apply(lambda row: row.fillna(row.mean()), axis=1)
which also yields

    c1   c2   c3
0  1.0  4.0  7.0
1  2.0  5.0  3.5
2  3.0  6.0  9.0
I'll propose an alternative that works on the underlying numpy arrays. Performance-wise, I think this is more efficient and probably scales better than the other solutions proposed so far.
The idea is to use an indicator matrix (df.isna().values, which is 1 if the element is N/A and 0 otherwise) and broadcast-multiply it by the row averages. Thus, we end up with a matrix of exactly the same shape as the original df, which contains the row-average value where the original element was N/A, and 0 otherwise.
We add this matrix to the original df, making sure to fillna with 0 so that, in effect, we have filled the N/A's with the respective row averages.
# setup code
import numpy as np
import pandas as pd

df = pd.DataFrame()
df['c1'] = [1, 2, 3]
df['c2'] = [4, 5, 6]
df['c3'] = [7, np.nan, 9]

# fillna row-wise
row_avgs = df.mean(axis=1).values.reshape(-1, 1)
df = df.fillna(0) + df.isna().values * row_avgs
df
giving
    c1   c2   c3
0  1.0  4.0  7.0
1  2.0  5.0  3.5
2  3.0  6.0  9.0
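For anyone who wants to see the intermediate matrices from the explanation above, here is a small sketch that spells them out. The names `mask`, `fill_matrix` and `via_where` are my own, and the np.where line is a variation I'm adding for comparison, not part of the answer above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"c1": [1, 2, 3], "c2": [4, 5, 6], "c3": [7, np.nan, 9]})

# Indicator matrix: True (1) where the element is N/A, False (0) otherwise.
mask = df.isna().to_numpy()

# Row averages as a column vector of shape (3, 1) so they broadcast across columns.
row_avgs = df.mean(axis=1).to_numpy().reshape(-1, 1)

# Same shape as df: the row average where the element was N/A, 0 elsewhere.
fill_matrix = mask * row_avgs
print(fill_matrix)

# Adding it to the zero-filled frame puts the row averages into the gaps.
filled = df.fillna(0) + fill_matrix

# Possible variation (an assumption on my part): select with np.where instead
# of multiplying and adding; it yields the same values for this frame.
via_where = np.where(mask, row_avgs, df.to_numpy())
print(np.allclose(filled.to_numpy(), via_where))
```

Whether either form is actually faster than the transpose or apply versions would need to be measured on frames of realistic size.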