pandas DataFrame: replace nan values with average of columns
You can simply use DataFrame.fillna to fill the NaNs directly:
    In [27]: df
    Out[27]:
              A         B         C
    0 -0.166919  0.979728 -0.632955
    1 -0.297953 -0.912674 -1.365463
    2 -0.120211 -0.540679 -0.680481
    3       NaN -2.027325  1.533582
    4       NaN       NaN  0.461821
    5 -0.788073       NaN       NaN
    6 -0.916080 -0.612343       NaN
    7 -0.887858  1.033826       NaN
    8  1.948430  1.025011 -2.982224
    9  0.019698 -0.795876 -0.046431

    In [28]: df.mean()
    Out[28]:
    A   -0.151121
    B   -0.231291
    C   -0.530307
    dtype: float64

    In [29]: df.fillna(df.mean())
    Out[29]:
              A         B         C
    0 -0.166919  0.979728 -0.632955
    1 -0.297953 -0.912674 -1.365463
    2 -0.120211 -0.540679 -0.680481
    3 -0.151121 -2.027325  1.533582
    4 -0.151121 -0.231291  0.461821
    5 -0.788073 -0.231291 -0.530307
    6 -0.916080 -0.612343 -0.530307
    7 -0.887858  1.033826 -0.530307
    8  1.948430  1.025011 -2.982224
    9  0.019698 -0.795876 -0.046431
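Note that the session above only displays the filled frame. By default fillna returns a new DataFrame and leaves the original untouched, so the result has to be assigned to a name to keep it. A minimal self-contained sketch, using a small made-up frame (the column names and values here are illustrative, not the data from the session):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, chosen so the column means are easy to check by hand.
df = pd.DataFrame({
    "A": [1.0, np.nan, 5.0],   # mean of the non-NaN values: 3.0
    "B": [2.0, 4.0, np.nan],   # mean of the non-NaN values: 3.0
})

# df.mean() skips NaN by default and returns one value per column.
# fillna aligns that Series on the column labels.
filled = df.fillna(df.mean())

# fillna returned a new object, so the original still contains its NaNs.
original_nan_count = int(df.isna().sum().sum())
filled_nan_count = int(filled.isna().sum().sum())
```

To overwrite the original instead, reassign it: `df = df.fillna(df.mean())`.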
The docstring of fillna says that value should be a scalar or a dict; however, it seems to work with a Series as well. If you want to pass a dict, you could use df.mean().to_dict().
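As a quick check of the two forms mentioned above, the following sketch passes the column means once as a Series and once as a dict and compares the results. The frame is a made-up example, and the variable names are illustrative only:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],   # mean 2.0
    "B": [np.nan, 4.0, 8.0],   # mean 6.0
})

col_means = df.mean()          # Series indexed by column name

# Same fill values, supplied in two different containers.
filled_from_series = df.fillna(col_means)
filled_from_dict = df.fillna(col_means.to_dict())

# Both calls should produce identical frames.
same_result = filled_from_series.equals(filled_from_dict)
```

In both cases the keys (Series index labels or dict keys) are matched against the column names, so each column receives its own mean.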
    In [16]: df = DataFrame(np.random.randn(10,3))

    In [17]: df.iloc[3:5,0] = np.nan

    In [18]: df.iloc[4:6,1] = np.nan

    In [19]: df.iloc[5:8,2] = np.nan

    In [20]: df
    Out[20]:
              0         1         2
    0  1.148272  0.227366 -2.368136
    1 -0.820823  1.071471 -0.784713
    2  0.157913  0.602857  0.665034
    3       NaN -0.985188 -0.324136
    4       NaN       NaN  0.238512
    5  0.769657       NaN       NaN
    6  0.141951  0.326064       NaN
    7 -1.694475 -0.523440       NaN
    8  0.352556 -0.551487 -1.639298
    9 -2.067324 -0.492617 -1.675794

    In [22]: df.mean()
    Out[22]:
    0   -0.251534
    1   -0.040622
    2   -0.841219
    dtype: float64
Apply per column: compute the mean of each column and fill that column's NaNs with it.
    In [23]: df.apply(lambda x: x.fillna(x.mean()),axis=0)
    Out[23]:
              0         1         2
    0  1.148272  0.227366 -2.368136
    1 -0.820823  1.071471 -0.784713
    2  0.157913  0.602857  0.665034
    3 -0.251534 -0.985188 -0.324136
    4 -0.251534 -0.040622  0.238512
    5  0.769657 -0.040622 -0.841219
    6  0.141951  0.326064 -0.841219
    7 -1.694475 -0.523440 -0.841219
    8  0.352556 -0.551487 -1.639298
    9 -2.067324 -0.492617 -1.675794
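The per-column apply should give the same frame as a single df.fillna(df.mean()) call. The sketch below rebuilds a frame with the same NaN pattern as the session above and compares the two approaches. The seeded generator is an assumption added for repeatability, so the numbers will not match the ones printed above:

```python
import numpy as np
import pandas as pd

# Seed is an assumption for reproducibility; the original session was unseeded.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 3)))

# Same NaN pattern as in the session above.
df.iloc[3:5, 0] = np.nan
df.iloc[4:6, 1] = np.nan
df.iloc[5:8, 2] = np.nan

# Approach 1: fill each column with its own mean, one column at a time.
via_apply = df.apply(lambda col: col.fillna(col.mean()), axis=0)

# Approach 2: compute all column means once and let fillna align them by label.
via_fillna = df.fillna(df.mean())

# Raises AssertionError if the frames differ beyond floating-point tolerance.
pd.testing.assert_frame_equal(via_apply, via_fillna)
```

The apply form is handy when each column needs its own custom logic; for a plain mean fill, the single fillna call is shorter.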