How to z-score normalize a pandas column with NaNs?
Well, the pandas versions of mean and std will handle the NaN, so you could just compute it that way (to get the same result as scipy's zscore, I think you need to use ddof=0 on std):
    df['zscore'] = (df.a - df.a.mean()) / df.a.std(ddof=0)
    print(df)

            a    zscore
    0     NaN       NaN
    1  0.0767 -1.148329
    2  0.4383  0.071478
    3  0.7866  1.246419
    4  0.8091  1.322320
    5  0.1954 -0.747912
    6  0.6307  0.720512
    7  0.6599  0.819014
    8  0.1065 -1.047803
    9  0.0508 -1.235699
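For a self-contained version of the above, here is a minimal sketch that rebuilds the column from the values shown in the output. The DataFrame construction and the helper name `zscore_column` are my own assumptions for illustration, not part of the original answer. It relies on pandas skipping NaN by default in `mean()` and `std()`.

```python
import numpy as np
import pandas as pd

# Sample data copied from the output above; the construction itself is assumed.
df = pd.DataFrame({'a': [np.nan, 0.0767, 0.4383, 0.7866, 0.8091,
                         0.1954, 0.6307, 0.6599, 0.1065, 0.0508]})


def zscore_column(s):
    """Hypothetical helper: z-score a Series, leaving NaN entries as NaN."""
    # Series.mean() and Series.std() skip NaN by default (skipna=True).
    # ddof=0 gives the population standard deviation, which is what
    # scipy.stats.zscore uses by default.
    return (s - s.mean()) / s.std(ddof=0)


df['zscore'] = zscore_column(df['a'])
print(df)
```

Rows where `a` is NaN stay NaN in `zscore`, because arithmetic with NaN propagates NaN.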
You could also ignore the NaNs using isnan:
    import numpy as np
    import pandas
    from scipy.stats import zscore

    z = a.copy()  # initialise array for z-scores (a copy, so a itself is not overwritten)
    z[~np.isnan(a)] = zscore(a[~np.isnan(a)])
    pandas.DataFrame({'Zscore': z, 'a': a})

         Zscore       a
    0       NaN     NaN
    1 -1.148329  0.0767
    2  0.071478  0.4383
    3  1.246419  0.7866
    4  1.322320  0.8091
    5 -0.747912  0.1954
    6  0.720512  0.6307
    7  0.819014  0.6599
    8 -1.047803  0.1065
    9 -1.235699  0.0508
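The masking idea can also be written with NumPy alone. The sketch below is an assumption on my part rather than the original answer's code: it fills a fresh all-NaN output array so the input is left untouched, and it computes the z-scores by hand with the population standard deviation instead of calling scipy.

```python
import numpy as np

a = np.array([np.nan, 0.0767, 0.4383, 0.7866, 0.8091,
              0.1954, 0.6307, 0.6599, 0.1065, 0.0508])

mask = ~np.isnan(a)           # True where a value is present
z = np.full_like(a, np.nan)   # separate all-NaN output array; a is not modified

# Standardise only the non-NaN entries; ndarray.std() defaults to ddof=0.
valid = a[mask]
z[mask] = (valid - valid.mean()) / valid.std()

print(z)
```

Writing into a separate array matters here: a plain assignment such as `z = a` only creates a second name for the same array, so assigning into `z` would overwrite the original values as well.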
I'm not sure when this parameter was added, since I haven't been working with Python for long, but you can simply pass nan_policy='omit' to scipy's zscore and the NaNs are ignored in the calculation:
    import numpy as np
    from scipy import stats

    a = np.array([np.nan, 0.0767, 0.4383, 0.7866, 0.8091, 0.1954, 0.6307, 0.6599, 0.1065, 0.0508])
    ZScore_a = stats.zscore(a, nan_policy='omit')
    print(ZScore_a)

    [nan -1.14832945 0.07147776 1.24641928 1.3223199 -0.74791154
     0.72051236 0.81901449 -1.0478033 -1.23569949]
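As a quick sanity check, the following sketch compares the `nan_policy='omit'` result with the manual pandas formula on the same data. It assumes a SciPy version recent enough for `zscore` to accept `nan_policy`; the variable names are mine and only illustrative.

```python
import numpy as np
import pandas as pd
from scipy import stats

a = np.array([np.nan, 0.0767, 0.4383, 0.7866, 0.8091,
              0.1954, 0.6307, 0.6599, 0.1065, 0.0508])

# SciPy: NaNs are left out of the mean/std and stay NaN in the result.
z_scipy = stats.zscore(a, nan_policy='omit')

# pandas: mean() and std() skip NaN by default; ddof=0 matches scipy's default.
s = pd.Series(a)
z_pandas = ((s - s.mean()) / s.std(ddof=0)).to_numpy()

# equal_nan=True treats NaNs in the same position as equal.
same = np.allclose(z_scipy, z_pandas, equal_nan=True)
print(same)
```

If the two disagree, the usual culprit is the `ddof` setting, since pandas defaults to `ddof=1` while scipy's `zscore` defaults to `ddof=0`.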