How to z-score normalize a pandas column with NaNs?



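Well, pandas' versions of mean and std handle NaN (they skip missing values by default), so you could just compute it that way. To get the same result as scipy's zscore you need ddof=0 on std, because pandas' std defaults to ddof=1 while scipy.stats.zscore uses ddof=0: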

df['zscore'] = (df.a - df.a.mean())/df.a.std(ddof=0)
print(df)

        a    zscore
0     NaN       NaN
1  0.0767 -1.148329
2  0.4383  0.071478
3  0.7866  1.246419
4  0.8091  1.322320
5  0.1954 -0.747912
6  0.6307  0.720512
7  0.6599  0.819014
8  0.1065 -1.047803
9  0.0508 -1.235699
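A self-contained version of the pandas approach, useful for checking the ddof point. The column name `a` and the values are copied from the example above; the cross-check against `scipy.stats.zscore` on the non-NaN rows is just one assumed way of verifying it, not part of the original answer.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Same sample data as above, with one missing value.
df = pd.DataFrame({'a': [np.nan, 0.0767, 0.4383, 0.7866, 0.8091,
                         0.1954, 0.6307, 0.6599, 0.1065, 0.0508]})

# Series.mean() and Series.std() skip NaN by default (skipna=True).
# ddof=0 gives the population standard deviation, which is what
# scipy.stats.zscore uses by default; pandas' own default is ddof=1.
df['zscore'] = (df['a'] - df['a'].mean()) / df['a'].std(ddof=0)

# The NaN row stays NaN; every other row gets a z-score.
print(df)

# Cross-check against SciPy on the non-missing values only.
mask = df['a'].notna()
expected = stats.zscore(df.loc[mask, 'a'].to_numpy())
print(np.allclose(df.loc[mask, 'zscore'].to_numpy(), expected))
```

With pandas' default ddof=1 instead, every z-score would be scaled down by a factor of sqrt((n-1)/n), where n is the number of non-NaN values (9 here).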


You could also ignore the NaNs by masking them out with np.isnan and applying scipy's zscore only to the remaining values.

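import numpy as np
import pandas
from scipy.stats import zscore

a = np.array([np.nan, 0.0767, 0.4383, 0.7866, 0.8091, 0.1954, 0.6307, 0.6599, 0.1065, 0.0508])
z = a.copy()             # initialise array for zscores (a copy, so a itself is not overwritten)
z[~np.isnan(a)] = zscore(a[~np.isnan(a)])
print(pandas.DataFrame({'Zscore': z, 'a': a}))

     Zscore       a
0       NaN     NaN
1 -1.148329  0.0767
2  0.071478  0.4383
3  1.246419  0.7866
4  1.322320  0.8091
5 -0.747912  0.1954
6  0.720512  0.6307
7  0.819014  0.6599
8 -1.047803  0.1065
9 -1.235699  0.0508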
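The masking idea can be wrapped in a small helper. The function name `nan_zscore` is hypothetical (it is not part of SciPy or pandas), and the sketch assumes a 1-D float array. The second half shows why the copy matters: with a plain assignment like `z = a`, both names refer to the same array, so writing through the mask overwrites the original data.

```python
import numpy as np
from scipy.stats import zscore


def nan_zscore(a):
    """Hypothetical helper: z-score the non-NaN entries of a 1-D array, leaving NaNs in place."""
    a = np.asarray(a, dtype=float)
    mask = ~np.isnan(a)          # True where a value is present
    z = a.copy()                 # copy so the caller's array is not modified
    z[mask] = zscore(a[mask])    # scipy's default ddof=0
    return z


a = np.array([np.nan, 0.0767, 0.4383, 0.7866, 0.8091,
              0.1954, 0.6307, 0.6599, 0.1065, 0.0508])
z = nan_zscore(a)
print(z)

# The aliasing pitfall: without a copy, "b" and "z_bad" are one array,
# so the masked assignment replaces b's data with z-scores.
b = a.copy()
z_bad = b
z_bad[~np.isnan(b)] = zscore(b[~np.isnan(b)])
print(np.shares_memory(b, z_bad), np.allclose(b[1:], z[1:]))
```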


I'm not sure which SciPy version introduced this parameter, because I have not been working with Python for long. But scipy.stats.zscore accepts nan_policy='omit', which makes it ignore NaNs when computing the mean and standard deviation (the NaN entries themselves stay NaN in the output):

import numpy as np
from scipy import stats

a = np.array([np.nan, 0.0767, 0.4383, 0.7866, 0.8091, 0.1954, 0.6307, 0.6599, 0.1065, 0.0508])
ZScore_a = stats.zscore(a, nan_policy='omit')
print(ZScore_a)

[nan -1.14832945  0.07147776  1.24641928  1.3223199  -0.74791154
  0.72051236  0.81901449 -1.0478033  -1.23569949]
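To put the result back into a DataFrame column, one option is to pass the column's values to `scipy.stats.zscore` and assign the returned array. The DataFrame and the column names (`a`, `zscore_scipy`, `zscore_pandas`) are assumptions made for illustration, and `nan_policy='omit'` requires a SciPy version that has the parameter. The sketch also checks that the result matches the pandas formula with ddof=0.

```python
import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame({'a': [np.nan, 0.0767, 0.4383, 0.7866, 0.8091,
                         0.1954, 0.6307, 0.6599, 0.1065, 0.0508]})

# nan_policy='omit' computes the mean and standard deviation from the
# non-NaN values only; positions that were NaN come back as NaN.
df['zscore_scipy'] = stats.zscore(df['a'].to_numpy(), nan_policy='omit')

# Same numbers as the pure-pandas formula with ddof=0.
df['zscore_pandas'] = (df['a'] - df['a'].mean()) / df['a'].std(ddof=0)
print(df)
print(np.allclose(df['zscore_scipy'], df['zscore_pandas'], equal_nan=True))
```

With the default `nan_policy='propagate'`, a single NaN makes the whole result NaN, which is why the parameter matters for a column with missing values.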