Pandas - Compute z-score for all columns
Build a list of the columns and remove the one you don't want a z-score for:
    In [66]:
    cols = list(df.columns)
    cols.remove('ID')
    df[cols]

    Out[66]:
       Age  BMI  Risk  Factor
    0    6   48  19.3       4
    1    8   43  20.9     NaN
    2    2   39  18.1       3
    3    9   41  19.5     NaN

    In [68]:
    # now iterate over the remaining columns and create a new zscore column
    for col in cols:
        col_zscore = col + '_zscore'
        df[col_zscore] = (df[col] - df[col].mean())/df[col].std(ddof=0)
    df

    Out[68]:
       ID  Age  BMI  Risk  Factor  Age_zscore  BMI_zscore  Risk_zscore  \
    0  PT    6   48  19.3       4   -0.093250    1.569614    -0.150946
    1  PT    8   43  20.9     NaN    0.652753    0.074744     1.459148
    2  PT    2   39  18.1       3   -1.585258   -1.121153    -1.358517
    3  PT    9   41  19.5     NaN    1.025755   -0.523205     0.050315

       Factor_zscore
    0              1
    1            NaN
    2             -1
    3            NaN
Using SciPy's `zscore` function:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randint(100, 200, size=(5, 3)), columns=['A', 'B', 'C'])
    df

|    |   A |   B |   C |
|---:|----:|----:|----:|
|  0 | 163 | 163 | 159 |
|  1 | 120 | 153 | 181 |
|  2 | 130 | 199 | 108 |
|  3 | 108 | 188 | 157 |
|  4 | 109 | 171 | 119 |

    from scipy.stats import zscore
    df.apply(zscore)

|    |         A |         B |         C |
|---:|----------:|----------:|----------:|
|  0 |  1.83447  | -0.708023 |  0.523362 |
|  1 | -0.297482 | -1.30804  |  1.3342   |
|  2 |  0.198321 |  1.45205  | -1.35632  |
|  3 | -0.892446 |  0.792025 |  0.449649 |
|  4 | -0.842866 | -0.228007 | -0.950897 |
If not all the columns of your data frame are numeric, you can apply the z-score function to only the numeric columns by selecting them with `select_dtypes`:
    # Note that `select_dtypes` returns a data frame. We are selecting only the columns
    numeric_cols = df.select_dtypes(include=[np.number]).columns
    df[numeric_cols].apply(zscore)

|    |         A |         B |         C |
|---:|----------:|----------:|----------:|
|  0 |  1.83447  | -0.708023 |  0.523362 |
|  1 | -0.297482 | -1.30804  |  1.3342   |
|  2 |  0.198321 |  1.45205  | -1.35632  |
|  3 | -0.892446 |  0.792025 |  0.449649 |
|  4 | -0.842866 | -0.228007 | -0.950897 |
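The `select_dtypes` selection can also be combined with the `_zscore` column naming from the first answer. The following is a minimal sketch in plain pandas rather than SciPy; the helper name `add_zscore_columns` and the `sample` frame are made up for illustration, with values borrowed from the first example.

```python
import numpy as np
import pandas as pd


def add_zscore_columns(df: pd.DataFrame, ddof: int = 0) -> pd.DataFrame:
    """Return a copy of df with a '<col>_zscore' column per numeric column.

    Hypothetical helper, not part of pandas or SciPy.
    """
    out = df.copy()
    # Collect the numeric column names before adding any new columns.
    numeric_cols = out.select_dtypes(include=[np.number]).columns
    for col in numeric_cols:
        # mean() and std() skip NaN by default, so a missing value stays NaN
        # in the z-score column without affecting the other rows.
        out[col + "_zscore"] = (out[col] - out[col].mean()) / out[col].std(ddof=ddof)
    return out


sample = pd.DataFrame({
    "ID": ["PT", "PT", "PT", "PT"],          # non-numeric, left untouched
    "Age": [6, 8, 2, 9],
    "Factor": [4, np.nan, 3, np.nan],
})
result = add_zscore_columns(sample)
print(result)
```

Because the helper works on a copy, the original frame is left unchanged; drop the `copy()` if you would rather add the columns in place.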
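If all of the columns are numeric, you can compute the z-score for every column at once. Note that `df.std()` defaults to `ddof=1` (sample standard deviation), while `scipy.stats.zscore` defaults to `ddof=0`, so pass `ddof=0` to `std()` if you want the two to agree:

    df_zscore = (df - df.mean())/df.std()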
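As a rough check of that one-liner against the SciPy results above, the sketch below compares pandas' default `std()` (`ddof=1`) with `std(ddof=0)` on column `A` of the example data. The names `z_sample` and `z_population` are illustrative only.

```python
import numpy as np
import pandas as pd

# Column A from the SciPy example above.
df = pd.DataFrame({"A": [163, 120, 130, 108, 109]})

# pandas default: sample standard deviation (ddof=1).
z_sample = (df - df.mean()) / df.std()

# Population standard deviation (ddof=0), the default used by scipy.stats.zscore.
z_population = (df - df.mean()) / df.std(ddof=0)

print(z_sample["A"].iloc[0])      # smaller in magnitude than the ddof=0 value
print(z_population["A"].iloc[0])  # approximately 1.83447, as in the table above
```

For n rows the two results differ by a constant factor of sqrt((n - 1) / n), so the gap is noticeable on five rows and shrinks as the frame grows.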