Pandas - Compute z-score for all columns


Build a list of the column names and remove the column you don't want to compute the z-score for (here `ID`):

```
In [66]:
cols = list(df.columns)
cols.remove('ID')
df[cols]

Out[66]:
   Age  BMI  Risk  Factor
0    6   48  19.3       4
1    8   43  20.9     NaN
2    2   39  18.1       3
3    9   41  19.5     NaN

In [68]:
# now iterate over the remaining columns and create a new zscore column
for col in cols:
    col_zscore = col + '_zscore'
    df[col_zscore] = (df[col] - df[col].mean())/df[col].std(ddof=0)
df

Out[68]:
   ID  Age  BMI  Risk  Factor  Age_zscore  BMI_zscore  Risk_zscore  \
0  PT    6   48  19.3       4   -0.093250    1.569614    -0.150946
1  PT    8   43  20.9     NaN    0.652753    0.074744     1.459148
2  PT    2   39  18.1       3   -1.585258   -1.121153    -1.358517
3  PT    9   41  19.5     NaN    1.025755   -0.523205     0.050315

   Factor_zscore
0              1
1            NaN
2             -1
3            NaN
```
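The loop above can also be written without an explicit `for`, because subtracting a Series of column means from a DataFrame aligns on column labels. A minimal sketch, assuming the sample data rebuilt from the `Out[68]` output; `drop`, `add_suffix` and `join` are standard pandas methods, and the variable `z` is just an illustrative name:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the output shown above
df = pd.DataFrame({
    "ID": ["PT", "PT", "PT", "PT"],
    "Age": [6, 8, 2, 9],
    "BMI": [48, 43, 39, 41],
    "Risk": [19.3, 20.9, 18.1, 19.5],
    "Factor": [4, np.nan, 3, np.nan],
})

# Every column except the identifier
cols = df.columns.drop("ID")

# Column-wise z-scores in one step: mean() and std() skip NaN by default,
# and ddof=0 (population standard deviation) matches the loop above
z = (df[cols] - df[cols].mean()) / df[cols].std(ddof=0)

# Append the results as <name>_zscore columns next to the originals
df = df.join(z.add_suffix("_zscore"))
print(df)
```

The NaN rows in `Factor` stay NaN in `Factor_zscore`, while the remaining values are standardized using only the non-missing entries, just as in the loop version.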


Using SciPy's `zscore` function:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(100, 200, size=(5, 3)), columns=['A', 'B', 'C'])
df
```

|    |   A |   B |   C |
|---:|----:|----:|----:|
|  0 | 163 | 163 | 159 |
|  1 | 120 | 153 | 181 |
|  2 | 130 | 199 | 108 |
|  3 | 108 | 188 | 157 |
|  4 | 109 | 171 | 119 |

```python
from scipy.stats import zscore

df.apply(zscore)
```

|    |         A |         B |         C |
|---:|----------:|----------:|----------:|
|  0 |  1.83447  | -0.708023 |  0.523362 |
|  1 | -0.297482 | -1.30804  |  1.3342   |
|  2 |  0.198321 |  1.45205  | -1.35632  |
|  3 | -0.892446 |  0.792025 |  0.449649 |
|  4 | -0.842866 | -0.228007 | -0.950897 |
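`scipy.stats.zscore` defaults to `ddof=0` (population standard deviation), so the table above can be cross-checked with pandas alone. A minimal sketch that hard-codes the random integers printed above so the result is reproducible:

```python
import pandas as pd

# The random integers printed above, hard-coded for reproducibility
df = pd.DataFrame({
    "A": [163, 120, 130, 108, 109],
    "B": [163, 153, 199, 188, 171],
    "C": [159, 181, 108, 157, 119],
})

# Same computation as df.apply(zscore): per-column mean and ddof=0 std
z = (df - df.mean()) / df.std(ddof=0)
print(z.round(6))
```

Each column of `z` then has mean 0 and (population) standard deviation 1, which is a quick sanity check for any z-score implementation.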

If not all columns of your DataFrame are numeric, you can apply the z-score function to the numeric columns only, using `select_dtypes`:

```python
# Note that `select_dtypes` returns a DataFrame; we only need its column labels
numeric_cols = df.select_dtypes(include=[np.number]).columns
df[numeric_cols].apply(zscore)
```

Here every column is numeric, so the result matches the previous table:

|    |         A |         B |         C |
|---:|----------:|----------:|----------:|
|  0 |  1.83447  | -0.708023 |  0.523362 |
|  1 | -0.297482 | -1.30804  |  1.3342   |
|  2 |  0.198321 |  1.45205  | -1.35632  |
|  3 | -0.892446 |  0.792025 |  0.449649 |
|  4 | -0.842866 | -0.228007 | -0.950897 |
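When the frame really does mix types, one option is to overwrite just the numeric columns with their z-scores and leave the others untouched. A sketch using pandas only; the frame and its `name`, `x` and `y` columns are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one text column and two numeric columns
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [10, 20, 30, 40],
})

# Labels of the numeric columns only ('name' is excluded)
numeric_cols = df.select_dtypes(include=[np.number]).columns

# Replace the numeric columns with their z-scores (ddof=0, like scipy.stats.zscore)
num = df[numeric_cols]
df[numeric_cols] = (num - num.mean()) / num.std(ddof=0)
print(df)
```

Because `y` is just `x` scaled by 10, both columns end up with identical z-scores, and the `name` column is unchanged.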


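If all of the columns are numeric, you can compute the z-scores for every column at once with plain pandas arithmetic:

```python
df_zscore = (df - df.mean())/df.std()
```

Note that `DataFrame.std` uses `ddof=1` (the sample standard deviation) by default, whereas `scipy.stats.zscore` and the loop in the first answer use `ddof=0`. Use `df.std(ddof=0)` if you want the same numbers as those approaches.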
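The two conventions differ only by a constant factor per column: with n non-missing values, the `ddof=1` z-score equals the `ddof=0` one multiplied by sqrt((n - 1) / n). A small check on a hypothetical single-column frame:

```python
import numpy as np
import pandas as pd

# Hypothetical single-column frame with n = 4 rows
df = pd.DataFrame({"v": [6, 8, 2, 9]})
n = len(df)

# pandas default: sample standard deviation (ddof=1)
z_sample = (df - df.mean()) / df.std()
# population standard deviation (ddof=0), as in scipy.stats.zscore
z_population = (df - df.mean()) / df.std(ddof=0)

# The ddof=1 z-scores are smaller by a factor of sqrt((n - 1) / n)
print(np.allclose(z_sample, z_population * np.sqrt((n - 1) / n)))  # True
```

For large n the factor approaches 1, so the choice mostly matters for small samples like the four-row example in the first answer.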