
Standardize data columns in R


I have to assume you meant to say that you wanted a mean of 0 and a standard deviation of 1. If your data is in a dataframe and all the columns are numeric you can simply call the scale function on the data to do what you want.

dat <- data.frame(x = rnorm(10, 30, .2), y = runif(10, 3, 5))
scaled.dat <- scale(dat)
# check that we get mean of 0 and sd of 1
colMeans(scaled.dat)  # faster version of apply(scaled.dat, 2, mean)
apply(scaled.dat, 2, sd)
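One thing worth knowing when you rely on scale: it returns a numeric matrix rather than a data frame, and it stores the column means and standard deviations it used as attributes. The following is a minimal sketch of how you might inspect those and get a data frame back; the variable names (centers, scales, scaled.df, restored) are just illustrative choices, not part of the original answer.

```r
set.seed(1)
dat <- data.frame(x = rnorm(10, 30, .2), y = runif(10, 3, 5))
scaled.dat <- scale(dat)

# scale() returns a numeric matrix, not a data frame
class(scaled.dat)

# the column means and standard deviations it used are kept as attributes
centers <- attr(scaled.dat, "scaled:center")
scales  <- attr(scaled.dat, "scaled:scale")

# convert back to a data frame if later code expects one
scaled.df <- as.data.frame(scaled.dat)

# undo the scaling: multiply by the sd, then add the mean, column by column
restored <- sweep(sweep(scaled.dat, 2, scales, "*"), 2, centers, "+")
all.equal(restored, as.matrix(dat), check.attributes = FALSE)
```

Keeping centers and scales around is handy if you later need to apply the same transformation to new data, for example via scale(newdata, center = centers, scale = scales).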

Using built-in functions is classy.


Realizing that the question is old and one answer is accepted, I'll provide another answer for reference.

scale is limited by the fact that it scales all variables. The solution below lets you scale only specific variables while leaving the others unchanged (and the variable names can be generated dynamically):

library(dplyr)
set.seed(1234)
dat <- data.frame(x = rnorm(10, 30, .2),
                  y = runif(10, 3, 5),
                  z = runif(10, 10, 20))
dat
dat2 <- dat %>% mutate_at(c("y", "z"), ~(scale(.) %>% as.vector))
dat2

which gives me this:

> dat
          x        y        z
1  29.75859 3.633225 14.56091
2  30.05549 3.605387 12.65187
3  30.21689 3.318092 13.04672
4  29.53086 3.079992 15.07307
5  30.08582 3.437599 11.81096
6  30.10121 4.621197 17.59671
7  29.88505 4.051395 12.01248
8  29.89067 4.829316 12.58810
9  29.88711 4.662690 19.92150
10 29.82199 3.091541 18.07352

and

> dat2 <- dat %>% mutate_at(c("y", "z"), ~(scale(.) %>% as.vector))
> dat2
          x          y           z
1  29.75859 -0.3004815 -0.06016029
2  30.05549 -0.3423437 -0.72529604
3  30.21689 -0.7743696 -0.58772361
4  29.53086 -1.1324181  0.11828039
5  30.08582 -0.5946582 -1.01827752
6  30.10121  1.1852038  0.99754666
7  29.88505  0.3283513 -0.94806607
8  29.89067  1.4981677 -0.74751378
9  29.88711  1.2475998  1.80753470
10 29.82199 -1.1150515  1.16367556

EDIT 1 (2016): Addressed Julian's comment: the output of scale is an N×1 matrix, so ideally we should add as.vector to convert the matrix back into a vector. Thanks Julian!

EDIT 2 (2019): Quoting Duccio A.'s comment: For the latest dplyr (version 0.8) you need to replace dplyr::funs with list, like dat %>% mutate_each_(list(~scale(.) %>% as.vector), vars=c("y","z"))

EDIT 3 (2020): Thanks to @mj_whales: the old solution is deprecated and now we need to use mutate_at.
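For readers on dplyr 1.0.0 or later, where the scoped verbs such as mutate_at are superseded by across(), the same idea can probably be written as below. This is a sketch under that version assumption, not a replacement for the answer above; cols is an illustrative name for the (possibly dynamically generated) vector of column names.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where across() was introduced

set.seed(1234)
dat <- data.frame(x = rnorm(10, 30, .2),
                  y = runif(10, 3, 5),
                  z = runif(10, 10, 20))

# names of the columns to standardize; could be generated dynamically
cols <- c("y", "z")

# scale only the selected columns; as.vector() drops the N x 1 matrix
# that scale() returns so each column stays a plain numeric vector
dat2 <- dat %>%
  mutate(across(all_of(cols), ~ as.vector(scale(.x))))
dat2
```

all_of() is used so that the character vector is treated strictly as a set of column names, and x is left untouched.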


This is 3 years old. Still, I feel I have to add the following:

The most common normalization is the z-transformation, where you subtract the mean and divide by the standard deviation of your variable. The result will have mean=0 and sd=1.

For that, you don't need any package.

zVar <- (myVar - mean(myVar)) / sd(myVar)

That's it.
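If you want to reuse the formula, one option is to wrap it in a small function and apply it to the numeric columns of a data frame. The sketch below does that and checks it against scale; z_score, num and the example data are hypothetical names and values chosen for illustration. Note that sd uses the n - 1 denominator, which is also what scale uses by default, so the two agree.

```r
# z_score is a hypothetical helper name wrapping the formula above
z_score <- function(v) (v - mean(v)) / sd(v)

set.seed(1)
myVar <- rnorm(20, mean = 50, sd = 10)
zVar <- z_score(myVar)

mean(zVar)  # zero up to floating-point error
sd(zVar)

# the hand-rolled version agrees with scale()
all.equal(zVar, as.vector(scale(myVar)))

# apply it to every numeric column, leaving other columns alone
dat <- data.frame(x = rnorm(10, 30, .2),
                  y = runif(10, 3, 5),
                  g = letters[1:10])
num <- sapply(dat, is.numeric)
dat[num] <- lapply(dat[num], z_score)
```

If a column contains missing values, mean and sd return NA unless you pass na.rm = TRUE, so you would need to adapt z_score for that case.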