Standardize data columns in R
I assume you meant that you want a mean of 0 and a standard deviation of 1. If your data is in a data frame and all the columns are numeric, you can simply call the scale function on the data to do what you want.
    dat <- data.frame(x = rnorm(10, 30, .2), y = runif(10, 3, 5))
    scaled.dat <- scale(dat)

    # check that we get mean of 0 and sd of 1
    colMeans(scaled.dat)  # faster version of apply(scaled.dat, 2, mean)
    apply(scaled.dat, 2, sd)
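One detail worth knowing about the call above: scale returns a numeric matrix rather than a data frame, and it stores the original column means and standard deviations as attributes. The sketch below illustrates this under a few assumptions that are not in the original answer: the seed is arbitrary, and the names scaled.df and restored are made up for illustration.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
dat <- data.frame(x = rnorm(10, 30, .2), y = runif(10, 3, 5))

scaled.dat <- scale(dat)

# scale() returns a numeric matrix, not a data frame
is.matrix(scaled.dat)

# the original column means and standard deviations are kept as attributes
centers <- attr(scaled.dat, "scaled:center")
sds     <- attr(scaled.dat, "scaled:scale")

# convert back to a data frame if later code expects one
scaled.df <- as.data.frame(scaled.dat)

# undo the scaling column-wise: multiply by the sd, then add the mean back
restored <- sweep(sweep(scaled.dat, 2, sds, `*`), 2, centers, `+`)
max(abs(as.matrix(dat) - restored))  # should be numerically zero
```

Keeping the attributes around can be handy if new data has to be scaled with the same mean and standard deviation later on.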
Using built-in functions is classy.
Realizing that the question is old and one answer is accepted, I'll provide another answer for reference.
scale is limited by the fact that it scales all variables. The solution below scales only specific variables, selected by name, while leaving the other variables unchanged (and the variable names can be generated dynamically):
    library(dplyr)

    set.seed(1234)
    dat <- data.frame(x = rnorm(10, 30, .2),
                      y = runif(10, 3, 5),
                      z = runif(10, 10, 20))
    dat

    dat2 <- dat %>% mutate_at(c("y", "z"), ~(scale(.) %>% as.vector))
    dat2
which gives me this:
    > dat
              x        y        z
    1  29.75859 3.633225 14.56091
    2  30.05549 3.605387 12.65187
    3  30.21689 3.318092 13.04672
    4  29.53086 3.079992 15.07307
    5  30.08582 3.437599 11.81096
    6  30.10121 4.621197 17.59671
    7  29.88505 4.051395 12.01248
    8  29.89067 4.829316 12.58810
    9  29.88711 4.662690 19.92150
    10 29.82199 3.091541 18.07352
and
    > dat2 <- dat %>% mutate_at(c("y", "z"), ~(scale(.) %>% as.vector))
    > dat2
              x          y           z
    1  29.75859 -0.3004815 -0.06016029
    2  30.05549 -0.3423437 -0.72529604
    3  30.21689 -0.7743696 -0.58772361
    4  29.53086 -1.1324181  0.11828039
    5  30.08582 -0.5946582 -1.01827752
    6  30.10121  1.1852038  0.99754666
    7  29.88505  0.3283513 -0.94806607
    8  29.89067  1.4981677 -0.74751378
    9  29.88711  1.2475998  1.80753470
    10 29.82199 -1.1150515  1.16367556
EDIT 1 (2016): Addressed Julian's comment: the output of scale is an Nx1 matrix, so ideally we should add as.vector to convert the matrix back into a plain vector. Thanks Julian!
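The point about the matrix type can be seen on a tiny vector. This is only an illustrative sketch; the variable names x, m and v are made up here.

```r
x <- c(2, 4, 6, 8)

m <- scale(x)
dim(m)                  # a one-column matrix: 4 rows, 1 column

v <- as.vector(m)
dim(v)                  # NULL: a plain numeric vector
is.null(attributes(v))  # as.vector() also drops the "scaled:center" and
                        # "scaled:scale" attributes that scale() attaches
```

Without the conversion, the scaled column would be stored in the data frame as a one-column matrix, which may behave unexpectedly in later steps.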
EDIT 2 (2019): Per Duccio A.'s comment: for the latest dplyr (version 0.8) you need to replace dplyr::funs with list, like dat %>% mutate_each_(list(~scale(.) %>% as.vector), vars=c("y","z"))
EDIT 3 (2020): Thanks to @mj_whales: the old solution is deprecated, and we now need to use mutate_at.
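For readers on newer versions: in dplyr 1.0.0 and later, the scoped verbs such as mutate_at are themselves superseded by across(). A minimal sketch of the same idea, assuming dplyr 1.0.0 or later is installed (the cols variable is introduced here only for illustration):

```r
library(dplyr)

set.seed(1234)
dat <- data.frame(x = rnorm(10, 30, .2),
                  y = runif(10, 3, 5),
                  z = runif(10, 10, 20))

cols <- c("y", "z")  # columns to scale; could be generated dynamically

# scale only the selected columns and leave x untouched;
# as.vector() turns the one-column matrix from scale() back into a vector
dat2 <- dat %>%
  mutate(across(all_of(cols), ~ as.vector(scale(.x))))
dat2
```

Wrapping the names in all_of() makes it explicit that cols is a character vector of column names rather than a column of dat.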
This is 3 years old. Still, I feel I have to add the following:
The most common normalization is the z-transformation, where you subtract the mean and divide by the standard deviation of your variable. The result will have mean=0 and sd=1.
For that, you don't need any package.
zVar <- (myVar - mean(myVar)) / sd(myVar)
That's it.
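If the formula is needed for several columns, it can be wrapped in a small function and applied to selected columns in base R. The following is a sketch, not a definitive implementation: z_score is a hypothetical helper name, the na.rm default is an assumption about how missing values should be handled, and the seed and example data are arbitrary.

```r
# Hypothetical helper: z-transform a numeric vector.
# With na.rm = TRUE, missing values are ignored when computing mean and sd
# and stay NA in the result.
z_score <- function(v, na.rm = TRUE) {
  (v - mean(v, na.rm = na.rm)) / sd(v, na.rm = na.rm)
}

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dat <- data.frame(x = rnorm(10, 30, .2),
                  y = runif(10, 3, 5),
                  z = runif(10, 10, 20))

cols <- c("y", "z")  # columns to standardize; x is left unchanged
dat2 <- dat
dat2[cols] <- lapply(dat[cols], z_score)

# for a complete vector this matches scale()
all.equal(dat2$y, as.vector(scale(dat$y)))
```

Note that a constant column has a standard deviation of 0, so the division would produce NaN values; such columns need separate handling.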