How to force R to use a specified factor level as reference in a regression?


See the relevel() function. Here is an example:

set.seed(123)
x <- rnorm(100)
DF <- data.frame(x = x,
                 y = 4 + (1.5*x) + rnorm(100, sd = 2),
                 b = gl(5, 20))
head(DF)
str(DF)
m1 <- lm(y ~ x + b, data = DF)
summary(m1)

Now alter the factor b in DF by use of the relevel() function:

DF <- within(DF, b <- relevel(b, ref = 3))
m2 <- lm(y ~ x + b, data = DF)
summary(m2)

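The two models use different reference levels: m1 takes level 1 of b as the baseline (the default, since it is the first level), while m2 takes level 3. The fit is the same; only the b coefficients change, because each is now a difference from level 3 rather than from level 1.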

> coef(m1)
(Intercept)           x          b2          b3          b4          b5 
  3.2903239   1.4358520   0.6296896   0.3698343   1.0357633   0.4666219 
> coef(m2)
(Intercept)           x          b1          b2          b4          b5 
 3.66015826  1.43585196 -0.36983433  0.25985529  0.66592898  0.09678759 
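As a quick check that releveling only reparameterises the model rather than changing the fit, here is a minimal sketch that rebuilds the same data and compares the two models. The name DF2 is introduced here purely for illustration, so that the original DF stays untouched.

```r
# Recreate the example data (same seed as in the example)
set.seed(123)
x <- rnorm(100)
DF <- data.frame(x = x,
                 y = 4 + (1.5 * x) + rnorm(100, sd = 2),
                 b = gl(5, 20))

m1 <- lm(y ~ x + b, data = DF)

# DF2 (hypothetical name) is a copy of DF with level "3" as the reference
DF2 <- within(DF, b <- relevel(b, ref = "3"))
m2 <- lm(y ~ x + b, data = DF2)

levels(DF$b)   # "1" "2" "3" "4" "5"
levels(DF2$b)  # "3" "1" "2" "4" "5"

# Fitted values are identical; only the parameterisation differs
all.equal(fitted(m1), fitted(m2))

# The new intercept equals the old intercept plus the old b3 effect
all.equal(unname(coef(m2)["(Intercept)"]),
          unname(coef(m1)["(Intercept)"] + coef(m1)["b3"]))
```

Both all.equal() calls should return TRUE, which matches the printed coefficients: the b1 estimate in m2 is the b3 estimate in m1 with the sign flipped.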


I know this is an old question, but I had a similar issue and found that:

lm(x ~ y + relevel(b, ref = "3")) 

does exactly what you asked.
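A small sketch of the same idea on the built-in iris data (chosen here only for illustration; the object name fit is hypothetical), showing that releveling inside the formula leaves the data frame itself unchanged:

```r
# Relevel inside the formula; iris itself is not modified
fit <- lm(Sepal.Width ~ relevel(Species, ref = "versicolor"), data = iris)
coef(fit)

# The stored factor still has its original level order
levels(iris$Species)  # "setosa" "versicolor" "virginica"
```

One side effect is that the coefficient names contain the whole relevel(...) call, which can make the summary output harder to read than with a releveled column.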


Others have mentioned the relevel() function, which is the best solution if you want to change the base level for all analyses on your data (or are willing to live with changing the data).

If you don't want to change the data (this is a one-time change, and in the future you want the default behavior again), you can combine the C() function (note the uppercase C), which sets contrasts, with the contr.treatment() function, whose base argument chooses which level is the baseline.

For example:

lm( Sepal.Width ~ C(Species,contr.treatment(3, base=2)), data=iris )
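To make the meaning of base = 2 concrete, here is a hedged sketch comparing the C() approach with relevel() on the same data. The object names fit_C and fit_relevel are made up for this example.

```r
# base = 2 refers to the second level of Species, i.e. "versicolor"
levels(iris$Species)

fit_C <- lm(Sepal.Width ~ C(Species, contr.treatment(3, base = 2)),
            data = iris)
fit_relevel <- lm(Sepal.Width ~ relevel(Species, ref = "versicolor"),
                  data = iris)

# Same estimates; only the coefficient names differ
all.equal(unname(coef(fit_C)), unname(coef(fit_relevel)))

# iris carries no stored contrasts, so later models use the defaults
is.null(attr(iris$Species, "contrasts"))
```

Because base is a position rather than a label, it is worth checking levels() first if the factor's level order is not obvious.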