How to convert a factor to integer\numeric without loss of information?


See the Warning section of ?factor:

In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
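To make the warning concrete, here is a small sketch with a made-up factor `f` (the values are only for illustration). It shows that as.numeric(f) returns the internal integer codes, while the two idioms from the help page recover the stored values:

```r
# Build a factor from numeric data; the levels are stored as the strings
# "3.5", "10", "20", and each element holds an integer code into them.
f <- factor(c(10, 3.5, 10, 20))

codes     <- as.numeric(f)                # 2 1 2 3 (codes, not the data)
recovered <- as.numeric(levels(f))[f]     # 10.0 3.5 10.0 20.0
also_ok   <- as.numeric(as.character(f))  # same values, via a longer route

print(codes)
print(recovered)
print(also_ok)
```

The help page says "approximately" because the levels are character strings, so a value only survives as precisely as it was written when the factor was created.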

The R FAQ has similar advice.


Why is as.numeric(levels(f))[f] more efficient than as.numeric(as.character(f))?

as.numeric(as.character(f)) is effectively as.numeric(levels(f)[f]), so you are performing the conversion to numeric on length(f) values rather than on nlevels(f) values. The speed difference will be most apparent for long vectors with few levels. If the values are mostly unique, there won't be much difference in speed. However you do the conversion, this operation is unlikely to be the bottleneck in your code, so don't worry too much about it.
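A rough sketch of that argument, using a hypothetical factor with 100,000 elements but only three levels (the data are invented for illustration):

```r
set.seed(1)
f <- factor(sample(c(0.5, 1.5, 2.5), 1e5, replace = TRUE))

# as.character() on a factor looks up each element's level label.
same_lookup <- identical(as.character(f), levels(f)[f])

# Number of strings that must be parsed into numbers by each approach.
n_parsed_elementwise <- length(f)   # as.numeric(as.character(f))
n_parsed_levels      <- nlevels(f)  # as.numeric(levels(f))[f]

via_levels <- as.numeric(levels(f))[f]
via_chars  <- as.numeric(as.character(f))

print(c(elementwise = n_parsed_elementwise, levels = n_parsed_levels))
print(identical(via_levels, via_chars))
```

Both routes give the same vector; the levels-first version just parses three strings and then indexes, instead of parsing one string per element.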


Some timings

library(microbenchmark)
# f (a factor) and x are assumed to exist already; their definitions are not shown here
microbenchmark(
  as.numeric(levels(f))[f],
  as.numeric(levels(f)[f]),
  as.numeric(as.character(f)),
  paste0(x),
  paste(x),
  times = 1e5
)
## Unit: microseconds
##                         expr   min    lq      mean median     uq      max neval
##     as.numeric(levels(f))[f] 3.982 5.120  6.088624  5.405  5.974 1981.418 1e+05
##     as.numeric(levels(f)[f]) 5.973 7.111  8.352032  7.396  8.250 4256.380 1e+05
##  as.numeric(as.character(f)) 6.827 8.249  9.628264  8.534  9.671 1983.694 1e+05
##                    paste0(x) 7.964 9.387 11.026351  9.956 10.810 2911.257 1e+05
##                     paste(x) 7.965 9.387 11.127308  9.956 11.093 2419.458 1e+05
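The benchmark above does not say how f was built, so the numbers cannot be reproduced exactly. If you want to try the comparison yourself without installing microbenchmark, something like the following should do. Note that the input here is my own assumption (a long vector with few levels), not the data behind the table:

```r
# Hypothetical input: one million elements drawn from 20 distinct values.
set.seed(42)
f <- factor(sample(seq(0.5, 10, by = 0.5), 1e6, replace = TRUE))

# Time each approach once with base R's system.time(). Absolute numbers
# depend on the machine, so no expected timings are given.
t_levels <- system.time(r1 <- as.numeric(levels(f))[f])
t_chars  <- system.time(r2 <- as.numeric(as.character(f)))

elapsed <- c(levels_first = t_levels[["elapsed"]],
             as_character = t_chars[["elapsed"]])
print(elapsed)
```

A single system.time() call is much noisier than microbenchmark's repeated runs, so treat the result as a rough indication only.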


R has a number of (undocumented) convenience functions for converting factors:

  • as.character.factor
  • as.data.frame.factor
  • as.Date.factor
  • as.list.factor
  • as.vector.factor
  • ...

But annoyingly, there is nothing to handle the factor -> numeric conversion. As an extension of Joshua Ulrich's answer, I would suggest overcoming this omission by defining your own idiomatic function:

as.numeric.factor <- function(x) {as.numeric(levels(x))[x]}

that you can store at the beginning of your script, or even better in your .Rprofile file.
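A quick usage sketch with an invented factor, calling the helper by its full name rather than relying on as.numeric(f) to pick it up:

```r
as.numeric.factor <- function(x) {as.numeric(levels(x))[x]}

# Example factor (hypothetical data) with a missing value.
f <- factor(c("1.5", "2", "1.5", NA))

result <- as.numeric.factor(f)
print(result)  # 1.5 2.0 1.5 NA
```

Missing values stay NA. Levels that are not numbers would also come back as NA, with the usual coercion warning from as.numeric.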


The easiest way would be to use the unfactor function from the varhandle package, which accepts a factor vector or even a data frame:

unfactor(your_factor_variable)

This example can be a quick start:

x <- rep(c("a", "b", "c"), 20)
y <- rep(c(1, 1, 0), 20)

class(x)  # -> "character"
class(y)  # -> "numeric"

x <- factor(x)
y <- factor(y)

class(x)  # -> "factor"
class(y)  # -> "factor"

library(varhandle)
x <- unfactor(x)
y <- unfactor(y)

class(x)  # -> "character"
class(y)  # -> "numeric"
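If you would rather not add a dependency, the behaviour shown above (numeric when the levels are numbers, character otherwise) can be approximated in base R. The helper below, unfactor_sketch, is a hypothetical name of my own and is not how varhandle implements unfactor. It is only a minimal sketch built on the as.numeric(levels(f))[f] idiom:

```r
# Hypothetical helper, not part of varhandle: convert a factor to numeric
# when every level parses as a number, otherwise to character.
unfactor_sketch <- function(f) {
  lev <- levels(f)
  num <- suppressWarnings(as.numeric(lev))
  if (!anyNA(num)) num[f] else lev[f]
}

# Made-up data frame with two factor columns and one integer column.
df <- data.frame(
  size  = factor(c("1.5", "2", "1.5")),
  label = factor(c("a", "b", "a")),
  n     = c(1L, 2L, 3L)
)

# Apply the helper to the factor columns only.
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], unfactor_sketch)

print(sapply(df, class))
```

Columns that were not factors are left untouched.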

You can also use unfactor on a data frame, for example the iris dataset:

sapply(iris, class)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"     "factor"

# load the package
library("varhandle")
# pass the iris to unfactor
tmp_iris <- unfactor(iris)
# check the classes of the columns
sapply(tmp_iris, class)

Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
   "numeric"    "numeric"    "numeric"    "numeric"  "character"

# check if the last column is correctly converted
tmp_iris$Species
  [1] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"
  [6] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"
 [11] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"
 [16] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"
 [21] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"
 [26] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"
 [31] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"
 [36] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"
 [41] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"
 [46] "setosa"     "setosa"     "setosa"     "setosa"     "setosa"
 [51] "versicolor" "versicolor" "versicolor" "versicolor" "versicolor"
 [56] "versicolor" "versicolor" "versicolor" "versicolor" "versicolor"
 [61] "versicolor" "versicolor" "versicolor" "versicolor" "versicolor"
 [66] "versicolor" "versicolor" "versicolor" "versicolor" "versicolor"
 [71] "versicolor" "versicolor" "versicolor" "versicolor" "versicolor"
 [76] "versicolor" "versicolor" "versicolor" "versicolor" "versicolor"
 [81] "versicolor" "versicolor" "versicolor" "versicolor" "versicolor"
 [86] "versicolor" "versicolor" "versicolor" "versicolor" "versicolor"
 [91] "versicolor" "versicolor" "versicolor" "versicolor" "versicolor"
 [96] "versicolor" "versicolor" "versicolor" "versicolor" "versicolor"
[101] "virginica"  "virginica"  "virginica"  "virginica"  "virginica"
[106] "virginica"  "virginica"  "virginica"  "virginica"  "virginica"
[111] "virginica"  "virginica"  "virginica"  "virginica"  "virginica"
[116] "virginica"  "virginica"  "virginica"  "virginica"  "virginica"
[121] "virginica"  "virginica"  "virginica"  "virginica"  "virginica"
[126] "virginica"  "virginica"  "virginica"  "virginica"  "virginica"
[131] "virginica"  "virginica"  "virginica"  "virginica"  "virginica"
[136] "virginica"  "virginica"  "virginica"  "virginica"  "virginica"
[141] "virginica"  "virginica"  "virginica"  "virginica"  "virginica"
[146] "virginica"  "virginica"  "virginica"  "virginica"  "virginica"