R: convert to factor with the level order matching case_when
Levels are set in lexicographic order by default. If you don't want to specify them, you can either choose labels so that lexicographic order is already the order you want (Performance1), or create a levels vector once and use it both when generating the values and when setting the factor levels (Performance2). I don't know how much effort or tedium either of these would save you, but here they are. See my third recommendation for what I think is the least tedious way.
    Performance1 <- function(x) {
      case_when(
        is.na(x) ~ NA_character_,
        x > 80 ~ 'Excellent',
        x <= 50 ~ 'Fail',
        TRUE ~ 'Good'
      ) %>% factor()
    }

    Performance2 <- function(x, levels = c("Excellent", "Good", "Fail")) {
      case_when(
        is.na(x) ~ NA_character_,
        x > 80 ~ levels[1],
        x > 50 ~ levels[2],
        TRUE ~ levels[3]
      ) %>% factor(levels)
    }

    performance1 <- Performance1(score)
    levels(performance1)
    # [1] "Excellent" "Fail" "Good"
    table(performance1)
    # performance1
    # Excellent      Fail      Good
    #        15        55        30

    performance2 <- Performance2(score)
    levels(performance2)
    # [1] "Excellent" "Good" "Fail"
    table(performance2)
    # performance2
    # Excellent      Good      Fail
    #        15        30        55
If I could suggest an even less tedious way:
    performance <- cut(score, breaks = c(0, 50, 80, 100),
                       labels = c("Fail", "Good", "Excellent"))
    levels(performance)
    # [1] "Fail" "Good" "Excellent"
    table(performance)
    # performance
    #      Fail      Good Excellent
    #        55        30        15
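One thing worth checking with cut is how it treats values that sit exactly on a break. Here is a small sketch in base R; the boundary_scores vector is made up for illustration and is not the score data from the question. By default the intervals are closed on the right, so 50 is "Fail" and 80 is "Good", which matches the case_when conditions, but a score of exactly 0 falls outside the first interval unless include.lowest = TRUE is set.

```r
# Made-up boundary values for illustration (not the original score vector).
boundary_scores <- c(0, 50, 50.5, 80, 80.1, 100, NA)
breaks <- c(0, 50, 80, 100)
labels <- c("Fail", "Good", "Excellent")

# Default (right = TRUE): intervals are (0,50], (50,80], (80,100].
default_cut <- cut(boundary_scores, breaks = breaks, labels = labels)

# include.lowest = TRUE closes the first interval on the left: [0,50].
closed_cut <- cut(boundary_scores, breaks = breaks, labels = labels,
                  include.lowest = TRUE)

as.character(default_cut)
# [1] NA "Fail" "Good" "Good" "Excellent" "Excellent" NA
as.character(closed_cut)
# [1] "Fail" "Fail" "Good" "Good" "Excellent" "Excellent" NA
levels(closed_cut)
# [1] "Fail" "Good" "Excellent"
```

Whether the extra argument matters depends on whether a score of exactly 0 can occur in your data.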
While my solution replaces your piping with a messy intermediate variable, this works:
    library(dplyr, warn.conflicts = FALSE)

    set.seed(1234)
    score <- runif(100, min = 0, max = 100)

    Performance <- function(x) {
      t <- case_when(
        is.na(x) ~ NA_character_,
        x > 80 ~ 'Excellent',
        x > 50 ~ 'Good',
        TRUE ~ 'Fail'
      )
      to <- subset(t, !duplicated(t))
      factor(t, levels = to[order(subset(x, !duplicated(t)), decreasing = TRUE)])
    }

    performance <- Performance(score)
    levels(performance)
Edited to fix!
My Solution
Finally, I came up with a solution. For those who are interested, here it is. I wrote a function fct_case_when (pretending to be a function in forcats). It is just a wrapper around case_when that returns a factor. The order of the levels is the same as the order of the arguments.
    fct_case_when <- function(...) {
      args <- as.list(match.call())
      levels <- sapply(args[-1], function(f) f[[3]])  # extract RHS of formula
      levels <- levels[!is.na(levels)]
      factor(dplyr::case_when(...), levels = levels)
    }
Now I can use fct_case_when in place of case_when, and the result is the same as with the previous implementation (but less tedious).
    Performance <- function(x) {
      fct_case_when(
        is.na(x) ~ NA_character_,
        x > 80 ~ 'Excellent',
        x > 50 ~ 'Good',
        TRUE ~ 'Fail'
      )
    }

    performance <- Performance(score)
    levels(performance)
    #> [1] "Excellent" "Good" "Fail"
    table(performance)
    #> performance
    #> Excellent      Good      Fail
    #>        15        30        55
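A note on how the wrapper finds its levels: it reads the right-hand side of each formula from the unevaluated call, so it appears to rely on the labels being written as literals. The sketch below uses a hypothetical helper, rhs_class (my name, not part of forcats or dplyr), to show what f[[3]] holds in each case. A literal string or NA_character_ arrives as a character value, while something like levels[1] or a variable name arrives as an unevaluated call or symbol.

```r
# Hypothetical helper for illustration: reports the class of the
# right-hand side of each formula, without evaluating anything.
rhs_class <- function(...) {
  formulas <- as.list(match.call())[-1]   # drop the function name
  unname(vapply(formulas, function(f) class(f[[3]]), character(1)))
}

# Literal labels are stored in the call as character constants.
literal_rhs <- rhs_class(x > 80 ~ "Excellent", TRUE ~ NA_character_)
literal_rhs
# [1] "character" "character"

# Indexed or named labels are stored as unevaluated language objects.
computed_rhs <- rhs_class(x > 80 ~ levels[1], TRUE ~ fallback)
computed_rhs
# [1] "call" "name"
```

So fct_case_when as written should be fine for the usage shown here, but I would expect it to need extra handling (evaluating the right-hand sides, for instance) if the labels come from a vector, as in Performance2.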