R: convert to factor with order of levels same with case_when


Factor levels are set in lexicographic order by default. If you don't want to specify them, you can choose labels whose lexicographic order is already the order you want (Performance1), or create a levels vector once and use it both when generating the labels and when setting the levels (Performance2). I don't know how much tedium either of these would save you, but here they are. See my third suggestion, using cut, for what I think is the least tedious way.

    # Relies on the default (sorted) level order
    Performance1 <- function(x) {
      case_when(
        is.na(x) ~ NA_character_,
        x > 80   ~ 'Excellent',
        x <= 50  ~ 'Fail',
        TRUE     ~ 'Good'
      ) %>% factor()
    }

    # Uses one levels vector for both the labels and the level order
    Performance2 <- function(x, levels = c("Excellent", "Good", "Fail")) {
      case_when(
        is.na(x) ~ NA_character_,
        x > 80   ~ levels[1],
        x > 50   ~ levels[2],
        TRUE     ~ levels[3]
      ) %>% factor(levels)
    }

    performance1 <- Performance1(score)
    levels(performance1)
    # [1] "Excellent" "Fail"      "Good"
    table(performance1)
    # performance1
    # Excellent      Fail      Good
    #        15        55        30

    performance2 <- Performance2(score)
    levels(performance2)
    # [1] "Excellent" "Good"      "Fail"
    table(performance2)
    # performance2
    # Excellent      Good      Fail
    #        15        30        55
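
To make the difference between the two functions concrete, here is a minimal base R sketch of how factor() picks its levels. The vector `labels` and the names `f_default` and `f_explicit` are made up for illustration and are not part of the code above.

```r
# Illustrative data only
labels <- c("Good", "Fail", "Excellent", "Fail")

# Without `levels`, factor() sorts the unique values
f_default <- factor(labels)
levels(f_default)
# [1] "Excellent" "Fail"      "Good"

# With `levels`, the order you give is the order you get
f_explicit <- factor(labels, levels = c("Excellent", "Good", "Fail"))
levels(f_explicit)
# [1] "Excellent" "Good"      "Fail"

# The underlying integer codes follow the level order
as.integer(f_explicit)
# [1] 2 3 1 3
```

The order of the conditions inside case_when has no effect on the levels; only the `levels` argument (or the sort order of the labels) does.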

If I could suggest an even less tedious way:

    performance <- cut(score, breaks = c(0, 50, 80, 100),
                       labels = c("Fail", "Good", "Excellent"))
    levels(performance)
    # [1] "Fail"      "Good"      "Excellent"
    table(performance)
    # performance
    #      Fail      Good Excellent
    #        55        30        15
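
Two details of cut may matter here, so here is a small sketch on hand-picked boundary values (the vector `edge_scores` is made up for illustration). By default the intervals are closed on the right, so a score of exactly 0 falls outside (0, 50] unless include.lowest = TRUE is set. The levels also come out in ascending order; if you want "Excellent" first, as in the case_when version, one option is to reverse them afterwards.

```r
# Hand-picked boundary values, for illustration only
edge_scores <- c(0, 50, 50.1, 80, 80.1, 100, NA)

# Default: (0,50], (50,80], (80,100], so exactly 0 becomes NA
is.na(cut(0, breaks = c(0, 50, 80, 100)))
# [1] TRUE

# include.lowest = TRUE closes the first interval: [0,50]
performance <- cut(edge_scores, breaks = c(0, 50, 80, 100),
                   labels = c("Fail", "Good", "Excellent"),
                   include.lowest = TRUE)
as.character(performance)
# [1] "Fail"      "Fail"      "Good"      "Good"      "Excellent" "Excellent" NA

# Reverse the level order without changing the values
performance_desc <- factor(performance, levels = rev(levels(performance)))
levels(performance_desc)
# [1] "Excellent" "Good"      "Fail"
```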


While my solution replaces your piping with a messy intermediate variable, this works:

    library(dplyr, warn.conflicts = FALSE)
    set.seed(1234)
    score <- runif(100, min = 0, max = 100)

    Performance <- function(x) {
      t <- case_when(
        is.na(x) ~ NA_character_,
        x > 80   ~ 'Excellent',
        x > 50   ~ 'Good',
        TRUE     ~ 'Fail'
      )
      # First occurrence of each label
      to <- subset(t, !duplicated(t))
      # Order the labels by the score at their first occurrence, highest first
      factor(t, levels = to[order(subset(x, !duplicated(t)), decreasing = TRUE)])
    }

    performance <- Performance(score)
    levels(performance)

Edited to fix!
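
The same idea, ordering the levels by the scores that produced them, can be sketched a little more directly by sorting the labels on each group's maximum score. This is a variant and not the code above; the function name `performance_by_max` is hypothetical, and it assumes the categories are monotonic in the score, as they are here.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical variant: order levels by the highest score in each category
performance_by_max <- function(x) {
  labels <- case_when(
    is.na(x) ~ NA_character_,
    x > 80   ~ "Excellent",
    x > 50   ~ "Good",
    TRUE     ~ "Fail"
  )
  # Maximum score per label; NA labels are dropped by tapply
  group_max <- tapply(x, labels, max)
  factor(labels, levels = names(group_max)[order(group_max, decreasing = TRUE)])
}

result <- performance_by_max(c(10, 95, NA, 60, 85, 30))
levels(result)
# [1] "Excellent" "Good"      "Fail"
```

Like the version above, this only includes categories that actually occur in the data, so an unused label will not appear among the levels.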


My Solution

Finally, I came up with a solution. For those who are interested, here it is. I wrote a function fct_case_when (pretending it is a function in forcats). It is just a wrapper around case_when that returns a factor. The order of the levels is the same as the order of the arguments.


    fct_case_when <- function(...) {
      args <- as.list(match.call())
      levels <- sapply(args[-1], function(f) f[[3]])  # extract RHS of formula
      levels <- levels[!is.na(levels)]
      factor(dplyr::case_when(...), levels = levels)
    }

Now, I can use fct_case_when in place of case_when, and the result will be the same as the previous implementation (but less tedious).


    Performance <- function(x) {
      fct_case_when(
        is.na(x) ~ NA_character_,
        x > 80   ~ 'Excellent',
        x > 50   ~ 'Good',
        TRUE     ~ 'Fail'
      )
    }

    performance <- Performance(score)
    levels(performance)
    #> [1] "Excellent" "Good"      "Fail"
    table(performance)
    #> performance
    #> Excellent      Good      Fail
    #>        15        30        55
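
One caveat, with a possible workaround: because the levels are read from the right-hand sides of the formulas, a label that appears in two branches would be passed to factor() twice, and factor() rejects duplicated levels. The sketch below is a variant that de-duplicates the labels first. The name `fct_case_when_unique` is hypothetical, and it assumes every right-hand side is a literal string or an NA constant, not an expression such as `levels[1]`.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical variant of fct_case_when that tolerates repeated labels.
# Assumes each RHS is a literal string (or NA constant).
fct_case_when_unique <- function(...) {
  args <- as.list(match.call())[-1]
  # RHS of each two-sided formula, in the order written
  levels <- vapply(args, function(f) as.character(f[[3]]), character(1))
  # Drop NA and keep the first occurrence of each label
  levels <- unique(levels[!is.na(levels)])
  factor(dplyr::case_when(...), levels = levels)
}

x <- c(95, 10, 60, NA, 85, 30)
res <- fct_case_when_unique(
  is.na(x) ~ NA_character_,
  x > 80   ~ "Excellent",
  x > 50   ~ "Good",
  x > 20   ~ "Fail",
  TRUE     ~ "Fail"   # same label used in two branches
)
levels(res)
# [1] "Excellent" "Good"      "Fail"
```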