dplyr tidyr – How to generate case_when with dynamic conditons? dplyr tidyr – How to generate case_when with dynamic conditons? r r

dplyr tidyr – How to generate case_when with dynamic conditons?


Here is a custom case_when function that you can call with purrr::reduce and a vector of strings parts of your variable names (in the example c("low", "high"):

library(dplyr)library(purrr)my_case_when <- function(df, x) {    mutate(df,         "ans_{x}" := case_when(           !is.na(!! sym(paste0(x, "_TOT"))) ~ !! sym(paste0(x, "_TOT")),           !is.na(!! sym(paste0(x, "_A"))) ~ !! sym(paste0(x, "_A")),           !is.na(!! sym(paste0(x, "_B"))) ~ !! sym(paste0(x, "_B"))           )  )}test_df %>%   reduce(c("low", "high"), my_case_when, .init = .)#> # A tibble: 3 x 8#>   low_A low_TOT low_B high_A high_TOT high_B ans_low ans_high#>   <dbl>   <dbl> <dbl>  <dbl>    <dbl>  <dbl>   <dbl>    <dbl>#> 1     5      NA    20     NA       NA     60       5       60#> 2    15      10    25     NA       40     20      10       40#> 3    NA      NA    30     10       NA     NA      30       10

Created on 2021-07-22 by the reprex package (v0.3.0)

I also have a package on Github {dplyover} which is made for this kind of cases. For your example with more than two variables I would use dplyover::over together with a special syntax to evaluate strings as variable names. We can further use dplyover::cut_names("_TOT") to extract the string parts of the variable names that come before or after "_TOT" (in the example this is "low" and "high").

We can either use case_when:

library(dplyr)library(dplyover) # https://github.com/TimTeaFan/dplyovertest_df %>%   mutate(over(cut_names("_TOT"),              list(ans = ~ case_when(                  !is.na(.("{.x}_TOT")) ~ .("{.x}_TOT"),                  !is.na(.("{.x}_A")) ~ .("{.x}_A"),                  !is.na(.("{.x}_B")) ~ .("{.x}_B")                  )),              .names = "{fn}_{x}")         )#> # A tibble: 3 x 8#>   low_A low_TOT low_B high_A high_TOT high_B ans_low ans_high#>   <dbl>   <dbl> <dbl>  <dbl>    <dbl>  <dbl>   <dbl>    <dbl>#> 1     5      NA    20     NA       NA     60       5       60#> 2    15      10    25     NA       40     20      10       40#> 3    NA      NA    30     10       NA     NA      30       10

Or somewhat easier coalesce:

test_df %>%   mutate(over(cut_names("_TOT"),              list(ans = ~ coalesce(.("{.x}_TOT"),                                    .("{.x}_A"),                                    .("{.x}_B"))),              .names = "{fn}_{x}")  )#> # A tibble: 3 x 8#>   low_A low_TOT low_B high_A high_TOT high_B ans_low ans_high#>   <dbl>   <dbl> <dbl>  <dbl>    <dbl>  <dbl>   <dbl>    <dbl>#> 1     5      NA    20     NA       NA     60       5       60#> 2    15      10    25     NA       40     20      10       40#> 3    NA      NA    30     10       NA     NA      30       10

Created on 2021-07-22 by the reprex package (v0.3.0)


At the risk of not answering the question, I think the easiest way to approach this is to just reshape and use coalesce(). Your data structure requires two pivots either way (I think) but this requires no careful thinking about what prefixes are present.

library(tidyverse)test_df <- tibble(  low_A = c(5, 15, NA),  low_TOT = c(NA, 10, NA),  low_B = c(20, 25, 30),  high_A = c(NA, NA, 10),  high_TOT = c(NA, 40, NA),  high_B = c(60, 20, NA))test_df %>%  rowid_to_column() %>%  pivot_longer(cols = -rowid, names_to = c("prefix", "suffix"), names_sep = "_") %>%  pivot_wider(names_from = suffix, values_from = value) %>%  mutate(ans = coalesce(TOT, A, B)) %>%  pivot_longer(cols = c(-rowid, -prefix), names_to = "suffix") %>%  pivot_wider(names_from = c(prefix, suffix), names_sep = "_", values_from = value)#> # A tibble: 3 x 9#>   rowid low_A low_TOT low_B low_ans high_A high_TOT high_B high_ans#>   <int> <dbl>   <dbl> <dbl>   <dbl>  <dbl>    <dbl>  <dbl>    <dbl>#> 1     1     5      NA    20       5     NA       NA     60       60#> 2     2    15      10    25      10     NA       40     20       40#> 3     3    NA      NA    30      30     10       NA     NA       10

Note also that case_when has no tidy evaluation, and so just not using mutate simplifies your some_func a lot. You already got an answer using !!sym inside mutate, so here is a version that illustrates a simpler way. I prefer not to use tidyeval unless necessary because I want to use a mutate chain, and here it's not really needed.

some_func <- function(df, prefix) {  ans <- str_c(prefix, "_ans")  TOT <- df[[str_c(prefix, "_TOT")]]  A <- df[[str_c(prefix, "_A")]]  B <- df[[str_c(prefix, "_B")]]    df[[ans]] <- case_when(    !is.na(TOT) ~ TOT,    !is.na(A) ~ A,    !is.na(B) ~ B  )  df}reduce(c("low", "high"), some_func, .init = test_df)#> # A tibble: 3 x 8#>   low_A low_TOT low_B high_A high_TOT high_B low_ans high_ans#>   <dbl>   <dbl> <dbl>  <dbl>    <dbl>  <dbl>   <dbl>    <dbl>#> 1     5      NA    20     NA       NA     60       5       60#> 2    15      10    25     NA       40     20      10       40#> 3    NA      NA    30     10       NA     NA      30       10


Updated SolutionI think this solution solely based on base R may help you.

fn <- function(data) {    do.call(cbind, lapply(unique(gsub("([[:alpha:]]+)_.*", "\\1", names(test_df))), function(x) {    tmp <- test_df[paste0(x, c("_TOT", "_A", "_B"))]    tmp[[paste(x, "ans", sep = "_")]] <- Reduce(function(a, b) {      i <- which(is.na(a))      a[i] <- b[i]      a    }, tmp)    tmp  }))}fn(test_df)fn(test_df)   high_TOT high_A high_B high_ans low_TOT low_A low_B low_ans1       NA     NA     60       60      NA     5    20       52       40     NA     20       40      10    15    25      103       NA     10     NA       10      NA    NA    30      30