dplyr tidyr – How to generate case_when with dynamic conditons?
Here is a custom case_when
function that you can call with purrr::reduce
and a vector of strings parts of your variable names (in the example c("low", "high")
:
library(dplyr)library(purrr)my_case_when <- function(df, x) { mutate(df, "ans_{x}" := case_when( !is.na(!! sym(paste0(x, "_TOT"))) ~ !! sym(paste0(x, "_TOT")), !is.na(!! sym(paste0(x, "_A"))) ~ !! sym(paste0(x, "_A")), !is.na(!! sym(paste0(x, "_B"))) ~ !! sym(paste0(x, "_B")) ) )}test_df %>% reduce(c("low", "high"), my_case_when, .init = .)#> # A tibble: 3 x 8#> low_A low_TOT low_B high_A high_TOT high_B ans_low ans_high#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>#> 1 5 NA 20 NA NA 60 5 60#> 2 15 10 25 NA 40 20 10 40#> 3 NA NA 30 10 NA NA 30 10
Created on 2021-07-22 by the reprex package (v0.3.0)
I also have a package on Github {dplyover} which is made for this kind of cases. For your example with more than two variables I would use dplyover::over
together with a special syntax to evaluate strings as variable names. We can further use dplyover::cut_names("_TOT")
to extract the string parts of the variable names that come before or after "_TOT"
(in the example this is "low"
and "high"
).
We can either use case_when
:
library(dplyr)library(dplyover) # https://github.com/TimTeaFan/dplyovertest_df %>% mutate(over(cut_names("_TOT"), list(ans = ~ case_when( !is.na(.("{.x}_TOT")) ~ .("{.x}_TOT"), !is.na(.("{.x}_A")) ~ .("{.x}_A"), !is.na(.("{.x}_B")) ~ .("{.x}_B") )), .names = "{fn}_{x}") )#> # A tibble: 3 x 8#> low_A low_TOT low_B high_A high_TOT high_B ans_low ans_high#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>#> 1 5 NA 20 NA NA 60 5 60#> 2 15 10 25 NA 40 20 10 40#> 3 NA NA 30 10 NA NA 30 10
Or somewhat easier coalesce
:
test_df %>% mutate(over(cut_names("_TOT"), list(ans = ~ coalesce(.("{.x}_TOT"), .("{.x}_A"), .("{.x}_B"))), .names = "{fn}_{x}") )#> # A tibble: 3 x 8#> low_A low_TOT low_B high_A high_TOT high_B ans_low ans_high#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>#> 1 5 NA 20 NA NA 60 5 60#> 2 15 10 25 NA 40 20 10 40#> 3 NA NA 30 10 NA NA 30 10
Created on 2021-07-22 by the reprex package (v0.3.0)
At the risk of not answering the question, I think the easiest way to approach this is to just reshape and use coalesce()
. Your data structure requires two pivots either way (I think) but this requires no careful thinking about what prefixes are present.
library(tidyverse)test_df <- tibble( low_A = c(5, 15, NA), low_TOT = c(NA, 10, NA), low_B = c(20, 25, 30), high_A = c(NA, NA, 10), high_TOT = c(NA, 40, NA), high_B = c(60, 20, NA))test_df %>% rowid_to_column() %>% pivot_longer(cols = -rowid, names_to = c("prefix", "suffix"), names_sep = "_") %>% pivot_wider(names_from = suffix, values_from = value) %>% mutate(ans = coalesce(TOT, A, B)) %>% pivot_longer(cols = c(-rowid, -prefix), names_to = "suffix") %>% pivot_wider(names_from = c(prefix, suffix), names_sep = "_", values_from = value)#> # A tibble: 3 x 9#> rowid low_A low_TOT low_B low_ans high_A high_TOT high_B high_ans#> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>#> 1 1 5 NA 20 5 NA NA 60 60#> 2 2 15 10 25 10 NA 40 20 40#> 3 3 NA NA 30 30 10 NA NA 10
Note also that case_when
has no tidy evaluation, and so just not using mutate
simplifies your some_func
a lot. You already got an answer using !!sym
inside mutate, so here is a version that illustrates a simpler way. I prefer not to use tidyeval unless necessary because I want to use a mutate
chain, and here it's not really needed.
some_func <- function(df, prefix) { ans <- str_c(prefix, "_ans") TOT <- df[[str_c(prefix, "_TOT")]] A <- df[[str_c(prefix, "_A")]] B <- df[[str_c(prefix, "_B")]] df[[ans]] <- case_when( !is.na(TOT) ~ TOT, !is.na(A) ~ A, !is.na(B) ~ B ) df}reduce(c("low", "high"), some_func, .init = test_df)#> # A tibble: 3 x 8#> low_A low_TOT low_B high_A high_TOT high_B low_ans high_ans#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>#> 1 5 NA 20 NA NA 60 5 60#> 2 15 10 25 NA 40 20 10 40#> 3 NA NA 30 10 NA NA 30 10
Updated SolutionI think this solution solely based on base R may help you.
fn <- function(data) { do.call(cbind, lapply(unique(gsub("([[:alpha:]]+)_.*", "\\1", names(test_df))), function(x) { tmp <- test_df[paste0(x, c("_TOT", "_A", "_B"))] tmp[[paste(x, "ans", sep = "_")]] <- Reduce(function(a, b) { i <- which(is.na(a)) a[i] <- b[i] a }, tmp) tmp }))}fn(test_df)fn(test_df) high_TOT high_A high_B high_ans low_TOT low_A low_B low_ans1 NA NA 60 60 NA 5 20 52 40 NA 20 40 10 15 25 103 NA 10 NA 10 NA NA 30 30