How to parametrize function calls in dplyr 0.7?
dplyr 0.7 adds a specialized variant of group_by, group_by_at, to deal with multiple grouping variables. Using this new member of the _at family is much easier:
# using the pre-release 0.6.0
cols <- c("am", "gear")
mtcars %>%
  group_by_at(.vars = cols) %>%
  summarise(mean_cyl = mean(cyl))
# Source: local data frame [4 x 3]
# Groups: am [?]
#
#      am  gear mean_cyl
#   <dbl> <dbl>    <dbl>
# 1     0     3 7.466667
# 2     0     4 5.000000
# 3     1     4 4.500000
# 4     1     5 6.000000
The .vars argument accepts a character vector of column names, a numeric vector of column positions, or a list of columns generated by vars(). From the documentation:
.vars
A list of columns generated by vars(), or a character vector of column names, or a numeric vector of column positions.
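As a minimal sketch of the three accepted forms, the snippet below groups mtcars by am and gear in each way and checks that the summaries agree. The helper summarise_cyl and the variables cols_chr and cols_pos are hypothetical names introduced only for this illustration; it assumes a dplyr version that still provides group_by_at.

```r
library(dplyr)

# Three ways to name the same grouping columns (am and gear in mtcars)
cols_chr <- c("am", "gear")                 # character vector of column names
cols_pos <- match(cols_chr, names(mtcars))  # numeric vector of column positions

# Hypothetical helper: summarise and drop grouping metadata for comparison
summarise_cyl <- function(grouped) {
  as.data.frame(summarise(grouped, mean_cyl = mean(cyl)))
}

by_chr  <- summarise_cyl(group_by_at(mtcars, .vars = cols_chr))
by_pos  <- summarise_cyl(group_by_at(mtcars, .vars = cols_pos))
by_vars <- summarise_cyl(group_by_at(mtcars, .vars = vars(am, gear)))

# All three forms should yield the same summary
print(isTRUE(all.equal(by_chr, by_pos)))
print(isTRUE(all.equal(by_chr, by_vars)))
```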
Here's the quick and dirty reference I wrote for myself.
# install.packages("rlang")
library(tidyverse)

dat <- data.frame(cat = sample(LETTERS[1:2], 50, replace = TRUE),
                  cat2 = sample(LETTERS[3:4], 50, replace = TRUE),
                  value = rnorm(50))
Representing column names with strings
Convert strings to symbol objects using rlang::sym and rlang::syms.
summ_var <- "value"
group_vars <- c("cat", "cat2")

summ_sym <- rlang::sym(summ_var)      # create a single symbol
group_syms <- rlang::syms(group_vars) # create a list of symbols

dat %>%
  group_by(!!!group_syms) %>%         # splice the list of symbols into the call
  summarize(summ = sum(!!summ_sym))   # unquote the single symbol into the call
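!! and !!! only work inside functions that support quasiquotation, such as the dplyr verbs. Anywhere else R parses them as repeated logical negation, so applying them to symbols will typically give you an error.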
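To see what the unquoting operators produce without running a dplyr pipeline, one option is rlang::expr(), which also supports quasiquotation and returns the resulting call unevaluated. This is only an inspection sketch; the names unquoted and spliced are hypothetical, and dat and group_by do not need to exist because nothing is evaluated.

```r
library(rlang)

summ_sym   <- sym("value")
group_syms <- syms(c("cat", "cat2"))

# expr() builds the call but does not evaluate it
unquoted <- expr(sum(!!summ_sym))               # sum(value)
spliced  <- expr(group_by(dat, !!!group_syms))  # group_by(dat, cat, cat2)

print(unquoted)
print(spliced)
```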
The usage of rlang::sym and rlang::syms is identical inside functions.
summarize_by <- function(df, summ_var, group_vars) {
  summ_sym <- rlang::sym(summ_var)
  group_syms <- rlang::syms(group_vars)

  df %>%
    group_by(!!!group_syms) %>%
    summarize(summ = sum(!!summ_sym))
}
We can then call summarize_by with string arguments.
summarize_by(dat, "value", c("cat", "cat2"))
Using non-standard evaluation for column/variable names
summ_quo <- quo(value)        # capture a single variable for NSE
group_quos <- quos(cat, cat2) # capture a list of variables for NSE

dat %>%
  group_by(!!!group_quos) %>%       # use !!! with both quos and rlang::syms
  summarize(summ = sum(!!summ_quo)) # use !! with both quo and rlang::sym
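Inside functions, use enquo rather than quo, so that you capture the expression the caller supplied for the argument rather than the argument's own name. quos(...) is still fine inside functions, because the dots are forwarded and it captures whatever expressions the caller passed.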
summarize_by <- function(df, summ_var, ...) {
  summ_quo <- enquo(summ_var) # captures exactly one argument
  group_quos <- quos(...)     # captures any number of arguments, also inside functions

  df %>%
    group_by(!!!group_quos) %>%
    summarize(summ = sum(!!summ_quo))
}
And then our function call is
summarize_by(dat, value, cat, cat2)
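When the grouping columns only need to be handed on to group_by, a simpler variant is to forward ... directly instead of capturing and splicing it. The sketch below uses a hypothetical function name, summarize_by_dots, and runs on mtcars so that it is self-contained; it is one possible shape, not the only one.

```r
library(dplyr)

# Hypothetical variant: forwards ... to group_by unchanged
summarize_by_dots <- function(df, summ_var, ...) {
  summ_quo <- enquo(summ_var)  # still needed for the single named argument
  df %>%
    group_by(...) %>%
    summarize(summ = sum(!!summ_quo))
}

res <- summarize_by_dots(mtcars, cyl, am, gear)

# Reference result written out by hand, for comparison
ref <- mtcars %>%
  group_by(am, gear) %>%
  summarize(summ = sum(cyl))

print(isTRUE(all.equal(as.data.frame(res), as.data.frame(ref))))
```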
If you want to group by more than one column, you can capture the columns with quos and splice them in with !!!:
grouping_vars <- quos(am, gear)
mtcars %>%
  group_by(!!!grouping_vars) %>%
  summarise(mean_cyl = mean(cyl))
#      am  gear mean_cyl
#   <dbl> <dbl>    <dbl>
# 1     0     3 7.466667
# 2     0     4 5.000000
# 3     1     4 4.500000
# 4     1     5 6.000000
Right now, there doesn't seem to be a great way to turn strings into quosures. Here's one way that does work, though:
cols <- c("am", "gear")
grouping_vars <- rlang::parse_quosures(paste(cols, collapse = ";"))
mtcars %>%
  group_by(!!!grouping_vars) %>%
  summarise(mean_cyl = mean(cyl))
#      am  gear mean_cyl
#   <dbl> <dbl>    <dbl>
# 1     0     3 7.466667
# 2     0     4 5.000000
# 3     1     4 4.500000
# 4     1     5 6.000000
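The code above was written against the pre-release. As far as I know, later rlang releases renamed parse_quosures() to parse_quos() and made the environment an explicit argument, so on a newer installation the same idea would look roughly like the sketch below. Treat the availability of parse_quos() and the choice of globalenv() as assumptions to check against your installed version.

```r
library(dplyr)

cols <- c("am", "gear")

# Assumption: rlang::parse_quos(x, env) is available in the installed rlang
grouping_vars <- rlang::parse_quos(paste(cols, collapse = ";"),
                                   env = globalenv())

res <- mtcars %>%
  group_by(!!!grouping_vars) %>%
  summarise(mean_cyl = mean(cyl))

print(res)
```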