Variables as default arguments of a function, using dplyr
Here are two approaches in data.table; however, I don't believe that either of them will work in dplyr at present.
In data.table, whatever is inside the j-expression (aka the 2nd argument of [.data.table) gets parsed by the data.table package first, and not by the regular R parser. In a way you can think of it as a separate language parser living inside the regular language parser that is R. What this parser does is look for the variables you have used that are actually columns of the data.table you're operating on, and whatever it finds it puts in the environment of the j-expression.
What this means is that you have to let this parser know somehow that gear will be used, or it simply will not be part of the environment. Following are two ideas for accomplishing that.
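The underlying scoping problem is not specific to data.table and can be reproduced in base R alone. The sketch below is only an analogy, not data.table's actual mechanism: lag_by is a hypothetical stand-in for the lag2 wrapper (it merely reorders, it does not lag), and with() plays the role of the environment that holds the columns.

```r
# Hypothetical stand-in for lag2: reorders x by order_by, just enough
# to show where the default argument gets looked up.
lag_by <- function(x, order_by = gear) {
  x[order(order_by)]
}

# cyl is found because the call itself is evaluated inside mtcars,
# but the default `gear` is looked up from lag_by's own environment
# (and then from where lag_by was defined), never from mtcars.
msg <- tryCatch(
  with(mtcars, lag_by(cyl)),
  error = function(e) conditionMessage(e)
)
print(msg)

# Supplying the column explicitly works, because a supplied argument is
# evaluated where the call was written, i.e. inside mtcars.
ok <- with(mtcars, lag_by(cyl, order_by = gear))
```

This assumes there is no object called gear in the global environment; if there were one, the first call would silently use it instead of failing.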
The "simple" way to do it is to actually use the column name in the j-expression where you call lag2 (in addition to some monkeying within lag2):
library(data.table)
library(dplyr)  # lag() with an order_by argument comes from dplyr

dt = as.data.table(mtcars)

lag2 = function(x) lag(x, order_by = get('gear', sys.frame(4)))

dt[, newvar := {gear; lag2(cyl)}]
# or
dt[, newvar := {.SD; lag2(cyl)}]
This solution has two undesirable properties, imo. First, I'm not sure how fragile that sys.frame(4) is: put this thing in a function or a package and I don't know what will happen. You can probably work around it and figure out the right frame, but it's kind of a pain. Second, you either have to mention the particular variable you're interested in somewhere in the expression, or dump all of them into the environment by using .SD, again anywhere in the expression.
A second option that I like more is to take advantage of the fact that the data.table parser evaluates eval expressions in place before the variable lookup, so if you use a variable inside some expression that you eval, that would work:
lag3 = quote(function(x) lag(x, order_by = gear))

dt[, newvar := eval(lag3)(cyl)]
This doesn't suffer from the issues of the other solution, with the obvious disadvantage of having to type an extra eval.
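Why a quoted function helps can be illustrated in base R, again as an analogy rather than a description of data.table internals. A function created by evaluating the quoted definition inside the data gets that data as its enclosing environment, so gear resolves. Here with() stands in for the j-expression and order() stands in for lag; both substitutions are my assumptions for the sake of a dependency-free sketch.

```r
# Quoted function definition: nothing is looked up yet.
lag3 <- quote(function(x) x[order(gear)])  # order() is a stand-in for lag()

# eval() inside with() builds the function *inside* mtcars, so its
# enclosing environment contains the columns, including gear.
res <- with(mtcars, eval(lag3)(cyl))

# The same definition evaluated at top level is enclosed by the global
# environment instead, and gear cannot be found from there.
f_top <- eval(lag3)
msg <- tryCatch(
  with(mtcars, f_top(cyl)),
  error = function(e) conditionMessage(e)
)
```

The only difference between the two calls is where the function object is created, which is the point of wrapping the definition in quote().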
This solution is coming close:
Consider a slightly easier toy example:
mtcars %>% mutate(carb2 = lag(carb, order_by = gear))
We still use lag and its order_by argument, but don't do any further computation with it. Instead of sticking to the NSE mutate, we switch to the SE mutate_ and make lag2 build a function call as a character vector.
lag2 <- function(x, n = 1, order_by = gear) {
  x <- deparse(substitute(x))
  order_by <- deparse(substitute(order_by))
  paste0('dplyr::lag(x = ', x, ', n = ', n, ', order_by = ', order_by, ')')
}

mtcars %>% mutate_(carb2 = lag2(carb))
This gives us an identical result to the above.
The original toy example can be achieved with:
mtcars %>% mutate_(cyl_change = paste('cyl !=', lag2(cyl)))
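It may help to look at what lag2 actually hands to mutate_. The block below repeats the definition and only builds the strings, so it runs in base R without dplyr; nothing here is evaluated against a data frame.

```r
# lag2 as defined above: it returns the call as a string instead of running it.
lag2 <- function(x, n = 1, order_by = gear) {
  x <- deparse(substitute(x))
  order_by <- deparse(substitute(order_by))
  paste0('dplyr::lag(x = ', x, ', n = ', n, ', order_by = ', order_by, ')')
}

# carb, cyl and gear are never evaluated; only their names are captured.
s1 <- lag2(carb)
s2 <- paste('cyl !=', lag2(cyl))

print(s1)
# [1] "dplyr::lag(x = carb, n = 1, order_by = gear)"
print(s2)
# [1] "cyl != dplyr::lag(x = cyl, n = 1, order_by = gear)"
```

Because the result is plain text, the lookup of gear is deferred until mutate_ parses and evaluates the string.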
Downsides:
- We have to use the SE mutate_.
- For extended usage as in the original example we also need to use paste.
- This is not particularly safe, i.e. it is not immediately clear where gear should come from. Assigning values to gear or carb in the global environment seems to be OK, but my guess is that unexpected bugs could occur in some cases. Using a formula instead of a character vector would be safer, but this requires the correct environment to be assigned for it to work, and that is still a big question mark for me.
This isn't elegant, as it requires an extra argument. But by passing the entire data frame we get nearly the required behavior:
lag2 <- function(x, df, n = 1L, order_by = df[['gear']], ...) {
  lag(x, n = n, order_by = order_by, ...)
}

hack <- mtcars %>% mutate(cyl_change = cyl != lag2(cyl, .))
ans <- mtcars %>% mutate(cyl_change = cyl != lag(cyl, order_by = gear))
all.equal(hack, ans)
# [1] TRUE
- One should be able to call lag2 without having to provide gear.
Yes, but you need to pass . as the df argument.
- One should be able to use lag2 on datasets that are not called mtcars (but do have gear as one of its variables).
This works.
- Preferably gear would be a default argument to the function, so it can still be changed if required, but this is not crucial.
This also works:
hack_nondefault <- mtcars %>% mutate(cyl_change = cyl != lag2(cyl, order_by = cyl))
ans_nondefault <- mtcars %>% mutate(cyl_change = cyl != lag(cyl, order_by = cyl))
all.equal(hack_nondefault, ans_nondefault)
# [1] TRUE
Note that if you manually give order_by, specifying df with the . is no longer necessary and usage becomes identical to the original lag (which is very nice).
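The reason df can be dropped when order_by is given is R's lazy evaluation of default arguments. A minimal base R sketch, with the hypothetical reorder2 mirroring the signature of lag2 and order() standing in for dplyr::lag:

```r
# Same signature idea as lag2; the default of order_by refers to df.
reorder2 <- function(x, df, order_by = df[["gear"]]) {
  x[order(order_by)]
}

# Default used: df must be supplied, because the default refers to it.
a <- reorder2(mtcars$cyl, mtcars)

# order_by supplied: the default is never evaluated, so df can be omitted.
b <- reorder2(mtcars$cyl, order_by = mtcars$carb)

# Neither supplied: the default is forced and df turns out to be missing.
msg <- tryCatch(
  reorder2(mtcars$cyl),
  error = function(e) conditionMessage(e)
)
```

So the extra df argument only costs something in the default case, which matches the behavior described above.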
Addendum
It seems hard to avoid one of the following: using the SE mutate_ as in the answer posted by the OP, doing some simple hackery as in my answer here, or doing something more advanced involving reverse-engineering lazyeval::lazy_dots.
Evidence:
1) dplyr::lag itself doesn't use any NSE wizardry
2) mutate simply calls mutate_(.data, .dots = lazyeval::lazy_dots(...))
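To make the second point more concrete: capturing the dots means keeping each expression unevaluated, together with the environment it came from, and evaluating it against the data later. The following is a toy base R version of that idea under my own assumptions; toy_mutate is a hypothetical name and this is not how dplyr or lazyeval are implemented.

```r
toy_mutate <- function(.data, name, expr) {
  captured <- substitute(expr)  # keep the expression unevaluated
  caller <- parent.frame()      # where non-column names are looked up
  # Columns are searched first, then the caller's environment.
  .data[[name]] <- eval(captured, .data, caller)
  .data
}

k <- 10
out <- toy_mutate(mtcars, "cyl_k", cyl * k)  # cyl from the data, k from the caller
```

Only names that appear in the captured expression are resolved this way, which is why a default argument buried inside a helper such as lag2 never gets to see the columns.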