summarise does not return warning from max when no non-NA values
Below is a partial diagnosis. It shows that dplyr is somehow interfering with the reference to the function name max(). Also, dplyr generally uses SE (Standard Evaluation) on its args: lazyeval::lazy_dots(..., .follow_symbols = FALSE), so maybe that affects the promise, although I can't see how:
A) group_by() is not the culprit. df2 %>% group_by(a) %>% summarise(length(na.omit(b))) does prove that group b is passing a vector with one NA element to max().
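The question's df2 is not reproduced in this answer. As a rough sketch, assuming it has two groups where group b holds a single NA (the values are inferred from the printed results in this answer, not copied from the original post), the check in A) can be repeated in base R without dplyr:

```r
# Hypothetical reconstruction of df2: group "a" holds 1, group "b" holds only NA.
df2 <- data.frame(a = c("a", "b"), b = c(1, NA_real_), stringsAsFactors = FALSE)

# Count the non-NA values per group, mirroring summarise(length(na.omit(b))).
non_na_counts <- vapply(
  split(df2$b, df2$a),
  function(v) length(na.omit(v)),
  integer(1)
)
print(non_na_counts)
```

Group b ends up with zero non-NA values, which is exactly the input for which max(..., na.rm = TRUE) is expected to warn.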
B) When we reference max by its qualified name base::max, we do see the warning:

    > df2 %>% group_by(a) %>% summarise(x = base::max(b, na.rm = TRUE))
      a    x
    1 a    1
    2 b -Inf
    Warning message:
    In base::max(NA_real_, na.rm = TRUE) :
      no non-missing arguments to max; returning -Inf
And I checked that there is no dplyr:::max(), so it's not namespace shadowing.
B2) Similarly, do.call(max, ...) gives the warning as expected.

    > df2 %>% group_by(a) %>% summarise(x = do.call(max, list(b, na.rm = TRUE)))
      a    x
    1 a    1
    2 b -Inf
    Warning message:
    In .Primitive("max")(NA_real_, na.rm = TRUE) :
      no non-missing arguments to max; returning -Inf
C) Also, note that dplyr generally uses SE (Standard Evaluation) on its args: lazyeval::lazy_dots(..., .follow_symbols = FALSE), but I can't see how that would cause this.
C2) I tried to recreate the internal result of the group_by with:
max(grouped_df(as.numeric(NA), list()), na.rm=T)
and to recreate the promise with something like:
p <- lazyeval::lazy_dots( max, list( grouped_df(as.numeric(NA), list()), na.rm=T ) , .follow_symbols=F)
I couldn't manage to formulate that with .follow_symbols=T.
I know almost nothing about Standard Evaluation, so anyone who wants to keep sleuthing can start at http://adv-r.had.co.nz/Expressions.html#metaprogramming
Versions used: dplyr 0.5.0 and lazyeval 0.1.10 (although lazyeval 0.2.0 is Hadley's latest).
For max(), a hybrid version is available that works much faster for a grouped data frame, because the entire evaluation can be carried out in C++ without an R callback for each group. In dplyr 0.5.0, the hybrid version is triggered when all of the following conditions are met:
- The first argument refers to a variable that exists in the data frame
- The second argument is a logical constant
See the hybrid vignette for more detail.
The hybrid version of max() differs in certain aspects from the R implementation:
- No warnings are raised for an empty vector; -Inf is returned silently
  - I think this was always the case; we might as well add a warning here, but I suspect that other users won't be happy about this
- An all-NA vector will return NA even with na.rm = TRUE
  - This is certainly a bug; I filed an issue
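For comparison, the base R behavior that the hybrid version departs from can be checked directly. The following is only a sketch: with_warnings is a hypothetical helper introduced here for illustration, not part of dplyr or base R, and the snippet exercises base max() alone rather than the hybrid evaluator.

```r
# Hypothetical helper: evaluate an expression and return its value
# together with the messages of any warnings it raised.
with_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = msgs)
}

# Base R: an empty vector gives -Inf and raises a warning.
empty <- with_warnings(max(numeric(0)))

# Base R: an all-NA vector with na.rm = TRUE also gives -Inf with a warning.
all_na <- with_warnings(max(NA_real_, na.rm = TRUE))

print(empty)
print(all_na)
```

In both cases base R returns -Inf and warns, which is the baseline against which the two differences listed above are stated.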
In your example, c(NA, NA) is a logical vector, so dplyr falls back to "regular" evaluation with one R callback for each group. If you need the original behavior, simply use a wrapper or an alias; the hybrid evaluator will then fall back to regular evaluation:
    max_ <- max
    data_frame(a = NA_real_) %>% summarise(a = max_(a, na.rm = TRUE))
    ## # A tibble: 1 × 1
    ##       a
    ##   <dbl>
    ## 1  -Inf
    ## Warning message:
    ## In max_(a, na.rm = TRUE) : no non-missing arguments to max; returning -Inf
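The wrapper variant mentioned above might look like the following sketch. The name max_wrapped is hypothetical and chosen only for illustration; the snippet checks the wrapper in base R, and the dplyr call is shown as a comment because it relies on the fallback behavior described in this answer for dplyr 0.5.0.

```r
# Hypothetical wrapper around base max(); forwards all extra arguments.
max_wrapped <- function(x, ...) max(x, ...)

# Intended use inside summarise(), assuming dplyr is attached:
#   data_frame(a = NA_real_) %>% summarise(a = max_wrapped(a, na.rm = TRUE))

# Outside dplyr, the wrapper keeps base R semantics: a warning is raised...
caught <- tryCatch(
  max_wrapped(NA_real_, na.rm = TRUE),
  warning = function(w) w
)

# ...and the value, with the warning suppressed, is -Inf.
result <- suppressWarnings(max_wrapped(NA_real_, na.rm = TRUE))
print(result)
```

One difference from the alias: the warning is reported against the call made inside the wrapper rather than against the call written in summarise().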