summarise does not return warning from max when no non-NA values


Below is a partial diagnosis. It shows that dplyr is somehow interfering with the reference to the function name max(). Also, dplyr generally uses SE (Standard Evaluation) on its args, via lazyeval::lazy_dots(..., .follow_symbols = FALSE), so maybe that affects the promise, although I can't see how:

A) group_by() is not the culprit. df2 %>% group_by(a) %>% summarise(length(na.omit(b))) shows that group b has no non-NA values, i.e. it passes a vector with one NA element to max().
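The same check can be sketched in base R, without dplyr. The definition of df2 is not part of this answer, so the data frame below is an assumption reconstructed from the printed outputs (group a holds 1, group b holds a single NA):

```r
# Assumed stand-in for df2; the real definition is not shown in this answer.
df2 <- data.frame(a = c("a", "b"), b = c(1, NA_real_), stringsAsFactors = FALSE)

# Split column b by group, mimicking what each group hands to max().
groups <- split(df2$b, df2$a)

# Count the non-missing values per group.
non_missing <- vapply(groups, function(v) length(na.omit(v)), integer(1))
print(non_missing)
```

Under that assumption, group b contributes zero non-missing values, which is exactly the case in which base max() with na.rm = TRUE would be expected to warn.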

B) When we reference max by its qualified name base::max, we do see the warning:

> df2 %>% group_by(a) %>% summarise(x = base::max(b, na.rm = TRUE))
       a     x
1      a     1
2      b  -Inf
Warning message:
In base::max(NA_real_, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf

And I checked that there is no dplyr:::max(), so it's not namespace shadowing.

B2) Similarly, do.call(max, ...) gives the warning as expected.

> df2 %>% group_by(a) %>% summarise(x = do.call(max, list(b, na.rm = TRUE)))
       a     x
1      a     1
2      b  -Inf
Warning message:
In .Primitive("max")(NA_real_, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf

C) Also, note that dplyr generally uses SE (Standard Evaluation) on its args, via lazyeval::lazy_dots(..., .follow_symbols = FALSE), but I can't see how that would cause this.

C2) I tried to recreate the internal result of the group_by with:

max(grouped_df(as.numeric(NA), list()), na.rm=T)

and to recreate the promise with something like:

p <- lazyeval::lazy_dots( max, list( grouped_df(as.numeric(NA), list()), na.rm=T )  , .follow_symbols=F)

I couldn't manage to formulate that with .follow_symbols=T

I know almost nothing about Standard Evaluation, so anyone who wants to keep sleuthing can start at http://adv-r.had.co.nz/Expressions.html#metaprogramming

Versions used: dplyr 0.5.0; lazyeval 0.1.10 (although lazyeval 0.2.0 is Hadley's latest)


For max(), a hybrid version is available that works much faster for a grouped data frame, because the entire evaluation can be carried out in C++ without an R callback for each group. In dplyr 0.5.0, the hybrid version is triggered when all of the following conditions are met:

  • The first argument refers to a variable that exists in the data frame
  • The second argument is a logical constant

See the hybrid vignette for more detail.

The hybrid version of max() differs in certain aspects from the R implementation:

  • No warnings are raised for an empty vector, silently returning -Inf
  • An all-NA vector will return NA even with na.rm = TRUE
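For comparison, the behaviour of plain R max() on the same kinds of input can be checked with a small base R sketch. It does not exercise dplyr's hybrid evaluator at all; capture_warnings() is a hypothetical helper written only for this illustration:

```r
# Hypothetical helper: evaluate an expression and collect any warnings
# instead of printing them.
capture_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = msgs)
}

# All-NA input with na.rm = TRUE: nothing is left after dropping NAs.
all_na <- capture_warnings(max(NA_real_, na.rm = TRUE))

# Empty input.
empty <- capture_warnings(max(numeric(0)))

# All-NA input with the default na.rm = FALSE.
kept_na <- capture_warnings(max(NA_real_))

str(all_na)
str(empty)
str(kept_na)
```

In base R the first two calls return -Inf and each raises a warning, while the third returns NA without one. That is the baseline the two hybrid differences listed above depart from.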

In your example, c(NA, NA) is a logical vector, so dplyr falls back to "regular" evaluation with one R callback for each group. If you need the original behavior, simply use a wrapper or an alias; the hybrid evaluator will then fall back to regular evaluation:

max_ <- max
data_frame(a = NA_real_) %>% summarise(a = max_(a, na.rm = TRUE))
## # A tibble: 1 × 1
##       a
##   <dbl>
## 1  -Inf
## Warning message:
## In max_(a, na.rm = TRUE) : no non-missing arguments to max; returning -Inf