Aggregate / summarize multiple variables per group (e.g. sum, mean) Aggregate / summarize multiple variables per group (e.g. sum, mean) r r

Aggregate / summarize multiple variables per group (e.g. sum, mean)


Yes, in your formula, you can cbind the numeric variables to be aggregated:

aggregate(cbind(x1, x2) ~ year + month, data = df1, sum, na.rm = TRUE)   year month         x1          x21  2000     1   7.862002   -7.4692982  2001     1 276.758209  474.3842523  2000     2  13.122369 -128.122613...23 2000    12  63.436507  449.79445424 2001    12 999.472226  922.726589

See ?aggregate, the formula argument and the examples.


Using the data.table package, which is fast (useful for larger datasets)

https://github.com/Rdatatable/data.table/wiki

library(data.table)df2 <- setDT(df1)[, lapply(.SD, sum), by=.(year, month), .SDcols=c("x1","x2")]setDF(df2) # convert back to dataframe

Using the plyr package

require(plyr)df2 <- ddply(df1, c("year", "month"), function(x) colSums(x[c("x1", "x2")]))

Using summarize() from the Hmisc package (column headings are messy in my example though)

# need to detach plyr because plyr and Hmisc both have a summarize()detach(package:plyr)require(Hmisc)df2 <- with(df1, summarize( cbind(x1, x2), by=llist(year, month), FUN=colSums))


With the dplyr package, you can use summarise_all, summarise_at or summarise_if functions to aggregate multiple variables simultaneously. For the example dataset you can do this as follows:

library(dplyr)# summarising all non-grouping variablesdf2 <- df1 %>% group_by(year, month) %>% summarise_all(sum)# summarising a specific set of non-grouping variablesdf2 <- df1 %>% group_by(year, month) %>% summarise_at(vars(x1, x2), sum)df2 <- df1 %>% group_by(year, month) %>% summarise_at(vars(-date), sum)# summarising a specific set of non-grouping variables using select_helpers# see ?select_helpers for more optionsdf2 <- df1 %>% group_by(year, month) %>% summarise_at(vars(starts_with('x')), sum)df2 <- df1 %>% group_by(year, month) %>% summarise_at(vars(matches('.*[0-9]')), sum)# summarising a specific set of non-grouping variables based on condition (class)df2 <- df1 %>% group_by(year, month) %>% summarise_if(is.numeric, sum)

The result of the latter two options:

    year month        x1         x2   <dbl> <dbl>     <dbl>      <dbl>1   2000     1 -73.58134  -92.785952   2000     2 -57.81334 -152.369833   2000     3 122.68758  153.552434   2000     4 450.24980  285.563745   2000     5 678.37867  384.428886   2000     6 792.68696  530.286947   2000     7 908.58795  452.312228   2000     8 710.69928  719.352259   2000     9 725.06079  914.9368710  2000    10 770.60304  863.39337# ... with 14 more rows

Note: summarise_each is deprecated in favor of summarise_all, summarise_at and summarise_if.


As mentioned in my comment above, you can also use the recast function from the reshape2-package:

library(reshape2)recast(df1, year + month ~ variable, sum, id.var = c("date", "year", "month"))

which will give you the same result.