Compute rolling sum by id variables, with missing timepoints

Here are a few solutions:

1) zoo Using ave, for each group create a monthly series, m, by merging the original series, z, with a grid, g. Then calculate the rolling sum and retain only the original time points:

library(zoo)f <- function(i) {     z <- with(df[i, ], zoo(count, t))    g <- zoo(, seq(start(z), end(z), by = "month"))    m <- merge(z, g)    window(rollapplyr(m, 4, sum, na.rm = TRUE, partial = TRUE), time(z))}df$desired <- ave(1:nrow(df), df$id, df$class, FUN = f)

which gives:

> df  id class          t count desired1  1     A 2010-01-15     1       12  1     A 2010-02-15     2       33  1     B 2010-04-15     3       34  1     B 2010-09-15     4       45  2     A 2010-01-15     5       56  2     B 2010-06-15     6       67  2     B 2010-08-15     7      138  2     B 2010-09-15     8      21

Note We have assumed the times are ordered within each group (as in the question). If that is not so then sort df first.

2) sqldf

library(sqldf)sqldf("select id, class, a.t, a.'count', sum(b.'count') desired    from df a join df b    using(id, class)    where a.t - b.t between 0 and 100   group by id, class, a.t")

which gives:

  id class          t count desired1  1     A 2010-01-15     1       12  1     A 2010-02-15     2       33  1     B 2010-04-15     3       34  1     B 2010-09-15     4       45  2     A 2010-01-15     5       56  2     B 2010-06-15     6       67  2     B 2010-08-15     7      138  2     B 2010-09-15     8      21

Note: If the merge should be too large to fit into memory then use sqldf("...", dbname = tempfile()) to cause the intermediate results to be stored in a database which it creates on the fly and automatically destroys afterwards.

3) Base R The sqldf solution motivates this base R solution which just translates the SQL into R:

m <- merge(df, df, by = 1:2)s <- subset(m, t.x - t.y >= 0 & t.x - t.y <= 100)ag <- aggregate(count.y ~ t.x + class + id, s, sum)names(ag) <- c("t", "class", "id", "count", "desired")

The result is:

> ag           t class id count desired1 2010-01-15     A  1     1       12 2010-02-15     A  1     2       33 2010-04-15     B  1     3       34 2010-09-15     B  1     4       45 2010-01-15     A  2     5       56 2010-06-15     B  2     6       67 2010-08-15     B  2     7      138 2010-09-15     B  2     8      21

Note: This does do a merge in memory which might be a problem if the data set is very large.

UPDATE: Minor simplifications of first solution and also added second solution.

UPDATE 2: Added third solution.

r sas plyr zoo

I'm almost embarrassed to post this. I'm usually pretty good as these, but there's got to be a better way.

This first uses zoo's as.yearmon to get the dates in terms of just month and year, then reshapes it to get one column for each id/class combination, then fills in with zeros before, after, and for missing months, then uses zoo to get the rolling sum, then pulls out just the desired months and merges back with the original data frame.

library(reshape2)library(zoo)df$yearmon <- as.yearmon(df$t)dfa <- dcast(id + class ~ yearmon, data=df, value.var="count")ida <- dfa[,1:2]dfa <- t(as.matrix(dfa[,-c(1:2)]))months <- with(df, seq(min(yearmon)-3/12, max(yearmon)+3/12, by=1/12))dfb <- array(dim=c(length(months), ncol(dfa)),              dimnames=list(paste(months), colnames(dfa)))dfb[rownames(dfa),] <- dfadfb[is.na(dfb)] <- 0dfb <- rollsumr(dfb,4, fill=0)rownames(dfb) <- paste(months)dfb <- dfb[rownames(dfa),]dfc <- cbind(ida, t(dfb))dfc <- melt(dfc, id.vars=c("class", "id"))names(dfc)[3:4] <- c("yearmon", "desired2")dfc$yearmon <- as.yearmon(dfc$yearmon)out <- merge(df,dfc)> out  id class  yearmon          t count desired desired21  1     A Feb 2010 2010-02-15     2       3        32  1     A Jan 2010 2010-01-15     1       1        13  1     B Apr 2010 2010-04-15     3       3        34  1     B Sep 2010 2010-09-15     4       4        45  2     A Jan 2010 2010-01-15     5       5        56  2     B Aug 2010 2010-08-15     7      13       137  2     B Jun 2010 2010-06-15     6       6        68  2     B Sep 2010 2010-09-15     8      21       21

r sas plyr zoo

A farily efficient answer to this problem could be found using the data.table library.

##Utilize the data.table packagelibrary("data.table")data <- data.table(t,class,id,count,desired)[order(id,class)]##Assign each customer an IDdata[,Cust_No:=.GRP,by=c("id","class")]##Create "list" of comparison dates and valuesRef <- data[,list(Compare_Value=list(I(count)),Compare_Date=list(I(t))), by=c("id","class")]##Compare two lists and see of the compare date is within N daysdata$Roll.Val <- mapply(FUN = function(RD, NUM) {  d <- as.numeric(Ref$Compare_Date[[NUM]] - RD)  sum((d <= 0 & d >= -124)*Ref$Compare_Value[[NUM]])}, RD = data$t,NUM=data$Cust_No)##Print out datadata <- data[,list(id,class,t,count,desired,Roll.Val)][order(id,class)]dataid class          t count desired Roll.Val1:  1     A 2010-01-15     1       1        12:  1     A 2010-02-15     2       3        33:  1     B 2010-04-15     3       3        34:  1     B 2010-09-15     4       4        45:  2     A 2010-01-15     5       5        56:  2     B 2010-06-15     6       6        67:  2     B 2010-08-15     7      13       138:  2     B 2010-09-15     8      21       21

CodeHunter

Compute rolling sum by id variables, with missing timepoints

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last