Using dplyr for frequency counts of interactions, must include zero counts


Here's a simple option, using data.table instead:

    library(data.table)

    dt = as.data.table(your_df)
    setkey(dt, id, date)

    # in versions 1.9.3+
    dt[CJ(unique(id), unique(date)), .N, by = .EACHI]
    #          id       date N
    # 1: Andrew13 2006-08-03 0
    # 2: Andrew13 2007-09-11 1
    # 3: Andrew13 2008-06-12 0
    # 4: Andrew13 2008-10-11 0
    # 5: Andrew13 2009-07-03 0
    # 6:   John12 2006-08-03 1
    # 7:   John12 2007-09-11 0
    # 8:   John12 2008-06-12 0
    # 9:   John12 2008-10-11 0
    #10:   John12 2009-07-03 0
    #11:  Lisa825 2006-08-03 0
    #12:  Lisa825 2007-09-11 0
    #13:  Lisa825 2008-06-12 0
    #14:  Lisa825 2008-10-11 0
    #15:  Lisa825 2009-07-03 1
    #16:  Tom2993 2006-08-03 0
    #17:  Tom2993 2007-09-11 0
    #18:  Tom2993 2008-06-12 1
    #19:  Tom2993 2008-10-11 1
    #20:  Tom2993 2009-07-03 0

In versions 1.9.2 and earlier, the equivalent expression omits the explicit by:

    dt[CJ(unique(id), unique(date)), .N]

The idea is to create all possible pairs of id and date (which is what the CJ part does), and then join those pairs back to the original table, counting the matching rows for each pair. Pairs with no match get a count of 0.
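To make the two steps visible, here is a minimal sketch that runs them separately on the five rows from the question. The names `pairs` and `counts` are only illustrative, and the `on =` argument assumes a reasonably recent data.table (it replaces the `setkey` call used above).

```r
library(data.table)

# Sample data from the question: five observed id/date pairs
dt <- data.table(
  id   = c("John12", "Tom2993", "Lisa825", "Tom2993", "Andrew13"),
  date = as.Date(c("2006-08-03", "2008-10-11", "2009-07-03",
                   "2008-06-12", "2007-09-11"))
)

# Step 1: CJ() builds the cross join, i.e. every id paired with every date
pairs <- CJ(id = unique(dt$id), date = unique(dt$date))
nrow(pairs)  # 4 ids x 5 dates = 20 rows

# Step 2: join dt to the pairs and count the matches for each row of pairs;
# by = .EACHI evaluates .N once per pair, so unmatched pairs get N = 0
counts <- dt[pairs, .N, by = .EACHI, on = .(id, date)]
```

Only five of the twenty rows in `counts` should have `N` equal to 1; the remaining fifteen are the zero counts the question asks for.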


This is how you could do it, although I use dplyr only in part: to calculate the frequencies in your original df and for the left_join. As you already suggested in your question, I created a new data.frame and merged it with the existing one. I guess doing it exclusively in dplyr would require you to somehow rbind many rows in the process, and I assume this way is faster.

    library(dplyr)

    original <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "    id         date
    John12     2006-08-03
    Tom2993    2008-10-11
    Lisa825    2009-07-03
    Tom2993    2008-06-12
    Andrew13   2007-09-11")
    original$date <- as.Date(original$date)  # convert to Date

    # get the frequency in the original data in a new column,
    # summarized in a single row per id/date group
    original <- original %>%
      group_by(id, date) %>%
      summarize(count = n())

    # create a sequence of dates as you need it
    dates <- seq(as.Date("2006-01-01"), as.Date("2009-12-31"), by = 1)

    # create a new df with expand.grid to get all combinations of id/date;
    # unique() avoids duplicated rows for ids that occur more than once, and
    # stringsAsFactors = FALSE keeps id as character so the join types match
    newdf <- expand.grid(id = unique(original$id), date = dates,
                         stringsAsFactors = FALSE)

    # remove dates
    rm(dates)

    # join original and newdf to bring in the frequency counts from original
    newdf <- left_join(newdf, original, by = c("id", "date"))

    # replace all NA with 0 for rows which were not in the original df
    newdf$count[is.na(newdf$count)] <- 0
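
For comparison, a shorter variant that stays within the dplyr pipeline is sketched below. It is not part of the answer above: it swaps the expand.grid and left_join steps for `complete()` from the tidyr package, which is an assumption about what is installed. The names `obs` and `filled` are illustrative only.

```r
library(dplyr)
library(tidyr)

# Sample data from the question: five observed id/date pairs
obs <- data.frame(
  id   = c("John12", "Tom2993", "Lisa825", "Tom2993", "Andrew13"),
  date = as.Date(c("2006-08-03", "2008-10-11", "2009-07-03",
                   "2008-06-12", "2007-09-11")),
  stringsAsFactors = FALSE
)

filled <- obs %>%
  count(id, date) %>%                      # one row per observed pair, count in column n
  complete(id, date, fill = list(n = 0L))  # add the missing pairs with n = 0
```

As written, this only fills in combinations of ids and dates that occur somewhere in the data, like the data.table answer. To cover every day in a range as the code above does, `complete()` should also accept an explicit vector, for example `date = seq(as.Date("2006-01-01"), as.Date("2009-12-31"), by = "day")`.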