Using dplyr for frequency counts of interactions, must include zero counts
Here's a simple option, using data.table instead:
    library(data.table)

    dt = as.data.table(your_df)
    setkey(dt, id, date)

    # in versions 1.9.3+
    dt[CJ(unique(id), unique(date)), .N, by = .EACHI]
    #          id       date N
    #  1: Andrew13 2006-08-03 0
    #  2: Andrew13 2007-09-11 1
    #  3: Andrew13 2008-06-12 0
    #  4: Andrew13 2008-10-11 0
    #  5: Andrew13 2009-07-03 0
    #  6:   John12 2006-08-03 1
    #  7:   John12 2007-09-11 0
    #  8:   John12 2008-06-12 0
    #  9:   John12 2008-10-11 0
    # 10:   John12 2009-07-03 0
    # 11:  Lisa825 2006-08-03 0
    # 12:  Lisa825 2007-09-11 0
    # 13:  Lisa825 2008-06-12 0
    # 14:  Lisa825 2008-10-11 0
    # 15:  Lisa825 2009-07-03 1
    # 16:  Tom2993 2006-08-03 0
    # 17:  Tom2993 2007-09-11 0
    # 18:  Tom2993 2008-06-12 1
    # 19:  Tom2993 2008-10-11 1
    # 20:  Tom2993 2009-07-03 0
In versions 1.9.2 or earlier, the equivalent expression omits the explicit by:
dt[CJ(unique(id), unique(date)), .N]
The idea is to create all possible pairs of id and date (which is what the CJ part does), and then merge them back into the data, counting the occurrences of each pair.
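To make the two steps visible separately, here is a minimal sketch on a hypothetical three-row table (the toy data and the names pairs and counts are made up for illustration). It passes the join columns with on = instead of setting a key, which assumes a data.table version that supports on=.

```r
library(data.table)

# Hypothetical toy data with the same layout as in the question
dt <- data.table(
  id   = c("John12", "Tom2993", "Tom2993"),
  date = as.Date(c("2006-08-03", "2008-10-11", "2008-06-12"))
)

# Step 1: cross join of the unique values, one row per id/date pair
# (2 ids x 3 dates = 6 rows)
pairs <- CJ(id = unique(dt$id), date = unique(dt$date))

# Step 2: join the pairs back to the data and count the matching rows
# for each pair; pairs with no match get N = 0
counts <- dt[pairs, .N, by = .EACHI, on = c("id", "date")]
print(counts)
```

Only three of the six pairs occur in the toy data, so the N column should sum to 3 and the remaining pairs should show 0.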
This is how you could do it, although I use dplyr only for part of it: to calculate the frequencies in your original df, and for the left_join. As you already suggested in your question, I created a new data.frame and merged it with the existing one. I guess doing it exclusively in dplyr would require you to somehow rbind many rows in the process, and I assume this way might be faster.
    library(dplyr)

    original <- read.table(header = TRUE, text = "
    id        date
    John12    2006-08-03
    Tom2993   2008-10-11
    Lisa825   2009-07-03
    Tom2993   2008-06-12
    Andrew13  2007-09-11
    ", stringsAsFactors = FALSE)

    original$date <- as.Date(original$date)  # convert to Date

    # get the frequency in the original data in a new column,
    # summarized in a single row per id/date group
    original <- original %>%
      group_by(id, date) %>%
      summarize(count = n())

    # create the sequence of dates you need (daily here)
    dates <- seq(as.Date("2006-01-01"), as.Date("2009-12-31"), by = 1)

    # create a new df with expand.grid to get all combinations of id and date;
    # unique() avoids duplicated rows for ids that occur more than once, and
    # stringsAsFactors = FALSE keeps id as character, matching `original`
    newdf <- expand.grid(id = unique(original$id), date = dates,
                         stringsAsFactors = FALSE)

    # remove dates
    rm(dates)

    # join original and newdf to bring in the frequency counts from original
    newdf <- left_join(newdf, original, by = c("id", "date"))

    # replace NA with 0 for rows that were not in the original df
    newdf$count[is.na(newdf$count)] <- 0
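For comparison, here is a sketch of a shorter route that stays within the dplyr family, assuming the tidyr package is available (it is not used in the answer above). It relies on tidyr's complete() to add the missing id/date combinations. Note that, like the data.table version, it only covers dates that actually occur in the data, not a full daily sequence.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
original <- data.frame(
  id   = c("John12", "Tom2993", "Lisa825", "Tom2993", "Andrew13"),
  date = as.Date(c("2006-08-03", "2008-10-11", "2009-07-03",
                   "2008-06-12", "2007-09-11")),
  stringsAsFactors = FALSE
)

counts <- original %>%
  count(id, date) %>%                       # observed frequencies in column n
  complete(id, date, fill = list(n = 0L))   # add unobserved pairs with n = 0

print(counts)
```

With 4 ids and 5 observed dates this should give 20 rows, 5 of them with n = 1.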