Efficiently counting non-NA elements in data.table

Yes, option 3 seems to be the best one. I've added another approach, which is only valid if you are willing to change the key of your data.table from id to var, but option 3 is still the fastest on your data.

library(microbenchmark)
library(data.table)

dt <- data.table(id  = (1:100)[sample(10, size = 1e6, replace = T)],
                 var = c(1, 0, NA)[sample(3, size = 1e6, replace = T)],
                 key = c("var"))
dt1 <- copy(dt)
dt2 <- copy(dt)
dt3 <- copy(dt)
dt4 <- copy(dt)

microbenchmark(times = 10L,
               dt1[!is.na(var), .N, by = id][, max(N, na.rm = T), by = id],
               dt2[, length(var[!is.na(var)]), by = id],
               dt3[, sum(!is.na(var)), by = id],
               dt4[.(c(1, 0)), .N, id, nomatch = 0L])
# Unit: milliseconds
#                                                         expr      min       lq      mean    median        uq       max neval
#  dt1[!is.na(var), .N, by = id][, max(N, na.rm = T), by = id] 95.14981 95.79291 105.18515 100.16742 112.02088 131.87403    10
#                     dt2[, length(var[!is.na(var)]), by = id] 83.17203 85.91365  88.54663  86.93693  89.56223 100.57788    10
#                             dt3[, sum(!is.na(var)), by = id] 45.99405 47.81774  50.65637  49.60966  51.77160  61.92701    10
#                        dt4[.(c(1, 0)), .N, id, nomatch = 0L] 78.50544 80.95087  89.09415  89.47084  96.22914 100.55434    10
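To make the mechanics concrete, here is a minimal sketch on a tiny toy table (not the benchmark data) of the two key ideas: `sum()` over a logical vector counts the TRUE values, and the keyed-join variant selects only the non-NA key values 1 and 0 before counting:

```r
library(data.table)

# Toy data keyed on var, mirroring the setup above
dt <- data.table(id  = c(1, 1, 1, 2, 2, 2),
                 var = c(1, 0, NA, 1, NA, NA),
                 key = "var")

# Option 3: !is.na(var) is a logical vector, and sum() coerces
# TRUE to 1L, so this counts the non-NA values per group
opt3 <- dt[, .(N = sum(!is.na(var))), by = id]

# Keyed-join variant: join on the non-NA key values 1 and 0,
# then count matching rows per id; nomatch = 0L drops key
# values with no matching rows instead of returning NA rows
opt4 <- dt[.(c(1, 0)), .(N = .N), by = id, nomatch = 0L]
```

Both return the same per-id counts; the join variant works only because var is the key and takes exactly the values 1, 0, and NA, so joining on c(1, 0) is equivalent to filtering out the NAs.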