
Fast vectorized merge of list of data.frames by row


Try this:

bind.ith.rows <- function(i) do.call(rbind, lapply(sample.list, "[", i, TRUE))
nr <- nrow(sample.list[[1]])
lapply(1:nr, bind.ith.rows)
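To see the shape of the result, here is a toy two-element list (toy.list and its contents are made up purely for illustration): element i of the output stacks row i of every data.frame in the list.

# toy data, purely for illustration
toy.list <- list(data.frame(x = 1:2, y = c("a", "b")),
                 data.frame(x = 3:4, y = c("c", "d")))
bind.ith.rows <- function(i) do.call(rbind, lapply(toy.list, "[", i, TRUE))
lapply(1:nrow(toy.list[[1]]), bind.ith.rows)
# [[1]] stacks row 1 of both data.frames, [[2]] stacks row 2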


Here are a couple of solutions that will make this quicker using data.table.

EDIT - with a larger dataset, showing the data.table awesomeness even more.

# here are some sample data
sample.list <- replicate(10000, data.frame(x = sample(1:100, 10),
    y = sample(1:100, 10), capt = sample(0:1, 10, replace = TRUE)),
    simplify = FALSE)

Gabor's fast solution:

# Solution Gabor
bind.ith.rows <- function(i) do.call(rbind, lapply(sample.list, "[", i, TRUE))
nr <- nrow(sample.list[[1]])
system.time(rowbound <- lapply(1:nr, bind.ith.rows))
##    user  system elapsed 
##   25.87    0.01   25.92 

The data.table function rbindlist will make this even quicker, even when working with data.frames.

library(data.table)
fastbind.ith.rows <- function(i) rbindlist(lapply(sample.list, "[", i, TRUE))
system.time(fastbound <- lapply(1:nr, fastbind.ith.rows))
##    user  system elapsed 
##   13.89    0.00   13.89 
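Note that rbindlist returns data.tables rather than data.frames. If you need plain data.frames downstream, a quick conversion looks like this (fastbound_df is just a name I made up; this assumes fastbound from the timing above is still in the workspace):

# convert each piece back to a plain data.frame
# (data.table::setDF() would do the same by reference, without copying)
fastbound_df <- lapply(fastbound, as.data.frame)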

A data.table solution

Here is a solution that uses data.tables - it is the split approach on steroids.

# data.table solution
system.time({
    # change each element of sample.list to a data.table (and data.frame);
    # this is done instantaneously by reference
    invisible(lapply(sample.list, setattr, name = "class",
                     value = c("data.table", "data.frame")))
    # combine into one big data set
    bigdata <- rbindlist(sample.list)
    # add a row index column by reference (the nr values recycle down the
    # stacked rows, labelling row 1..nr within each original data.frame)
    index <- as.character(seq_len(nr))
    bigdata[, rowid := index]
    # set the key for binary searches
    setkey(bigdata, rowid)
    # split on the row index
    dt_list <- lapply(index, function(i, x) x[J(i)], x = bigdata)
    # if you want to drop the rowid column
    invisible(lapply(dt_list, function(x) set(x, j = "rowid", value = NULL)))
    # if you really don't want them to be data.tables, run this line
    # invisible(lapply(dt_list, setattr, name = "class", value = "data.frame"))
})
##    user  system elapsed 
##    0.08    0.00    0.08 
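As a rough sanity check (assuming rowbound from Gabor's timing above is still in the workspace, and ignoring row names and other attributes), the pieces should match Gabor's result:

length(dt_list)          # nr pieces, one per row index
nrow(dt_list[[1]])       # one row contributed by each element of sample.list
all.equal(as.data.frame(dt_list[[1]]), rowbound[[1]], check.attributes = FALSE)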

How awesome is data.table!

A caveat when using rbindlist

rbindlist is fast because it does not perform the checking that do.call(rbind, ...) does. For example, it assumes that any factor columns have the same levels as in the first element of the list.
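A small illustration of the kind of check being skipped (df1 and df2 are made-up data, and the exact rbindlist behaviour depends on your data.table version, so inspect the levels rather than take my word for it):

library(data.table)
df1 <- data.frame(g = factor(c("a", "b")))
df2 <- data.frame(g = factor(c("c", "d")))
# do.call(rbind, ...) reconciles the factor levels of the combined column
levels(do.call(rbind, list(df1, df2))$g)
# compare with what rbindlist gives; older versions assumed the levels of
# the first list element, so check before relying on the result
levels(rbindlist(list(df1, df2))$g)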


Here's my attempt with plyr, but I like G. Grothendieck's approach:

library(plyr)
alply(do.call("cbind", sample.list), 1, .fun = matrix,
      ncol = ncol(sample.list[[1]]), byrow = TRUE,
      dimnames = list(1:length(sample.list),
                      names(sample.list[[1]])))