Combine a list of data frames into one data frame by row


Use bind_rows() from the dplyr package:

bind_rows(list_of_dataframes, .id = "column_label")
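As a minimal sketch of what the .id argument does (the two data frames and the list names a and b below are made up for illustration): when the list is named, bind_rows() stores each element's name in the new column, so every row records which data frame it came from.

```r
library(dplyr)

# Hypothetical input: a named list of two small data frames.
list_of_dataframes <- list(
  a = data.frame(x = 1:2, y = c("p", "q"), stringsAsFactors = FALSE),
  b = data.frame(x = 3:4, y = c("r", "s"), stringsAsFactors = FALSE)
)

# .id adds a column (called "column_label" here) filled with the list names.
combined <- bind_rows(list_of_dataframes, .id = "column_label")
print(combined)
```

If the list has no names, the label column should hold each element's position in the list instead.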


One other option is to use a plyr function:

df <- ldply(listOfDataFrames, data.frame)

This is a little slower than the original do.call("rbind", ...) approach:

> system.time({ df <- do.call("rbind", listOfDataFrames) })
   user  system elapsed
   0.25    0.00    0.25
> system.time({ df2 <- ldply(listOfDataFrames, data.frame) })
   user  system elapsed
   0.30    0.00    0.29
> identical(df, df2)
[1] TRUE

My guess is that using do.call("rbind", ...) is going to be the fastest approach that you will find unless you can (a) use matrices instead of data frames and (b) preallocate the final matrix and assign into it rather than growing it.
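The preallocation idea in (a) and (b) could look roughly like the sketch below. It assumes every data frame has the same all-numeric columns, which is what makes a matrix usable at all; the sample input and the names result and offset are hypothetical.

```r
# Hypothetical input: all-numeric data frames with identical columns.
listOfDataFrames <- list(
  data.frame(a = 1:2, b = c(0.5, 1.5)),
  data.frame(a = 3:5, b = c(2.5, 3.5, 4.5))
)

# Count the rows up front so the result can be allocated once.
n_rows <- vapply(listOfDataFrames, nrow, integer(1))
result <- matrix(
  NA_real_,
  nrow = sum(n_rows),
  ncol = ncol(listOfDataFrames[[1]]),
  dimnames = list(NULL, names(listOfDataFrames[[1]]))
)

# Fill each block of rows in place instead of growing the matrix.
offset <- 0L
for (d in listOfDataFrames) {
  rows <- offset + seq_len(nrow(d))
  result[rows, ] <- as.matrix(d)
  offset <- offset + nrow(d)
}
```

Whether this actually beats do.call("rbind", ...) depends on the data, so it is worth timing on your own input.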

Edit 1:

Based on Hadley's comment, here's the latest version of rbind.fill from CRAN:

> system.time({ df3 <- rbind.fill(listOfDataFrames) })
   user  system elapsed
   0.24    0.00    0.23
> identical(df, df3)
[1] TRUE

This is easier than rbind, and marginally faster (these timings hold up over multiple runs). And as far as I understand it, the version of plyr on GitHub is even faster than this.
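One way rbind.fill can be more convenient, shown here as a small sketch with made-up data frames df_a and df_b: it accepts data frames whose columns differ and fills the missing entries with NA, where plain rbind stops with an error.

```r
library(plyr)

# Hypothetical data frames whose columns only partly overlap.
df_a <- data.frame(id = 1:2, x = c(10, 20))
df_b <- data.frame(id = 3:4, y = c("u", "v"), stringsAsFactors = FALSE)

# rbind(df_a, df_b) fails because the column names differ;
# rbind.fill() keeps every column and fills the gaps with NA.
filled <- rbind.fill(list(df_a, df_b))
print(filled)
```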


For completeness, I thought the answers to this question needed an update. "My guess is that using do.call("rbind", ...) is going to be the fastest approach that you will find..." That was probably true in May 2010 and for some time after, but in about Sep 2012 a new function, rbindlist, was introduced in version 1.8.2 of the data.table package, with the remark that "This does the same as do.call("rbind",l), but much faster". How much faster?

library(rbenchmark)
benchmark(
  do.call = do.call("rbind", listOfDataFrames),
  plyr_rbind.fill = plyr::rbind.fill(listOfDataFrames),
  plyr_ldply = plyr::ldply(listOfDataFrames, data.frame),
  data.table_rbindlist = as.data.frame(data.table::rbindlist(listOfDataFrames)),
  replications = 100, order = "relative",
  columns = c('test', 'replications', 'elapsed', 'relative')
)

                  test replications elapsed relative
4 data.table_rbindlist          100    0.11    1.000
1              do.call          100    9.39   85.364
2      plyr_rbind.fill          100   12.08  109.818
3           plyr_ldply          100   15.14  137.636
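The listOfDataFrames used in the benchmark is not shown here, so to try rbindlist yourself you need some input. The sketch below builds a hypothetical stand-in (100 data frames of 50 rows each, sizes chosen arbitrarily) and combines it both ways; timings on this made-up data will not match the table above.

```r
library(data.table)

# Hypothetical test input: 100 small data frames with identical columns.
set.seed(1)
listOfDataFrames <- lapply(1:100, function(i) {
  data.frame(
    a = sample(letters, 50, replace = TRUE),
    b = rnorm(50),
    stringsAsFactors = FALSE
  )
})

# rbindlist() returns a data.table; convert back if a plain data frame is needed.
df4 <- as.data.frame(rbindlist(listOfDataFrames))

# The base R result, for comparison of the combined contents.
df <- do.call("rbind", listOfDataFrames)
print(dim(df4))
print(all.equal(df$b, df4$b))
```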