Simultaneously merge multiple data.frames in a list

r list merge dataframe r-faq

Another question asked specifically how to perform multiple left joins using dplyr in R . The question was marked as a duplicate of this one so I answer here, using the 3 sample data frames below:

x <- data.frame(i = c("a","b","c"), j = 1:3, stringsAsFactors=FALSE)y <- data.frame(i = c("b","c","d"), k = 4:6, stringsAsFactors=FALSE)z <- data.frame(i = c("c","d","a"), l = 7:9, stringsAsFactors=FALSE)

Update June 2018: I divided the answer in three sections representing three different ways to perform the merge. You probably want to use the purrr way if you are already using the tidyverse packages. For comparison purposes below, you'll find a base R version using the same sample dataset.

1) Join them with reduce from the purrr package:

The purrr package provides a reduce function which has a concise syntax:

library(tidyverse)list(x, y, z) %>% reduce(left_join, by = "i")#  A tibble: 3 x 4#  i       j     k     l#  <chr> <int> <int> <int># 1 a      1    NA     9# 2 b      2     4    NA# 3 c      3     5     7

You can also perform other joins, such as a full_join or inner_join:

list(x, y, z) %>% reduce(full_join, by = "i")# A tibble: 4 x 4# i       j     k     l# <chr> <int> <int> <int># 1 a     1     NA     9# 2 b     2     4      NA# 3 c     3     5      7# 4 d     NA    6      8list(x, y, z) %>% reduce(inner_join, by = "i")# A tibble: 1 x 4# i       j     k     l# <chr> <int> <int> <int># 1 c     3     5     7

2) dplyr::left_join() with base R Reduce():

list(x,y,z) %>%    Reduce(function(dtf1,dtf2) left_join(dtf1,dtf2,by="i"), .)#   i j  k  l# 1 a 1 NA  9# 2 b 2  4 NA# 3 c 3  5  7

3) Base R merge() with base R Reduce():

And for comparison purposes, here is a base R version of the left join based on Charles's answer.

 Reduce(function(dtf1, dtf2) merge(dtf1, dtf2, by = "i", all.x = TRUE),        list(x,y,z))#   i j  k  l# 1 a 1 NA  9# 2 b 2  4 NA# 3 c 3  5  7

r list merge dataframe r-faq

Reduce makes this fairly easy:

merged.data.frame = Reduce(function(...) merge(..., all=T), list.of.data.frames)

Here's a fully example using some mock data:

set.seed(1)list.of.data.frames = list(data.frame(x=1:10, a=1:10), data.frame(x=5:14, b=11:20), data.frame(x=sample(20, 10), y=runif(10)))merged.data.frame = Reduce(function(...) merge(..., all=T), list.of.data.frames)tail(merged.data.frame)#    x  a  b         y#12 12 NA 18        NA#13 13 NA 19        NA#14 14 NA 20 0.4976992#15 15 NA NA 0.7176185#16 16 NA NA 0.3841037#17 19 NA NA 0.3800352

And here's an example using these data to replicate my.list:

merged.data.frame = Reduce(function(...) merge(..., by=match.by, all=T), my.list)merged.data.frame[, 1:12]#  matchname party st district chamber senate1993 name.x v2.x v3.x v4.x senate1994 name.y#1   ALGIERE   200 RI      026       S         NA   <NA>   NA   NA   NA         NA   <NA>#2     ALVES   100 RI      019       S         NA   <NA>   NA   NA   NA         NA   <NA>#3    BADEAU   100 RI      032       S         NA   <NA>   NA   NA   NA         NA   <NA>

Note: It looks like this is arguably a bug in merge. The problem is there is no check that adding the suffixes (to handle overlapping non-matching names) actually makes them unique. At a certain point it uses [.data.frame which does make.unique the names, causing the rbind to fail.

# first merge will end up with 'name.x' & 'name.y'merge(my.list[[1]], my.list[[2]], by=match.by, all=T)# [1] matchname    party        st           district     chamber      senate1993   name.x      # [8] votes.year.x senate1994   name.y       votes.year.y#<0 rows> (or 0-length row.names)# as there is no clash, we retain 'name.x' & 'name.y' and get 'name' againmerge(merge(my.list[[1]], my.list[[2]], by=match.by, all=T), my.list[[3]], by=match.by, all=T)# [1] matchname    party        st           district     chamber      senate1993   name.x      # [8] votes.year.x senate1994   name.y       votes.year.y senate1995   name         votes.year  #<0 rows> (or 0-length row.names)# the next merge will fail as 'name' will get renamed to a pre-existing field.

Easiest way to fix is to not leave the field renaming for duplicates fields (of which there are many here) up to merge. Eg:

my.list2 = Map(function(x, i) setNames(x, ifelse(names(x) %in% match.by,      names(x), sprintf('%s.%d', names(x), i))), my.list, seq_along(my.list))

The merge/Reduce will then work fine.

r list merge dataframe r-faq

You can do it using merge_all in the reshape package. You can pass parameters to merge using the ... argument

reshape::merge_all(list_of_dataframes, ...)

Here is an excellent resource on different methods to merge data frames.

CodeHunter

Simultaneously merge multiple data.frames in a list

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last