How to pass data.frame for UPDATE with R DBI How to pass data.frame for UPDATE with R DBI database database

How to pass data.frame for UPDATE with R DBI


Really late, my answer, but maybe still helpful...

There is no single function (I know) in the DBI/odbc package but you can replicate the update behavior using a prepared update statement (which should work faster than RODBC's sqlUpdate since it sends the parameter values as a batch to the SQL server:

library(DBI)library(odbc)con <- dbConnect(odbc::odbc(), driver="{SQL Server Native Client 11.0}", server="dbserver.domain.com\\default,1234", Trusted_Connection = "yes", database = "test")  # assumes Microsoft SQL ServerdbWriteTable(con, "iris", iris, row.names = TRUE)      # create and populate a table (adding the row names as a separate columns used as row ID)update <- dbSendQuery(con, 'update iris set "Sepal.Length"=?, "Sepal.Width"=?, "Petal.Length"=?, "Petal.Width"=?, "Species"=? WHERE row_names=?')# create a modified version of `iris`iris2 <- irisiris2$Sepal.Length <- 5iris2$Petal.Width[2] <- 1iris2$row_names <- rownames(iris)  # use the row names as unique row IDdbBind(update, iris2)  # send the updated datadbClearResult(update)  # release the prepared statement# now read the modified data - you will see the updates did workdata1 <- dbReadTable(con, "iris")dbDisconnect(con)

This works only if you have a primary key which I created in the above example by using the row names which are a unique number increased by one for each row...

For more information about the odbc package I have used in the DBI dbConnect statement see: https://github.com/rstats-db/odbc


Building on R Yoda's answer, I made myself the helper function below. This allows using a dataframe to specify update conditions.

While I built this to run transaction updates (i.e. single rows), it can in theory update multiple rows passing a condition. However, that's not the same as updating multiple rows using an input dataframe. Maybe somebody else can build on this...

dbUpdateCustom = function(x, key_cols, con, schema_name, table_name) {    if (nrow(x) != 1) stop("Input dataframe must be exactly 1 row")  if (!all(key_cols %in% colnames(x))) stop("All columns specified in 'key_cols' must be present in 'x'")    # Build the update string --------------------------------------------------  df_key     <- dplyr::select(x,  one_of(key_cols))  df_upt     <- dplyr::select(x, -one_of(key_cols))    set_str    <- purrr::map_chr(colnames(df_upt), ~glue::glue_sql('{`.x`} = {x[[.x]]}', .con = con))  set_str    <- paste(set_str, collapse = ", ")    where_str  <- purrr::map_chr(colnames(df_key), ~glue::glue_sql("{`.x`} = {x[[.x]]}", .con = con))  where_str  <- paste(where_str, collapse = " AND ")    update_str <- glue::glue('UPDATE {schema_name}.{table_name} SET {set_str} WHERE {where_str}')    # Execute ------------------------------------------------------------------    query_res <- DBI::dbSendQuery(con, update_str)  DBI::dbClearResult(query_res)  return (invisible(TRUE))}

Where

  • x: 1-row dataframe that contains 1+ key columns, and 1+ update columns.
  • key_cols: character vector, of 1 or more column names that are the keys (i.e. used in the WHERE clause)


Here is a little helper function I put together using REPLACE INTO to update a table using DBI, replacing old duplicate entries with the new values. It's basic and for my own needs, but should be easy to modify. All you need to pass to the function is the connection, table name, and dataframe. Note that the table must have a PRIMARY KEY column. I've also included a simple working example.

row_to_list <- function(Y)  suppressWarnings(split(Y, f = row(Y)))sql_val <- function(y){  if(!is.numeric(y)){    return(paste0("'",y,"'"))  }else{    if(is.na(y)){      return("NULL")    }else{      return(as.character(y))    }  }}to_sql_row <- function(x) paste0("(",paste(do.call("c", lapply(x, FUN = sql_val)), collapse = ", "),")")bracket <- function(x) paste0("`",x,"`")to_sql_string <- function(x) paste0("(",paste(sapply(x, FUN = bracket), collapse = ", "),")")replace_into_table <- function(con, table_name, new_data){  #new_data <- data.table(new_data)  cols <- to_sql_string(names(new_data))  vals <- paste(lapply(row_to_list(new_data), FUN = to_sql_row), collapse = ", ")  query <- paste("REPLACE INTO", table_name, cols, "VALUES", vals)  rs <- dbExecute(con, query)  return(rs)}tb <- data.frame("id" = letters[1:20], "A" = 1:20, "B" = seq(.1,2,.1)) # sample datadbWriteTable(con, "test_table", tb) # create tabledbExecute(con, "ALTER TABLE test_table ADD PRIMARY KEY (id)") # set primary keynew_data <- data.frame("id" = letters[19:23], "A" = 1:5, "B" = seq(101,105)) # new datanew_data[4,2] <- NA # add some NA valuesnew_data[5,3] <- NAtable_name <- "test_table"replace_into_table(con, "test_table", new_data)result <- dbReadTable(con, "test_table")