Writing a single column of R object into pre-existing postgres db table


I'm assuming here you are using the CRAN package RPostgreSQL and that your question is,

"How can I update an existing postgresql record using R?"

If I interpreted your question correctly, I have good news, and I'll make minor modifications to your existing code below to get it working. The good news:

  1. The error is in your SQL.
  2. You do not need doMC (nor foreach, for that matter).
  3. You're already connected to your database--that's usually the more onerous part!

If you really need parallelism, the basic format for initializing multiple workers with doSNOW is sketched after the loop below. Either way, a single-threaded application is much easier to debug, so change your loop to run single-threaded by switching the %dopar% operator at the end of your foreach statement to %do%, and register your parallel backend only AFTER you get the SQL working.

foreach(i = 1:length(choifnl$wearing), .inorder = FALSE, .packages = "RPostgreSQL") %do% {
  update(i)
}
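For reference, here is a minimal doSNOW sketch for when you are ready to parallelize again; the worker count of 4 is purely illustrative:

library(doSNOW)
cl <- makeCluster(4)   ## 4 workers is illustrative; size this to your machine
registerDoSNOW(cl)     ## register the parallel backend so %dopar% has workers
## ... run the foreach(...) %dopar% { ... } loop here ...
stopCluster(cl)        ## shut the workers down when finished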

Then you will see that your SQL has a syntax error: "where" and your first "AND" erroneously follow commas. I typically break large statements into a more human-readable form, like below, so it is easier to spot inconsistencies. I removed the inadvertent commas in this snippet:

### SQL error resolved
update <- function(i) {
  drv <- dbDriver("PostgreSQL")
  con <- dbConnect(drv, dbname = "", host = "",
                   port = 5432,  ## 5432 is the PostgreSQL default; adjust as needed
                   user = "", password = "")  ## connect w/ username and password
  txt <- paste("UPDATE ucsd.sage_choi SET wearing=", choifnl$wearing[i],
               ", weekday=", choifnl$weekday[i],
               ", days=", choifnl$days[i],
               "where participant_id=", choifnl$participant_id[i],
               "AND date_id=", choifnl$date[i],
               "AND gps_time=", choifnl$time[i])
  dbGetQuery(con, txt)
  dbDisconnect(con)
}
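As a sanity check, you can print the assembled statement before sending it. With hypothetical values (wearing = 1, weekday = 2, days = 3, participant_id = 101, date_id = 20150101, gps_time = 120000), paste's default space separator yields something like:

cat(txt)
## UPDATE ucsd.sage_choi SET wearing= 1 , weekday= 2 , days= 3 where participant_id= 101 AND date_id= 20150101 AND gps_time= 120000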

From a performance standpoint, you will do much better if you initialize your connection outside the loop, because you avoid sinking the time and cost of re-establishing the connection for each record.

### SQL error resolved, connection made outside loop
update <- function(con, i) {
  txt <- paste("UPDATE ucsd.sage_choi SET wearing=", choifnl$wearing[i],
               ", weekday=", choifnl$weekday[i],
               ", days=", choifnl$days[i],
               "where participant_id=", choifnl$participant_id[i],
               "AND date_id=", choifnl$date[i],
               "AND gps_time=", choifnl$time[i])
  dbSendQuery(con, txt)  # edit 2
}

drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname = "", host = "",
                 port = 5432,  ## 5432 is the PostgreSQL default; adjust as needed
                 user = "", password = "")  ## connect w/ username and password

## requires a registered parallel backend (see the doSNOW sketch above)
foreach(i = 1:length(choifnl$wearing), .inorder = FALSE, .packages = "RPostgreSQL") %dopar% {
  update(con, i)  # passing in the open connection as an argument
}
dbDisconnect(con)


If I understood the question correctly, this is fairly easy using dplyr's ability to connect to databases:

library(dplyr)
library(RPostgreSQL)

my_db <- src_postgres(host = "<HOST>",
                      user = "<USER>",
                      password = "<PASS>",
                      db = "ucsd")

sage_choi <- tbl(my_db, "sage_choi")

sage_choi %>%
  select(participant_id, date_id, gps_time) %>%
  left_join(choifnl, copy = TRUE,
            by = c("participant_id" = "participant_id",
                   "date_id" = "date",
                   "gps_time" = "time")) %>%
  compute(name = "sage_choi2", temporary = FALSE)

Once you run the code, the table sage_choi2 will contain the populated columns.
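If you want to spot-check the result from the same session (assuming the join keys matched as intended), a quick look at the new table is:

tbl(my_db, "sage_choi2") %>%  ## open the newly materialized table
  head()                      ## peek at the first few rows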