
R - curl - download remote file only when changed


You'll have to keep a history of the last-modified dates of the files (assuming the web server reports them consistently) and check them with httr::HEAD() before downloading. That means you have some work to do storing each last-modified value somewhere, probably in a data frame alongside the URL:

library(httr)

URL <- "http://www.pcr.uu.se/digitalAssets/124/124932_1ucdponesided2015.rdata"

#' Download a file only if it has changed since \code{last_modified}
#'
#' @param URL url of file
#' @param fil path to write file
#' @param last_modified \code{POSIXct}. Ideally, the output from the first
#'        successful run of \code{get_file()}
#' @param overwrite overwrite the file if it exists?
#' @param .verbose output a message if the file was unchanged?
get_file <- function(URL, fil, last_modified=NULL, overwrite=TRUE, .verbose=TRUE) {

  if ((!file.exists(fil)) || is.null(last_modified)) {

    # no local copy or no known date: download unconditionally
    res <- GET(URL, write_disk(fil, overwrite))
    return(httr::parse_http_date(res$headers$`last-modified`))

  } else if (inherits(last_modified, "POSIXct")) {

    # ask only for the headers and compare dates before downloading
    res <- HEAD(URL)
    cur_last_mod <- httr::parse_http_date(res$headers$`last-modified`)

    if (cur_last_mod != last_modified) {
      res <- GET(URL, write_disk(fil, overwrite))
      return(httr::parse_http_date(res$headers$`last-modified`))
    }

    if (.verbose) message(sprintf("'%s' unchanged since %s", URL, last_modified))
    return(last_modified)

  }

}

# first run == you don't know the last-modified date.
# you need to pair this with the URL in some data structure for later use.
last_mod <- get_file(URL, basename(URL))

class(last_mod)
## [1] "POSIXct" "POSIXt"

last_mod
## [1] "2015-11-16 17:34:06 GMT"

# second run: pass the stored date so the file is fetched only if it changed
last_mod <- get_file(URL, basename(URL), last_mod)
#> 'http://www.pcr.uu.se/digitalAssets/124/124932_1ucdponesided2015.rdata' unchanged since 2015-11-16 17:34:06

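The bookkeeping part (pairing each URL with its last-modified value so it survives between R sessions) could look roughly like the sketch below. The helper names read_cache(), lookup_last_modified() and remember_last_modified() are hypothetical, as is keeping the data frame in an .rds file; the stand-in date takes the place of a real get_file() call so the example runs without network access.

```r
# Hypothetical helpers: keep URL / last-modified pairs in a data frame
# that is saved to disk with saveRDS() between sessions.

read_cache <- function(path) {
  if (file.exists(path)) return(readRDS(path))
  # empty cache with the right column types
  data.frame(url = character(0),
             last_modified = .POSIXct(numeric(0), tz = "GMT"),
             stringsAsFactors = FALSE)
}

# Returns the stored POSIXct for `url`, or NULL if the URL is unknown
lookup_last_modified <- function(cache, url) {
  hit <- cache$last_modified[cache$url == url]
  if (length(hit) == 0) NULL else hit[1]
}

# Replaces any existing row for `url` with the new value
remember_last_modified <- function(cache, url, last_modified) {
  cache <- cache[cache$url != url, , drop = FALSE]
  rbind(cache, data.frame(url = url, last_modified = last_modified,
                          stringsAsFactors = FALSE))
}

URL <- "http://www.pcr.uu.se/digitalAssets/124/124932_1ucdponesided2015.rdata"

cache_file <- tempfile(fileext = ".rds")  # assumption: use a fixed path in real use
cache <- read_cache(cache_file)

# In real use this would be:
#   last_mod <- get_file(URL, basename(URL), lookup_last_modified(cache, URL))
last_mod <- as.POSIXct("2015-11-16 17:34:06", tz = "GMT")  # stand-in value

cache <- remember_last_modified(cache, URL, last_mod)
saveRDS(cache, cache_file)
```

Because lookup_last_modified() returns NULL for a URL it has not seen, its result can be passed straight to get_file() as last_modified, which then falls through to the unconditional download branch.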

An alternative to the httr package is the base function base::curlGetHeaders(url), which returns the response headers as a character vector, but you'll still need to parse the last-modified date yourself!
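
A minimal sketch of that parsing step, assuming the header arrives in the usual "Mon, 16 Nov 2015 17:34:06 GMT" form. parse_last_modified() is a hypothetical helper, not part of base R, and the sample header lines are made up to mirror the date shown above so the example runs offline.

```r
# Hypothetical helper: pull Last-Modified out of the character vector
# returned by base::curlGetHeaders() and convert it to POSIXct (GMT).
parse_last_modified <- function(headers) {
  # header names are case-insensitive; with redirects there may be several
  # matches, so keep the last one (the final response)
  hits <- grep("^last-modified:", headers, ignore.case = TRUE, value = TRUE)
  if (length(hits) == 0) return(NULL)
  value <- trimws(sub("^[^:]*:", "", hits[length(hits)]))

  # pick out day, month name, year and time
  m <- regmatches(value, regexec(
    "([0-9]{1,2}) ([A-Za-z]{3}) ([0-9]{4}) ([0-9]{2}:[0-9]{2}:[0-9]{2})",
    value))[[1]]
  if (length(m) == 0) return(NULL)

  # month.abb is always English, so this does not depend on the locale
  month <- match(m[3], month.abb)
  if (is.na(month)) return(NULL)

  iso <- sprintf("%s-%02d-%02d %s", m[4], month, as.integer(m[2]), m[5])
  as.POSIXct(iso, format = "%Y-%m-%d %H:%M:%S", tz = "GMT")
}

# Sample header lines in the shape curlGetHeaders() returns them
sample_headers <- c("HTTP/1.1 200 OK\r\n",
                    "Last-Modified: Mon, 16 Nov 2015 17:34:06 GMT\r\n",
                    "\r\n")
last_mod <- parse_last_modified(sample_headers)

# Against a live server this would be:
#   last_mod <- parse_last_modified(curlGetHeaders(URL))
```

Matching the month name against month.abb rather than using strptime() with %b is a deliberate choice here, since %b follows the session locale and can fail on English month names.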