Decode a TinyURL in R to get the full URL path?
Below is a quick and dirty solution, but it should get the job done:
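library(RCurl)

decode.short.url <- function(u) {
  # Request headers only and do not follow the redirect,
  # so the Location header of the first response can be read.
  x <- try(getURL(u, header = TRUE, nobody = TRUE, followlocation = FALSE))
  if (inherits(x, "try-error")) {
    # Request failed: return the input unchanged
    return(u)
  } else {
    # Take the text after "Location: " up to the end of that header line
    x <- strsplit(x, "Location: ")[[1]][2]
    return(strsplit(x, "\r")[[1]][1])
  }
}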
The variable 'u' below contains one shortened URL and one regular URL.
u <- c("http://tinyurl.com/adcd", "http://www.google.com")
You can then get the expanded results by doing the following.
sapply(u, decode.short.url)
I think the above should work for most URL-shortening services, not just TinyURL.
HTH
Tony Breyal
I used Tony Breyal's code, but the function returned NA values for those URLs where there was no URL redirection. Even though Tony listed "google.com" in his example, I think Google redirects you in any case to some sort of localized version of google.com.
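To see where the NA comes from, here is a small offline sketch of the same two strsplit steps. The header strings are hand-written for illustration (they are not real server output), and extract_location is a made-up helper name:

```r
# Hypothetical header text, written by hand for illustration only
redirect_header <- "HTTP/1.1 301 Moved Permanently\r\nLocation: http://example.com/target\r\n\r\n"
plain_header    <- "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"

# Same parsing steps as in decode.short.url
extract_location <- function(header) {
  x <- strsplit(header, "Location: ")[[1]][2]  # NA when "Location: " is absent
  strsplit(x, "\r")[[1]][1]                    # strsplit(NA, ...) stays NA
}

extract_location(redirect_header)  # "http://example.com/target"
extract_location(plain_header)     # NA
```

When the response has no Location header, the first split yields a single piece, so indexing element 2 gives NA, and that NA is carried through to the return value.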
Here is how I modified Tony's code to deal with that:
library(RCurl)

decode.short.url <- function(u) {
  x <- try(getURL(u, header = TRUE, nobody = TRUE, followlocation = FALSE))
  if (inherits(x, "try-error")) {
    print(paste("***", u, "--> ERROR!!!!"))
    return(u)
  } else {
    x <- strsplit(x, "Location: ")[[1]][2]
    x.2 <- strsplit(x, "\r")[[1]][1]
    if (is.na(x.2)) {
      # No Location header: the URL was not redirected
      print(paste("***", u, "--> No change."))
      return(u)
    } else {
      print(paste("***", u, "--> resolved in -->", x.2))
      return(x.2)
    }
  }
}

u <- list("http://www.amazon.com", "http://tinyurl.com/adcd")
urls <- sapply(u, decode.short.url)
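One possible further tweak: both versions of decode.short.url above split on the literal string "Location: ", but HTTP header names are case-insensitive, so a server could send "location:" instead. Below is a rough sketch of a more tolerant parser. find_location is a hypothetical helper name and the header strings are made up for illustration, so treat it as a starting point rather than tested production code:

```r
# Hypothetical helper: find the Location header regardless of letter case
find_location <- function(header) {
  # Split the raw header text into lines (CRLF or bare LF)
  lines <- strsplit(header, "\r?\n")[[1]]
  # Keep lines whose header name is "location", ignoring case
  hit <- grep("^location:", lines, ignore.case = TRUE, value = TRUE)
  if (length(hit) == 0) return(NA_character_)
  # Drop the header name and surrounding whitespace from the first match
  trimws(sub("^location:", "", hit[1], ignore.case = TRUE))
}

# Hand-written header strings, for illustration only
find_location("HTTP/1.1 301 Moved Permanently\r\nlocation: http://example.com/target\r\n\r\n")
find_location("HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n")
```

The first call should return the target URL and the second NA, so the is.na() check in the function above could be kept as it is.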