Geocode batch addresses in R with open mapquestapi


You might need to vectorize your geocode_attempt function to do it columnwise:

    vecGeoCode <- Vectorize(geocode_attempt, vectorize.args = c('address'))

And then call:

    df %>%
      mutate(lat = vecGeoCode(paste(street, postcode, city, country, sep = ","))[1, ],
             lon = vecGeoCode(paste(street, postcode, city, country, sep = ","))[2, ])
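To see why the `[1,]` and `[2,]` indexing works, here is a small offline sketch of the shape `Vectorize()` returns. `fake_geocode` is a hypothetical stand-in for `geocode_attempt`, assumed to return `c(lat, lon)` for a single address; it makes no API call and its numbers are meaningless.

```r
# Hypothetical stand-in for geocode_attempt(): returns c(lat, lon) for ONE address.
# No network call is made; the values are just derived from the string length.
fake_geocode <- function(address) {
  n <- nchar(address)
  c(lat = n, lon = -n)
}

vec_fake <- Vectorize(fake_geocode, vectorize.args = "address")

addresses <- c("a", "bb", "ccc")
coords <- vec_fake(addresses)

dim(coords)      # 2 rows (lat, lon), one column per address
coords["lat", ]  # same values as coords[1, ]
coords["lon", ]  # same values as coords[2, ]
```

Since every call of the real function hits the API, computing the matrix once and taking both rows from it would halve the number of requests compared with calling the vectorized function separately for lat and lon.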

To speed things up, you might want to look at the batch mode of the API, which returns up to 100 lat/lon pairs in one request.

To use the API's batch requests you could use this function:

    geocodeBatch_attempt <- function(address) {
      # Assumes getURL() (RCurl) and a fromJSON() that returns nested lists are loaded
      # URL for batch requests
      URL <- paste("http://open.mapquestapi.com/geocoding/v1/batch?key=",
                   "Fmjtd%7Cluub2huanl%2C20%3Do5-9uzwdz",
                   "&location=", paste(address, collapse = "&location="), sep = "")
      URL <- gsub(" ", "+", URL)
      data <- getURL(URL)
      data <- fromJSON(data)
      # One column per result: c(lat, lng), or c(NA, NA) when nothing was found
      p <- sapply(data$results, function(x) {
        if (length(x$locations) == 0) {
          c(NA, NA)
        } else {
          c(x$locations[[1]]$displayLatLng$lat,
            x$locations[[1]]$displayLatLng$lng)
        }
      })
      # Transpose so that each row corresponds to one address
      return(t(p))
    }

To test it:

    # make a bigger df from the data (repeat the 5 lines 25 times)
    biggerDf <- df[rep(row.names(df), 25), ]
    # add a reqId column to split the data in batches of 100 requests
    biggerDf$reqId <- seq_along(biggerDf$id) %/% 100
    # run the function, first grouping by reqId to send batches of 100 requests
    biggerDf %>%
      group_by(reqId) %>%
      mutate(lat = geocodeBatch_attempt(paste(street, postcode, city, country, sep = ","))[, 1],
             lon = geocodeBatch_attempt(paste(street, postcode, city, country, sep = ","))[, 2])
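The batching idea can also be sketched without dplyr and without touching the API. In the sketch below `fake_batch` is a hypothetical stand-in for `geocodeBatch_attempt` (one row of lat/lon per address, no network access), and the addresses are made up. It uses `(i - 1) %/% 100` for the batch id so that every full batch holds exactly 100 items; the `seq_along(...) %/% 100` form above also stays within the limit, but its first batch has only 99 rows.

```r
# Hypothetical stand-in for geocodeBatch_attempt(): one row (lat, lon) per address,
# no network access.
fake_batch <- function(address) {
  cbind(lat = nchar(address), lon = -nchar(address))
}

# Made-up addresses, only used to have something to split
addresses <- sprintf("street %d, city", 1:250)

# (i - 1) %/% 100 puts items 1..100 in batch 0, 101..200 in batch 1, and so on
batch_id <- (seq_along(addresses) - 1) %/% 100
batches <- split(addresses, batch_id)

# One call per batch; bind the per-batch matrices back together in order
coords <- do.call(rbind, lapply(batches, fake_batch))

lengths(batches)  # size of each batch
nrow(coords)      # one row per input address
```

Calling the batch function once per group and reusing the returned matrix for both columns also avoids sending each batch twice.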


It's really easy to look at mutate() and conclude that it does something similar to what you illustrate in your for loop, but what you're actually seeing there is a single call to an R function that acts on the entire column of the data frame at once.

I would not be surprised if others had this misconception - the dplyr tutorials don't address the distinction between vectorized/non-vectorized functions, and (even more dangerous) R's recycling rules mean that applying a scalar function won't necessarily raise an error. There's some more discussion of this here.
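A minimal base R sketch of the recycling trap, with no dplyr and no API involved. `scalar_geocode` is a hypothetical function that, like a typical scalar geocoder, only uses the first address it receives; mutate() recycles a length-1 result in the same silent way.

```r
# Hypothetical scalar geocoder: only looks at the FIRST address it is given
scalar_geocode <- function(address) {
  n <- nchar(address[1])
  c(lat = n, lon = -n)
}

df <- data.frame(address = c("a", "bb", "ccc"), stringsAsFactors = FALSE)

# The function runs once on the whole column; its single lat value is
# then recycled to every row without any warning or error
df$lat <- scalar_geocode(df$address)[[1]]

df$lat  # every row carries the value computed for the first address
```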

One option is to rewrite your geocode_attempt so that it can take a vector of addresses.
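As a rough sketch of that option, a thin wrapper can loop over the addresses and return one row per address. `geocode_one` is a hypothetical placeholder for the existing single-address function (assumed to return a numeric lat/lon pair), and `geocode_many` is a name invented for this example.

```r
# Hypothetical single-address geocoder standing in for geocode_attempt()
geocode_one <- function(address) {
  n <- as.numeric(nchar(address))
  c(n, -n)
}

# Vector-aware wrapper: returns a data frame with one row per address
geocode_many <- function(addresses) {
  # vapply() enforces that every call returns exactly two numbers
  m <- vapply(addresses, geocode_one, FUN.VALUE = numeric(2), USE.NAMES = FALSE)
  data.frame(lat = m[1, ], lon = m[2, ])
}

res <- geocode_many(c("a", "bb", "ccc"))
res$lat
res$lon
```

Returning both coordinates from one call means the result can be bound to the original data frame in a single step, so each address is geocoded only once.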

If you want to keep your function as is, but want dplyr to behave more like something from the -ply family you have two potential approaches:

The first is to use the grouping variable you have in your data:

    df %>%
      group_by(id) %>%
      mutate(
        lat = geocode_attempt(paste(street, postcode, city, country, sep = ","))[1],
        lon = geocode_attempt(paste(street, postcode, city, country, sep = ","))[2])

The second is to use the rowwise() function described in this answer:

    df %>%
      rowwise() %>%
      mutate(
        lat = geocode_attempt(paste(street, postcode, city, country, sep = ","))[1],
        lon = geocode_attempt(paste(street, postcode, city, country, sep = ","))[2])

The group_by solution is significantly faster on my machine. Not sure why!

Unfortunately, the speed savings you are seeing from dplyr above are likely somewhat illusory: most likely they result from the geocoding function being called only once (versus once per row in the loop). There may well be gains, but you'll need to run the timings again.


There's a geocoding package that uses the Nokia HERE service. It has a batch mode. You can use it with the test API keys and you may not hit a limit. Worth a look...