How to extract JSON output to dataframe?
I prefer to use jsonlite
for parsing JSON in R.
To parse the nested JSON list, you can do the fromJSON
call inside an lapply
.
jsonlite::fromJSON
tries to simplify the result for you. But, as JSON is designed to handle nested data structures, you're often returned a data.frame of lists, so to get the data.frame
you're after, you need to know which element of the list you want, and extract it.
For example
library(RDSTK)library(jsonlite)js <- coordinates2politics(Coords$Latitude, Coords$Longitude)lst <- lapply(js, jsonlite::fromJSON)lst[[1]]$politics# type friendly_type name code# 1 admin2 country Canada can# 2 admin4 state Ontario ca08# 3 constituency constituency Hamilton Centre 35031# 4 constituency constituency Burlington 35010# 5 constituency constituency Hamilton East-Stoney Creek 35032
To get the data.frame
, you can construct another lapply
to extract the elements you want, and then put it altogether with either a do.call(..., rbind)
, or my preferences is with data.table::rbindlist(...)
lst_result <- lapply(lst, function(x){ df <- x$politics[[1]] df$lat <- x$location$latitude df$lon <- x$location$longitude return(df)})data.table::rbindlist(lst_result)# type friendly_type name code lat lon# 1: admin2 country Canada can 43.30528 -79.80306# 2: admin4 state Ontario ca08 43.30528 -79.80306# 3: constituency constituency Hamilton Centre 35031 43.30528 -79.80306# 4: constituency constituency Burlington 35010 43.30528 -79.80306# 5: constituency constituency Hamilton East-Stoney Creek 35032 43.30528 -79.80306# 6: admin2 country Canada can 46.08333 -82.41667# 7: admin4 state Ontario ca08 46.08333 -82.41667
Alternatively, to get more detail about each lat/lon you can use Google's API through library(googleway)
(Disclaimer: I wrote googleway) to reverse geocode the lat/lons.
For this you need a valid Google API key (which is limited to 2,500 requests per day, unless you pay)
library(googleway)key <- "your_api_key"lst <- apply(Coords, 1, function(x){ google_reverse_geocode(location = c(x["Latitude"], x["Longitude"]), key = key)})lst[[1]]$results$address_components# [[1]]# long_name short_name types# 1 Burlington Bay James N. Allan Skyway Burlington Bay James N. Allan Skyway route# 2 Burlington Burlington locality, political# 3 Halton Regional Municipality Halton Regional Municipality administrative_area_level_2, political# 4 Ontario ON administrative_area_level_1, political# 5 Canada CA country, political# 6 L7S L7S postal_code, postal_code_prefix
Or similarly through library(ggmap)
, also limited by Google's 2,500 limit.
library(ggmap)apply(Coords, 1, function(x){ revgeocode(c(x["Longitude"], x["Latitude"]))})# 1 # "Burlington Bay James N. Allan Skyway, Burlington, ON L7S, Canada" # 2 # "308 Brennan Harbour Rd, Spanish, ON P0P 2A0, Canada" # 3 # "724 Harris Ave, San Diego, CA 92154, USA" # 4 # "30 Cherry St, Chinook, WA 98614, USA" # 5 # "St James Township, MI, USA" # 6 # "US-101, Chinook, WA 98614, USA" # 7 # "2413 II Rd, Garden, MI 49835, USA" # 8 # "2413 II Rd, Garden, MI 49835, USA" # 9 # "8925 S Shore Rd, Stella, ON K0H 2S0, Canada" # 10 # "Charlevoix County, MI, USA"
That json-list needs to be extracted. You really only have a result from your first coordinate:
sapply(json_file[[1]]$politics, "[[", 'name')[ # now pick correct names with logical sapply(json_file[[1]]$politics, "[[", 'friendly_type') %in% c("country","state") ] [1] "Canada" "Ontario"
You should have used apply
to run all the coordinates one-by-one through the fromJSON(coordinates2politics( .,.)
extraction since the function appears not to be "vectorized".
res=apply( Coords, 1, function(x) {fromJSON(coordinates2politics(x['Latitude'], x['Longitude']) )} )sapply( res, function(x) sapply(x[[1]]$politics, "[[", 'name')[ sapply(x[[1]]$politics, "[[", 'friendly_type') %in% c("country","state")] )$`1`[1] "Canada" "Ontario"$`2`[1] "Canada" "Ontario"$`3`[1] "United States" "California" "Mexico" "California" $`4`[1] "United States"$`5`[1] "United States" "Michigan" $`6`[1] "United States" "Washington" $`7`[1] "United States" "Michigan" $`8`[1] "United States" "Michigan" $`9`[1] "Canada" "Ontario"$`10`[1] "United States" "Michigan"
Apparently items near the border (like San Diego County or Chula Vista) will give ambiguous results.