How to extract JSON output to dataframe? How to extract JSON output to dataframe? json json

How to extract JSON output to dataframe?


I prefer to use jsonlite for parsing JSON in R.

To parse the nested JSON list, you can do the fromJSON call inside an lapply.

jsonlite::fromJSON tries to simplify the result for you. But, as JSON is designed to handle nested data structures, you're often returned a data.frame of lists, so to get the data.frame you're after, you need to know which element of the list you want, and extract it.

For example

library(RDSTK)library(jsonlite)js <- coordinates2politics(Coords$Latitude, Coords$Longitude)lst <- lapply(js, jsonlite::fromJSON)lst[[1]]$politics#           type friendly_type                       name  code# 1       admin2       country                     Canada   can# 2       admin4         state                    Ontario  ca08# 3 constituency  constituency            Hamilton Centre 35031# 4 constituency  constituency                 Burlington 35010# 5 constituency  constituency Hamilton East-Stoney Creek 35032

To get the data.frame, you can construct another lapply to extract the elements you want, and then put it altogether with either a do.call(..., rbind), or my preferences is with data.table::rbindlist(...)

lst_result <- lapply(lst, function(x){    df <- x$politics[[1]]    df$lat <- x$location$latitude    df$lon <- x$location$longitude    return(df)})data.table::rbindlist(lst_result)#            type friendly_type                                  name                   code      lat        lon# 1:       admin2       country                                Canada                    can 43.30528  -79.80306# 2:       admin4         state                               Ontario                   ca08 43.30528  -79.80306# 3: constituency  constituency                       Hamilton Centre                  35031 43.30528  -79.80306# 4: constituency  constituency                            Burlington                  35010 43.30528  -79.80306# 5: constituency  constituency            Hamilton East-Stoney Creek                  35032 43.30528  -79.80306# 6:       admin2       country                                Canada                    can 46.08333  -82.41667# 7:       admin4         state                               Ontario                   ca08 46.08333  -82.41667

Alternatively, to get more detail about each lat/lon you can use Google's API through library(googleway) (Disclaimer: I wrote googleway) to reverse geocode the lat/lons.

For this you need a valid Google API key (which is limited to 2,500 requests per day, unless you pay)

library(googleway)key <- "your_api_key"lst <- apply(Coords, 1, function(x){    google_reverse_geocode(location = c(x["Latitude"], x["Longitude"]),                           key = key)})lst[[1]]$results$address_components# [[1]]#                              long_name                           short_name                                  types# 1 Burlington Bay James N. Allan Skyway Burlington Bay James N. Allan Skyway                                  route# 2                           Burlington                           Burlington                    locality, political# 3         Halton Regional Municipality         Halton Regional Municipality administrative_area_level_2, political# 4                              Ontario                                   ON administrative_area_level_1, political# 5                               Canada                                   CA                     country, political# 6                                  L7S                                  L7S        postal_code, postal_code_prefix

Or similarly through library(ggmap), also limited by Google's 2,500 limit.

library(ggmap)apply(Coords, 1, function(x){    revgeocode(c(x["Longitude"], x["Latitude"]))})# 1 # "Burlington Bay James N. Allan Skyway, Burlington, ON L7S, Canada" # 2 # "308 Brennan Harbour Rd, Spanish, ON P0P 2A0, Canada" # 3 # "724 Harris Ave, San Diego, CA 92154, USA" # 4 # "30 Cherry St, Chinook, WA 98614, USA" # 5 # "St James Township, MI, USA" # 6 # "US-101, Chinook, WA 98614, USA" # 7 # "2413 II Rd, Garden, MI 49835, USA" # 8 # "2413 II Rd, Garden, MI 49835, USA" # 9 # "8925 S Shore Rd, Stella, ON K0H 2S0, Canada" # 10 # "Charlevoix County, MI, USA"


That json-list needs to be extracted. You really only have a result from your first coordinate:

sapply(json_file[[1]]$politics, "[[", 'name')[ # now pick correct names with logical        sapply(json_file[[1]]$politics, "[[", 'friendly_type') %in% c("country","state") ] [1] "Canada"  "Ontario"

You should have used apply to run all the coordinates one-by-one through the fromJSON(coordinates2politics( .,.) extraction since the function appears not to be "vectorized".

res=apply( Coords, 1, function(x) {fromJSON(coordinates2politics(x['Latitude'],                                                                  x['Longitude']) )} )sapply( res, function(x) sapply(x[[1]]$politics, "[[", 'name')[                             sapply(x[[1]]$politics, "[[", 'friendly_type') %in%                                                                 c("country","state")] )$`1`[1] "Canada"  "Ontario"$`2`[1] "Canada"  "Ontario"$`3`[1] "United States" "California"    "Mexico"        "California"   $`4`[1] "United States"$`5`[1] "United States" "Michigan"     $`6`[1] "United States" "Washington"   $`7`[1] "United States" "Michigan"     $`8`[1] "United States" "Michigan"     $`9`[1] "Canada"  "Ontario"$`10`[1] "United States" "Michigan" 

Apparently items near the border (like San Diego County or Chula Vista) will give ambiguous results.