Geographic / geospatial distance between 2 lists of lat/lon points (coordinates)
To calculate the geographic distance between two points with latitude/longitude coordinates, you can use several formula's. The package geosphere
has the distCosine
, distHaversine
, distVincentySphere
and distVincentyEllipsoid
for calculating the distance. Of these, the distVincentyEllipsoid
is considered the most accurate one, but is computationally more intensive than the other ones.
With one of these functions, you can make a distance matrix. Based on that matrix you can then assign locality
names based on shortest distance with which.min
and the corresponding distance with min
(see for this the last part of the answer) like this:
library(geosphere)# create distance matrixmat <- distm(list1[,c('longitude','latitude')], list2[,c('longitude','latitude')], fun=distVincentyEllipsoid)# assign the name to the point in list1 based on shortest distance in the matrixlist1$locality <- list2$locality[max.col(-mat)]
this gives:
> list1 longitude latitude locality1 80.15998 12.90524 D2 72.89125 19.08120 A3 77.65032 12.97238 C4 77.60599 12.90927 D5 72.88120 19.08225 A6 76.65460 12.81447 E7 72.88232 19.08241 A8 77.49186 13.00984 D9 72.82228 18.99347 A10 72.88871 19.07990 A
Another possibility is to assign the locality
based on the average longitude and latitude values of the locality
s in list2
:
library(dplyr)list2a <- list2 %>% group_by(locality) %>% summarise_each(funs(mean)) %>% ungroup()mat2 <- distm(list1[,c('longitude','latitude')], list2a[,c('longitude','latitude')], fun=distVincentyEllipsoid)list1 <- list1 %>% mutate(locality2 = list2a$locality[max.col(-mat2)])
or with data.table
:
library(data.table)list2a <- setDT(list2)[,lapply(.SD, mean), by=locality]mat2 <- distm(setDT(list1)[,.(longitude,latitude)], list2a[,.(longitude,latitude)], fun=distVincentyEllipsoid)list1[, locality2 := list2a$locality[max.col(-mat2)] ]
this gives:
> list1 longitude latitude locality locality21 80.15998 12.90524 D D2 72.89125 19.08120 A B3 77.65032 12.97238 C C4 77.60599 12.90927 D C5 72.88120 19.08225 A B6 76.65460 12.81447 E E7 72.88232 19.08241 A B8 77.49186 13.00984 D C9 72.82228 18.99347 A B10 72.88871 19.07990 A B
As you can see, this leads in most (7 out of 10) occasions to another assigned locality
.
You can add the distance with:
list1$near_dist <- apply(mat2, 1, min)
or another approach with max.col
(which is highly probable faster):
list1$near_dist <- mat2[matrix(c(1:10, max.col(-mat2)), ncol = 2)]# or using dplyrlist1 <- list1 %>% mutate(near_dist = mat2[matrix(c(1:10, max.col(-mat2)), ncol = 2)])# or using data.table (if not already a data.table, convert it with 'setDT(list1)' )list1[, near_dist := mat2[matrix(c(1:10, max.col(-mat2)), ncol = 2)] ]
the result:
> list1 longitude latitude locality locality2 near_dist 1: 80.15998 12.90524 D D 269966.8970 2: 72.89125 19.08120 A B 65820.2047 3: 77.65032 12.97238 C C 739.1885 4: 77.60599 12.90927 D C 9209.8165 5: 72.88120 19.08225 A B 66832.7223 6: 76.65460 12.81447 E E 0.0000 7: 72.88232 19.08241 A B 66732.3127 8: 77.49186 13.00984 D C 17855.3083 9: 72.82228 18.99347 A B 69456.338210: 72.88871 19.07990 A B 66004.9900
Credits to Martin Haringa for this solution on making this way easier when you need this function performed by traversing down a data frame on Mark Needham's blog
library(dplyr)library(geosphere)df %>% rowwise() %>% mutate(newcolumn_distance = distHaversine(c(df$long1, df$lat1), c(df$long2, df$lat2)))
I tested using the two functions distm and distHaversine separately on large samples from real world datasets, and distHaversine seems to come out far faster than the distm function. I'm surprised as I thought the two were merely the same function in two formats.
I add below a solution using the spatialrisk package. The key functions in this package are written in C++ (Rcpp), and are therefore very fast.
The function spatialrisk::points_in_circle() calculates the observations within radius from a center point. Note that distances are calculated using the Haversine formula. Since each element of the output is a data frame, purrr::map_dfr is used to row-bind them together:
ans <- purrr::map2_dfr(list1$longitude, list1$latitude, ~spatialrisk::points_in_circle(list2, .x, .y, lon = longitude, lat = latitude, radius = 2000000)[1,])cbind(list1, ans) longitude latitude longitude latitude locality distance_m1 80.15998 12.90524 77.76180 13.02212 D 260484.05912 72.89125 19.08120 72.89537 19.07726 A 616.63693 77.65032 12.97238 77.64214 13.00954 C 4230.72164 77.60599 12.90927 77.58415 12.92079 D 2694.45665 72.88120 19.08225 72.89537 19.07726 A 1590.87236 76.65460 12.81447 76.65460 12.81447 E 0.00007 72.88232 19.08241 72.89537 19.07726 A 1487.80288 77.49186 13.00984 77.58415 12.92079 D 14089.10519 72.82228 18.99347 72.89537 19.07726 A 12089.645410 72.88871 19.07990 72.89537 19.07726 A 759.8012