R - Finding closest neighboring point and number of neighbors within a given radius, coordinates lat-long R - Finding closest neighboring point and number of neighbors within a given radius, coordinates lat-long r r

R - Finding closest neighboring point and number of neighbors within a given radius, coordinates lat-long


Best option is to use libraries sp and rgeos, which enable you to construct spatial classes and perform geoprocessing.

library(sp)library(rgeos)

Read the data and transform them to spatial objects:

mydata <- read.delim('d:/temp/testfile.txt', header=T)sp.mydata <- mydatacoordinates(sp.mydata) <- ~long+latclass(sp.mydata)[1] "SpatialPointsDataFrame"attr(,"package")[1] "sp"

Now calculate pairwise distances between points

d <- gDistance(sp.mydata, byid=T)

Find second shortest distance (closest distance is of point to itself, therefore use second shortest)

min.d <- apply(d, 1, function(x) order(x, decreasing=F)[2])

Construct new data frame with desired variables

newdata <- cbind(mydata, mydata[min.d,], apply(d, 1, function(x) sort(x, decreasing=F)[2]))colnames(newdata) <- c(colnames(mydata), 'neighbor', 'n.lat', 'n.long', 'n.area', 'n.canopy', 'n.avg.depth', 'distance')newdata            pond      lat      long area canopy avg.depth     neighbor    n.lat    n.long n.area n.canopy n.avg.depth6            A10 41.95928 -72.14605 1500     66  60.61538 Borrow.Pit.3 41.95546 -72.15375      0        0    29.222223          AA006 41.96431 -72.12100  250      0  57.77778   Blacksmith 41.95508 -72.12380    361       77    71.312502     Blacksmith 41.95508 -72.12380  361     77  71.31250        AA006 41.96431 -72.12100    250        0    57.777785   Borrow.Pit.1 41.95601 -72.15419    0      0  41.44444 Borrow.Pit.2 41.95571 -72.15413      0        0    37.700004   Borrow.Pit.2 41.95571 -72.15413    0      0  37.70000 Borrow.Pit.1 41.95601 -72.15419      0        0    41.444445.1 Borrow.Pit.3 41.95546 -72.15375    0      0  29.22222 Borrow.Pit.2 41.95571 -72.15413      0        0    37.700006.1      Boulder 41.91822 -72.14978 1392     98  43.53333 Borrow.Pit.3 41.95546 -72.15375      0        0    29.22222        distance6   0.00859548723   0.00964622772   0.00964622775   0.00030594124   0.00030594125.1 0.00045486266.1 0.0374480316

Edit: if coordinates are in degrees and you would like to calculate distance in kilometers, use package geosphere

library(geosphere)d <- distm(sp.mydata)# rest is the same

This should provide better results, if the points are scattered across the globe and coordinates are in degrees


I add below an alternative solution using the newer sf package, for those interested and coming to this page now (as I did).

First, load the data and create the sf object.

# Using sfmydata <- structure(  list(pond = c("A10", "AA006", "Blacksmith", "Borrow.Pit.1",                 "Borrow.Pit.2", "Borrow.Pit.3", "Boulder"),        lat = c(41.95928, 41.96431, 41.95508, 41.95601, 41.95571, 41.95546,                41.918223),        long = c(-72.14605, -72.121, -72.123803, -72.15419, -72.15413,                 -72.15375, -72.14978),        area = c(1500L, 250L, 361L, 0L, 0L, 0L, 1392L),        canopy = c(66L, 0L, 77L, 0L, 0L, 0L, 98L),        avg.depth = c(60.61538462, 57.77777778, 71.3125, 41.44444444,                      37.7, 29.22222222, 43.53333333)),   class = "data.frame", row.names = c(NA, -7L))library(sf)data_sf <- st_as_sf(mydata, coords = c("long", "lat"),                    # Change to your CRS                    crs = "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")st_is_longlat(data_sf)

sf::st_distance calculates the distance matrix in meters using Great Circle distance when using lat/lon data.

dist.mat <- st_distance(data_sf) # Great Circle distance since in lat/lon# Number within 1.5km: Subtract 1 to exclude the point itselfnum.1500 <- apply(dist.mat, 1, function(x) {  sum(x < 1500) - 1})# Calculate nearest distancenn.dist <- apply(dist.mat, 1, function(x) {  return(sort(x, partial = 2)[2])})# Get index for nearest distancenn.index <- apply(dist.mat, 1, function(x) { order(x, decreasing=F)[2] })n.data <- mydatacolnames(n.data)[1] <- "neighbor"colnames(n.data)[2:ncol(n.data)] <-   paste0("n.", colnames(n.data)[2:ncol(n.data)])mydata2 <- data.frame(mydata,                      n.data[nn.index, ],                      n.distance = nn.dist,                      radius1500 = num.1500)rownames(mydata2) <- seq(nrow(mydata2))
mydata2          pond      lat      long area canopy avg.depth     neighbor    n.lat    n.long n.area n.canopy1          A10 41.95928 -72.14605 1500     66  60.61538 Borrow.Pit.1 41.95601 -72.15419      0        02        AA006 41.96431 -72.12100  250      0  57.77778   Blacksmith 41.95508 -72.12380    361       773   Blacksmith 41.95508 -72.12380  361     77  71.31250        AA006 41.96431 -72.12100    250        04 Borrow.Pit.1 41.95601 -72.15419    0      0  41.44444 Borrow.Pit.2 41.95571 -72.15413      0        05 Borrow.Pit.2 41.95571 -72.15413    0      0  37.70000 Borrow.Pit.1 41.95601 -72.15419      0        06 Borrow.Pit.3 41.95546 -72.15375    0      0  29.22222 Borrow.Pit.2 41.95571 -72.15413      0        07      Boulder 41.91822 -72.14978 1392     98  43.53333 Borrow.Pit.3 41.95546 -72.15375      0        0  n.avg.depth n.distance radius15001    41.44444  766.38426          32    71.31250 1051.20527          13    57.77778 1051.20527          14    37.70000   33.69099          35    41.44444   33.69099          36    37.70000   41.99576          37    29.22222 4149.07406          0

For obtaining the closest neighbor after calculating distance, you can use sort() with the partial = 2 argument. Depending on the amount of data, this could be much faster than using order as in the previous solution. The package Rfast is likely even faster but I avoid including additional packages here. See this related post for a discussion and benchmarking of various solutions: https://stackoverflow.com/a/53144760/12265198


In Rfast, there is a function called "dista" and computes the Euclidean or the Manhattan distances only (at the moment). It gives the option to compute the k-smallest distances. Alternatively it can return the indexes of the observations with the smallest distances. The cosinus distance is basically almost the same as the Eucledean distance (exluding a constant, 2 I think).