Greatest distance between set of longitude/latitude points

r algorithm geospatial latitude-longitude cran

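Theorem #1: The ordering of any two great-circle distances along the surface of the earth is the same as the ordering of the straight-line distances between the same points, as if you tunnelled through the earth. (On a sphere of radius R, a surface distance d corresponds to a chord of length 2R·sin(d/(2R)), which increases monotonically for 0 ≤ d ≤ πR.)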

Hence, convert your lat-long pairs into x, y, z coordinates, based either on a spherical earth of arbitrary radius or on an ellipsoid with given shape parameters. That costs a couple of sines/cosines per point (not per pair of points).

Now you have a standard 3D problem that doesn't rely on computing Haversine distances. The distance between points is just Euclidean (Pythagoras in 3D). That needs some squares and a square root, and you can leave out the square root if you only care about comparisons.
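A minimal brute-force sketch of this idea in R, assuming a spherical earth (unit radius) and using base R's `dist()`. It is still O(n²) in time and memory, so it only illustrates the chord-distance trick and does not address scaling. The function names `lonlat_to_xyz` and `farthest_pair_chord` are made up for this example, and the 6371 km mean radius is an assumption.

```r
# Convert longitude/latitude in degrees to points on the unit sphere.
lonlat_to_xyz <- function(lng, lat) {
  rad <- pi / 180
  cbind(x = cos(lat * rad) * cos(lng * rad),
        y = cos(lat * rad) * sin(lng * rad),
        z = sin(lat * rad))
}

# Brute force: the pair with the largest chord (straight-line) distance is
# also the pair with the largest great-circle distance (Theorem #1).
farthest_pair_chord <- function(lng, lat) {
  p <- lonlat_to_xyz(lng, lat)
  d <- as.matrix(dist(p))                       # Euclidean chord lengths, n x n
  ij <- which(d == max(d), arr.ind = TRUE)[1, ]
  chord <- d[ij[1], ij[2]]
  list(i = unname(ij[1]), j = unname(ij[2]),
       chord = chord,
       # central angle in radians; multiply by the earth radius for a distance
       angle = 2 * asin(min(1, chord / 2)))
}

set.seed(1)
lng <- runif(200, -180, 180)
lat <- runif(200, -90, 90)
res <- farthest_pair_chord(lng, lat)
res$angle * 6371  # approximate great-circle distance in km (assumed radius)
```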

There may be fancy spatial tree data structures that help with this, or dedicated algorithms such as http://www.tcs.fudan.edu.cn/rudolf/Courses/Algorithms/Alg_ss_07w/Webprojects/Qinbo_diameter/2d_alg.htm (click 'Next' for 3D methods). There is also C++ code here: http://valis.cs.uiuc.edu/~sariel/papers/00/diameter/diam_prog.html

Once you've found your maximum distance pair, you can use the Haversine formula to get the distance along the surface for that pair.
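For the final step, a plain-R sketch of the Haversine formula. It assumes a spherical earth with a mean radius of 6371 km (pick the radius that suits your data), and the name `haversine_km` is hypothetical.

```r
# Great-circle distance in km between (lng1, lat1) and (lng2, lat2), in degrees,
# using the Haversine formula on a sphere of radius r (default: 6371 km).
haversine_km <- function(lng1, lat1, lng2, lat2, r = 6371) {
  rad <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlng <- (lng2 - lng1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlng / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))   # pmin() guards against rounding slightly above 1
}

# Quarter of a meridian: equator to North Pole, about 10007.5 km.
haversine_km(0, 0, 0, 90)
```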

I think that the following could be a useful approximation, which scales linearly instead of quadratically with the number of points, and is quite easy to implement:

1. calculate the center of mass M of the points
2. find the point P₀ that has the maximum distance to M
3. find the point P₁ that has the maximum distance to P₀
4. approximate the maximum diameter with the distance between P₀ and P₁

This can be generalized by repeating step 3 N times (each time finding the point P_k farthest from P_{k-1}) and taking the distance between P_{N-1} and P_N.
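A minimal sketch of steps 1-4, with step 3 repeated `n_repeat` times, for points given as the rows of a numeric matrix of Cartesian coordinates (planar x, y or 3D x, y, z both work). The function name and the `n_repeat` parameter are hypothetical; each step is a single linear pass over the points.

```r
# p: n x k numeric matrix of Cartesian coordinates (one point per row).
# n_repeat >= 1: how many times step 3 is applied.
# Returns the indices of the "distant pair" and their squared distance.
distant_pair <- function(p, n_repeat = 1) {
  sq_dist_to <- function(q) colSums((t(p) - q)^2)  # squared distances to point q

  m <- colMeans(p)                          # step 1: center of mass M
  prev <- which.max(sq_dist_to(m))          # step 2: P0, farthest from M
  cur <- which.max(sq_dist_to(p[prev, ]))   # step 3: P1, farthest from P0
  for (k in seq_len(n_repeat - 1)) {        # optional repetitions of step 3
    nxt <- which.max(sq_dist_to(p[cur, ]))
    prev <- cur
    cur <- nxt
  }
  # step 4: the pair (P_{N-1}, P_N) approximates the diameter
  list(i = prev, j = cur, sq_dist = sum((p[prev, ] - p[cur, ])^2))
}

# Usage on random planar points:
set.seed(42)
distant_pair(cbind(rnorm(1000), rnorm(1000)), n_repeat = 2)
```

Working with squared distances keeps the inner loop to multiplications and additions, as the text suggests.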

Step 1 can be carried out efficiently by approximating M as the average of the longitudes and latitudes, which is OK when distances are "small" and the poles are sufficiently far away. The other steps could use the exact distance formula, but they are much faster if the points' coordinates can be approximated as lying on a plane. Once the "distant pair" (hopefully the pair with the maximum distance) has been found, its distance can be recalculated with the exact formula.

An example of such an approximation: if φ(M) and λ(M) are the latitude and longitude of the center of mass, calculated as Σφ(P)/n and Σλ(P)/n, then

x(P) = (λ(P) - λ(M) + C) cos(φ(P))
y(P) = φ(P) - φ(M) [ this is only for clarity, it can also simply be y(P) = φ(P) ]

where C is usually 0, but can be ±360° if the set of points crosses the λ=±180° line. To find the point P_N farthest from P_{N-1}, you simply have to find

max over P_N of (x(P_N) - x(P_{N-1}))² + (y(P_N) - y(P_{N-1}))²

(you don't need the square root, because it is monotonic)
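A minimal sketch of this projection in R. Instead of choosing C by hand, it uses a different, swapped-in way of handling the λ=±180° line: longitude differences relative to the first point are wrapped into [-180°, 180°), which plays the role of C for each point. This assumes the whole set spans less than 180° of longitude; `wrap180` and `plane_xy` are hypothetical names.

```r
# Wrap longitude differences (degrees) into [-180, 180).
wrap180 <- function(d) ((d + 180) %% 360) - 180

# Planar approximation x(P), y(P) of the points around their center of mass.
# Longitudes are unwrapped relative to the first point (assumption: the set
# spans less than 180 degrees of longitude).
plane_xy <- function(lng, lat) {
  rad <- pi / 180
  dl <- wrap180(lng - lng[1])          # longitudes relative to the first point
  x <- (dl - mean(dl)) * cos(lat * rad)
  y <- lat - mean(lat)
  cbind(x = x, y = y)
}

# Points on both sides of the antimeridian.
lng <- c(179, -179, 178, -177)
lat <- c(10, 10, 11, 9)
xy <- plane_xy(lng, lat)
i <- which.max(colSums((t(xy) - colMeans(xy))^2))   # step 2
j <- which.max(colSums((t(xy) - xy[i, ])^2))        # step 3
c(i, j)
```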

The same coordinate transformation could be used to repeat step 1 (in the new coordinate system) in order to have a better starting point. I suspect that if some conditions are met, the above steps (without repeating step 3) always lead to the "true distant pair" (my terminology). If I only knew which conditions...

EDIT:

I hate building on others' solutions, but someone will have to.

Still keeping the above 4 steps, with the optional (but probably beneficial, depending on the typical distribution of points) repetition of step 3, and following Spacedman's solution, doing the calculations in 3D removes the requirements that the points be close together and far from the poles:

x(P) = sin(φ(P))
y(P) = cos(φ(P)) sin(λ(P))
z(P) = cos(φ(P)) cos(λ(P))

(the only approximation is that the Earth is treated as a perfect sphere)

The center of mass is given by x(M) = Σx(P)/n, etc., and the maximum one has to look for is

max over P_N of (x(P_N) - x(P_{N-1}))² + (y(P_N) - y(P_{N-1}))² + (z(P_N) - z(P_{N-1}))²

So: first transform the spherical coordinates to Cartesian ones, then, starting from the center of mass, find in at least two steps (steps 2 and 3) the point farthest from the preceding point. You could repeat step 3 as long as the distance increases, perhaps with a maximum number of repetitions, but this won't take you away from a local maximum. Starting from the center of mass is not much help either if the points are spread all over the Earth.

EDIT 2:

I learned enough R to write down the core of the algorithm (nice language for data analysis!)

For the plane approximation, ignoring the problem around the λ=±180° line:

    # input: lng, lat (vectors)
    rad = pi / 180
    x = (lng - mean(lng)) * cos(lat * rad)
    y = (lat - mean(lat))
    i = which.max((x - mean(x))^2 + (y       )^2)
    j = which.max((x - x[i]   )^2 + (y - y[i])^2)
    # output: i, j (indices)

On my PC it takes less than a second to find the indices i and j for 1000000 points.
The following 3D version is a bit slower, but works for any distribution of points (and does not need to be amended when the λ=±180° line is crossed):

    # input: lng, lat
    rad = pi / 180
    x = sin(lat * rad)
    f = cos(lat * rad)
    y = sin(lng * rad) * f
    z = cos(lng * rad) * f
    i = which.max((x - mean(x))^2 + (y - mean(y))^2 + (z - mean(z))^2)
    j = which.max((x - x[i]   )^2 + (y - y[i]   )^2 + (z - z[i]   )^2)
    k = which.max((x - x[j]   )^2 + (y - y[j]   )^2 + (z - z[j]   )^2) # optional
    # output: j, k (or i, j)

The calculation of k can be left out (i.e., the result is then given by i and j), depending on the data and on the requirements. On the other hand, my experiments have shown that calculating a further index beyond k is useless.

It should be remembered that, in any case, the distance between the resulting points is an estimate that is a lower bound on the "diameter" of the set, although it will very often be the diameter itself (how often depends on the data).
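One way to see how often the estimate equals the diameter on your own kind of data is to compare it with a brute-force search on samples. A minimal sketch, assuming the 3D version above wrapped in a function (`estimate_chord` and `exact_chord` are hypothetical names) and base R's `dist()` for the O(n²) reference. It compares chord lengths, which order pairs the same way as great-circle distances; the normal distributions used for the test data are an arbitrary assumption.

```r
# 3D estimate (steps 1-3 as in the code above), returned as a chord length.
estimate_chord <- function(lng, lat) {
  rad <- pi / 180
  f <- cos(lat * rad)
  p <- cbind(sin(lat * rad), sin(lng * rad) * f, cos(lng * rad) * f)
  sq <- function(q) colSums((t(p) - q)^2)
  i <- which.max(sq(colMeans(p)))
  j <- which.max(sq(p[i, ]))
  sqrt(sum((p[i, ] - p[j, ])^2))
}

# Exact diameter (as a chord length) by brute force: O(n^2), small n only.
exact_chord <- function(lng, lat) {
  rad <- pi / 180
  f <- cos(lat * rad)
  max(dist(cbind(sin(lat * rad), sin(lng * rad) * f, cos(lng * rad) * f)))
}

# Ratio estimate / diameter over a few random samples (1 means exact).
set.seed(7)
ratios <- replicate(20, {
  lng <- rnorm(500, 10, 20)
  lat <- rnorm(500, 45, 10)
  estimate_chord(lng, lat) / exact_chord(lng, lat)
})
summary(ratios)
```

Since j is the point farthest from i, the triangle inequality gives diameter ≤ 2·d(i, j), so the ratio can never drop below 0.5.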

EDIT 3:

Unfortunately the relative error of the plane approximation can, in extreme cases, be as much as 1-1/√3 ≅ 42.3%, which may be unacceptable, even if very rare. The algorithm can be modified so that it has an upper bound of approximately 20%, which I have derived by compass and straight-edge (the analytic solution is cumbersome). The modified algorithm finds a pair of points with a locally maximal distance, then repeats the same steps, but this time starting from the midpoint of the first pair, possibly finding a different pair:

    # input: lng, lat
    rad = pi / 180
    x = (lng - mean(lng)) * cos(lat * rad)
    y = (lat - mean(lat))
    i.n_1 = 1 # n_1: n-1
    x.n_1 = mean(x)
    y.n_1 = 0 # = mean(y)
    s.n_1 = 0 # s: square of distance
    repeat {
       s = (x - x.n_1)^2 + (y - y.n_1)^2
       i.n = which.max(s)
       x.n = x[i.n]
       y.n = y[i.n]
       s.n = s[i.n]
       if (s.n <= s.n_1) break
       i.n_1 = i.n
       x.n_1 = x.n
       y.n_1 = y.n
       s.n_1 = s.n
    }
    i.m_1 = 1
    x.m_1 = (x.n + x.n_1) / 2
    y.m_1 = (y.n + y.n_1) / 2
    s.m_1 = 0
    m_ok  = TRUE
    repeat {
       s = (x - x.m_1)^2 + (y - y.m_1)^2
       i.m = which.max(s)
       if (i.m == i.n || i.m == i.n_1) { m_ok = FALSE; break }
       x.m = x[i.m]
       y.m = y[i.m]
       s.m = s[i.m]
       if (s.m <= s.m_1) break
       i.m_1 = i.m
       x.m_1 = x.m
       y.m_1 = y.m
       s.m_1 = s.m
    }
    if (m_ok && s.m > s.n) {
       i = i.m
       j = i.m_1
    } else {
       i = i.n
       j = i.n_1
    }
    # output: i, j

The 3D algorithm can be modified in a similar way. It is possible (both in the 2D and in the 3D case) to start over once again from the midpoint of the second pair of points (if found). The upper bound in this case is "left as an exercise for the reader" :-).
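A sketch of how the 3D version might look, keeping the two-phase structure of the 2D code (climb to a locally maximal pair, then restart from its midpoint and give up if one of the first pair's points is reached). This is an untested adaptation, not the author's code; `climb` and `distant_pair_3d` are hypothetical names. Midpoints are taken in 3D, so they lie inside the sphere, which should be harmless because only Euclidean distances are compared. It assumes at least two distinct points.

```r
# Starting from the coordinates `start`, repeatedly jump to the point farthest
# from the current one while the squared distance keeps increasing.
# Returns the final pair (i, j) and its squared distance s, or NULL if a point
# listed in `avoid` is reached (mirrors the m_ok = FALSE branch of the 2D code).
climb <- function(p, start, avoid = integer(0)) {
  sq_dist_to <- function(q) colSums((t(p) - q)^2)
  j <- NA_integer_   # index of the current point (NA while at `start`)
  q <- start
  s_prev <- 0
  repeat {
    s <- sq_dist_to(q)
    i <- which.max(s)
    if (i %in% avoid) return(NULL)
    if (s[i] <= s_prev) break
    j <- i
    q <- p[i, ]
    s_prev <- s[i]
  }
  list(i = i, j = j, s = s[i])
}

distant_pair_3d <- function(lng, lat) {
  rad <- pi / 180
  f <- cos(lat * rad)
  p <- cbind(sin(lat * rad), sin(lng * rad) * f, cos(lng * rad) * f)
  first <- climb(p, colMeans(p))                        # from the center of mass
  mid <- (p[first$i, ] + p[first$j, ]) / 2               # midpoint of the first pair
  second <- climb(p, mid, avoid = c(first$i, first$j))   # restart from the midpoint
  best <- if (!is.null(second) && second$s > first$s) second else first
  c(best$i, best$j)
}
```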

Comparing the modified algorithm with the (too) simple algorithm has shown, for normal and for square uniform distributions, a near doubling of processing time and a reduction of the average error from about 0.6% to about 0.03% (orders of magnitude only). A further restart from the midpoint results in only a slightly better average error, but almost the same maximum error.

EDIT 4:

I have yet to study this article, but it looks like the 20% I found with compass and straight-edge is in fact 1-1/√(5-2√3) ≅ 19.3%.
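The two worst-case figures quoted in EDIT 3 and EDIT 4 can be checked numerically in R; they come out at about 0.423 and 0.193.

```r
# Worst-case relative errors quoted above.
simple_bound   <- 1 - 1 / sqrt(3)                # plane approximation, simple algorithm
modified_bound <- 1 - 1 / sqrt(5 - 2 * sqrt(3))  # with the midpoint restart
round(c(simple = simple_bound, modified = modified_bound), 3)
```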

Here's a naive example that doesn't scale well (as you say), but it might help with building a solution in R.

    ## lonlat points
    n <- 100
    d <- cbind(runif(n, -180, 180), runif(n, -90, 90))
    library(sp)
    ## distances on WGS84 ellipsoid
    x <- spDists(d, longlat = TRUE)
    ## row, then column index of furthest points
    ind <- c(row(x)[which.max(x)], col(x)[which.max(x)])
    ## maps
    library(maptools)
    data(wrld_simpl)
    plot(as(wrld_simpl, "SpatialLines"), col = "grey")
    points(d, pch = 16, cex = 0.5)
    ## draw the furthest pair and a straight line between them on the page
    points(d[ind, ], pch = 16)
    lines(d[ind, ], lwd = 2)
    ## for extra credit, draw the great circle on which the furthest points lie
    library(geosphere)
    lines(greatCircle(d[ind[1], ], d[ind[2], ]), col = "firebrick")

[Figure: furthest distance on the WGS84 ellipsoid between sample points]

The geosphere package provides more options for distance calculation if that's needed. See ?spDists in sp for the details used here.
