Efficiently finding the closest coordinate pair from a set in Python Efficiently finding the closest coordinate pair from a set in Python python python

Efficiently finding the closest coordinate pair from a set in Python


Using a k-dimensional tree:

>>> from scipy import spatial>>> airports = [(10,10),(20,20),(30,30),(40,40)]>>> tree = spatial.KDTree(airports)>>> tree.query([(21,21)])(array([ 1.41421356]), array([1]))

Where 1.41421356 is the distance between the queried point and the nearest neighbour and 1 is the index of the neighbour.

See: http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.KDTree.query.html#scipy.spatial.KDTree.query


If your coordinates are unsorted, your search can only be improved slightly assuming it is (latitude,longitude) by filtering on latitude first as for earth

1 degree of latitude on the sphere is 111.2 km or 69 miles

but that would not give a huge speedup.

If you sort the airports by latitude first then you can use a binary search for finding the first airport that could match (airport_lat >= point_lat-tolerance) and then only compare up to the last one that could match (airport_lat <= point_lat+tolerance) - but take care of 0 degrees equaling 360. While you cannot use that library directly, the sources of bisect are a good start for implementing a binary search.

While technically this way the search is still O(n), you have much fewer actual distance calculations (depending on tolerance) and few latitude comparisons. So you will have a huge speedup.


From this SO question:

import numpy as npdef closest_node(node, nodes):    nodes = np.asarray(nodes)    deltas = nodes - node    dist_2 = np.einsum('ij,ij->i', deltas, deltas)    return np.argmin(dist_2)

where node is a tuple with two values (x, y) and nodes is an array of tuples with two values ([(x_1, y_1), (x_2, y_2),])