Finding index of nearest point in numpy arrays of x and y coordinates Finding index of nearest point in numpy arrays of x and y coordinates numpy numpy

Finding index of nearest point in numpy arrays of x and y coordinates


Here is a scipy.spatial.KDTree example

In [1]: from scipy import spatialIn [2]: import numpy as npIn [3]: A = np.random.random((10,2))*100In [4]: AOut[4]:array([[ 68.83402637,  38.07632221],       [ 76.84704074,  24.9395109 ],       [ 16.26715795,  98.52763827],       [ 70.99411985,  67.31740151],       [ 71.72452181,  24.13516764],       [ 17.22707611,  20.65425362],       [ 43.85122458,  21.50624882],       [ 76.71987125,  44.95031274],       [ 63.77341073,  78.87417774],       [  8.45828909,  30.18426696]])In [5]: pt = [6, 30]  # <-- the point to findIn [6]: A[spatial.KDTree(A).query(pt)[1]] # <-- the nearest point Out[6]: array([  8.45828909,  30.18426696])#how it works!In [7]: distance,index = spatial.KDTree(A).query(pt)In [8]: distance # <-- The distances to the nearest neighborsOut[8]: 2.4651855048258393In [9]: index # <-- The locations of the neighborsOut[9]: 9#then In [10]: A[index]Out[10]: array([  8.45828909,  30.18426696])


scipy.spatial also has a k-d tree implementation: scipy.spatial.KDTree.

The approach is generally to first use the point data to build up a k-d tree. The computational complexity of that is on the order of N log N, where N is the number of data points. Range queries and nearest neighbour searches can then be done with log N complexity. This is much more efficient than simply cycling through all points (complexity N).

Thus, if you have repeated range or nearest neighbor queries, a k-d tree is highly recommended.


If you can massage your data into the right format, a fast way to go is to use the methods in scipy.spatial.distance:

http://docs.scipy.org/doc/scipy/reference/spatial.distance.html

In particular pdist and cdist provide fast ways to calculate pairwise distances.