Finding index of nearest point in numpy arrays of x and y coordinates
Here is a scipy.spatial.KDTree
example
In [1]: from scipy import spatialIn [2]: import numpy as npIn [3]: A = np.random.random((10,2))*100In [4]: AOut[4]:array([[ 68.83402637, 38.07632221], [ 76.84704074, 24.9395109 ], [ 16.26715795, 98.52763827], [ 70.99411985, 67.31740151], [ 71.72452181, 24.13516764], [ 17.22707611, 20.65425362], [ 43.85122458, 21.50624882], [ 76.71987125, 44.95031274], [ 63.77341073, 78.87417774], [ 8.45828909, 30.18426696]])In [5]: pt = [6, 30] # <-- the point to findIn [6]: A[spatial.KDTree(A).query(pt)[1]] # <-- the nearest point Out[6]: array([ 8.45828909, 30.18426696])#how it works!In [7]: distance,index = spatial.KDTree(A).query(pt)In [8]: distance # <-- The distances to the nearest neighborsOut[8]: 2.4651855048258393In [9]: index # <-- The locations of the neighborsOut[9]: 9#then In [10]: A[index]Out[10]: array([ 8.45828909, 30.18426696])
scipy.spatial
also has a k-d tree implementation: scipy.spatial.KDTree
.
The approach is generally to first use the point data to build up a k-d tree. The computational complexity of that is on the order of N log N, where N is the number of data points. Range queries and nearest neighbour searches can then be done with log N complexity. This is much more efficient than simply cycling through all points (complexity N).
Thus, if you have repeated range or nearest neighbor queries, a k-d tree is highly recommended.
If you can massage your data into the right format, a fast way to go is to use the methods in scipy.spatial.distance
:
http://docs.scipy.org/doc/scipy/reference/spatial.distance.html
In particular pdist
and cdist
provide fast ways to calculate pairwise distances.