Interpolate unstructured X,Y,Z data on best grid based on nearest neighbour distance for each points
The problem you want to solve is called the "all-nearest-neighbors problem". See this article for example: http://link.springer.com/article/10.1007/BF02187718
I believe solutions to this are O(N log N), so on the same order as KDTree.query, but in practice much, much faster than a bunch of separate queries. I'm sorry, I don't know of a python implementation of this.
I suggest you to go with KDTree.query
.
You are searching of a carachteristic distance to scale your binning: I suggest you to take only a random subset of your points, and to use the Manhattan distance, becasue KDTree.query
is very slow (and yet it is a n*log(n) complexity).
Here is my code:
# CreateTreetree=scipy.spatial.KDTree(numpy.array(points)) # better give it a copy?# Create random subsample of pointsn_repr=1000shuffled_points=numpy.array(points)numpy.random.shuffle(shuffled_points)shuffled_points=shuffled_points[:n_repr]# Query the tree(dists,points)=tree.query(shuffled_points,k=2,p=1)# Get _extimate_ of average distance:avg_dists=numpy.average(dists)print('average distance Manhattan with nearest neighbour is:',avg_dists)
I suggest you to use the Manhattan distance ( https://en.wikipedia.org/wiki/Taxicab_geometry ) because it is was faster to compute than the euclidean distance. And since you need only an estimator of the average distance it should be sufficient.