Efficient distance calculation between N points and a reference in numpy/scipy Efficient distance calculation between N points and a reference in numpy/scipy arrays arrays

Efficient distance calculation between N points and a reference in numpy/scipy


I would take a look at scipy.spatial.distance.cdist:

http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html

import numpy as npimport scipya = np.random.normal(size=(10,3))b = np.random.normal(size=(1,3))dist = scipy.spatial.distance.cdist(a,b) # pick the appropriate distance metric 

dist for the default distant metric is equivalent to:

np.sqrt(np.sum((a-b)**2,axis=1))  

although cdist is much more efficient for large arrays (on my machine for your size problem, cdist is faster by a factor of ~35x).


I would use the sklearn implementation of the euclidean distance. The advantage is the usage of the more efficient expression by using Matrix multiplication:

dist(x, y) = sqrt(dot(x, x) - 2 * dot(x, y) + dot(y, y)

A simple script would look like this:

import numpy as npx = np.random.rand(1000, 3)y = np.random.rand(1000, 3)dist = np.sqrt(np.dot(x, x)) - (dot(x, y) + dot(x, y)) + dot(y, y)

The advantage of this approach has been nicely described in the sklearn documentation:http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.euclidean_distances.html#sklearn.metrics.pairwise.euclidean_distances

I am using this approach to crunch large datamatrices (10000, 10000) with some minor modifications like using the np.einsum function.


You can also use the development of the norm (similar to remarkable identities). This is probably the most efficent way to compute the distance of a matrix of points.

Here is a code snippet that I originally used for a k-Nearest-Neighbors implementation, in Octave, but you can easily adapt it to numpy since it only uses matrix multiplications (the equivalent is numpy.dot()):

% Computing the euclidian distance between each known point (Xapp) and unknown points (Xtest)% Note: we use the development of the norm just like a remarkable identity:% ||x1 - x2||^2 = ||x1||^2 + ||x2||^2 - 2*<x1,x2>[napp, d] = size(Xapp);[ntest, d] = size(Xtest);A = sum(Xapp.^2, 2);A = repmat(A, 1, ntest);B = sum(Xtest.^2, 2);B = repmat(B', napp, 1);C = Xapp*Xtest';dist = A+B-2.*C;