How can the Euclidean distance be calculated with NumPy?


Use numpy.linalg.norm:

dist = numpy.linalg.norm(a-b)

The theory behind this is covered in Introduction to Data Mining.

This works because the Euclidean distance is the L2 norm, and numpy.linalg.norm computes the 2-norm of a vector by default: its ord parameter defaults to None, which for a 1-D array means the 2-norm.
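As a quick check of that claim, here is a minimal sketch comparing numpy.linalg.norm with the square root of the summed squared differences written out by hand. The sample points a and b are made up for illustration and are not part of the original answer.

```python
import numpy as np

# Illustrative points (an assumption, chosen so the result is easy to verify).
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

# With the default ord=None, a 1-D input yields the 2-norm.
dist_norm = np.linalg.norm(a - b)

# The same quantity spelled out: sqrt of the sum of squared differences.
dist_manual = np.sqrt(np.sum((a - b) ** 2))

print(dist_norm, dist_manual)  # both are 5.0, since a - b = (-3, -4, 0)
```

Both expressions agree, so the choice between them is mostly a matter of readability.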



There's a function for that in SciPy: scipy.spatial.distance.euclidean.

Example:

from scipy.spatial import distance

a = (1, 2, 3)
b = (4, 5, 6)
dst = distance.euclidean(a, b)


For anyone interested in computing multiple distances at once, I've done a little comparison using perfplot (a small project of mine).

The first piece of advice is to organize your data so that the arrays have shape (3, n) and are C-contiguous. When the sum runs over the first axis, so that each of the three rows is a contiguous block of n values, things are faster, and it doesn't matter much whether you use sqrt-sum with axis=0, linalg.norm with axis=0, or

a_min_b = a - b
numpy.sqrt(numpy.einsum('ij,ij->j', a_min_b, a_min_b))

which is, by a slight margin, the fastest variant. (That actually holds true for just one row as well.)

The variants where you sum up over the second axis, axis=1, are all substantially slower.
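To make the layouts concrete, the following sketch computes n distances at once from arrays of shape (3, n) using the three axis=0 variants mentioned above, then repeats the calculation on an (n, 3) copy with axis=1. The random data, the seed, and the value of n are assumptions for illustration. The sketch only checks that the variants agree and does not measure speed.

```python
import numpy as np

# Illustrative data (an assumption): n random 3-D points per array,
# stored column-wise so each array has shape (3, n).
rng = np.random.default_rng(0)
n = 1000
a = rng.random((3, n))
b = rng.random((3, n))

diff = a - b

# Three equivalent ways to reduce over the first axis (axis=0).
d_einsum = np.sqrt(np.einsum("ij,ij->j", diff, diff))
d_norm = np.linalg.norm(diff, axis=0)
d_sqrt_sum = np.sqrt(np.sum(diff ** 2, axis=0))

# The same points in (n, 3) layout, reduced over the second axis (axis=1).
a_rows = np.ascontiguousarray(a.T)
b_rows = np.ascontiguousarray(b.T)
d_rows = np.linalg.norm(a_rows - b_rows, axis=1)

print(d_einsum.shape)  # (1000,)
```

All four results hold the same n distances; only the memory layout and the reduction axis differ.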

[Figure: perfplot runtime comparison of the variants]


Code to reproduce the plot:

import numpy
import perfplot
from scipy.spatial import distance


def linalg_norm(data):
    a, b = data[0]
    return numpy.linalg.norm(a - b, axis=1)


def linalg_norm_T(data):
    a, b = data[1]
    return numpy.linalg.norm(a - b, axis=0)


def sqrt_sum(data):
    a, b = data[0]
    return numpy.sqrt(numpy.sum((a - b) ** 2, axis=1))


def sqrt_sum_T(data):
    a, b = data[1]
    return numpy.sqrt(numpy.sum((a - b) ** 2, axis=0))


def scipy_distance(data):
    a, b = data[0]
    return list(map(distance.euclidean, a, b))


def sqrt_einsum(data):
    a, b = data[0]
    a_min_b = a - b
    return numpy.sqrt(numpy.einsum("ij,ij->i", a_min_b, a_min_b))


def sqrt_einsum_T(data):
    a, b = data[1]
    a_min_b = a - b
    return numpy.sqrt(numpy.einsum("ij,ij->j", a_min_b, a_min_b))


def setup(n):
    a = numpy.random.rand(n, 3)
    b = numpy.random.rand(n, 3)
    out0 = numpy.array([a, b])
    out1 = numpy.array([a.T, b.T])
    return out0, out1


perfplot.save(
    "norm.png",
    setup=setup,
    n_range=[2 ** k for k in range(22)],
    kernels=[
        linalg_norm,
        linalg_norm_T,
        scipy_distance,
        sqrt_sum,
        sqrt_sum_T,
        sqrt_einsum,
        sqrt_einsum_T,
    ],
    xlabel="len(x), len(y)",
)