How can the Euclidean distance be calculated with NumPy? How can the Euclidean distance be calculated with NumPy? python python

# How can the Euclidean distance be calculated with NumPy?

``dist = numpy.linalg.norm(a-b)``

You can find the theory behind this in Introduction to Data Mining

This works because the Euclidean distance is the l2 norm, and the default value of the ord parameter in `numpy.linalg.norm` is 2. There's a function for that in SciPy. It's called Euclidean.

Example:

``from scipy.spatial import distancea = (1, 2, 3)b = (4, 5, 6)dst = distance.euclidean(a, b)``

For anyone interested in computing multiple distances at once, I've done a little comparison using perfplot (a small project of mine).

The first advice is to organize your data such that the arrays have dimension `(3, n)` (and are C-contiguous obviously). If adding happens in the contiguous first dimension, things are faster, and it doesn't matter too much if you use `sqrt-sum` with `axis=0`, `linalg.norm` with `axis=0`, or

``a_min_b = a - bnumpy.sqrt(numpy.einsum('ij,ij->j', a_min_b, a_min_b))``

which is, by a slight margin, the fastest variant. (That actually holds true for just one row as well.)

The variants where you sum up over the second axis, `axis=1`, are all substantially slower. Code to reproduce the plot:

``import numpyimport perfplotfrom scipy.spatial import distancedef linalg_norm(data):    a, b = data    return numpy.linalg.norm(a - b, axis=1)def linalg_norm_T(data):    a, b = data    return numpy.linalg.norm(a - b, axis=0)def sqrt_sum(data):    a, b = data    return numpy.sqrt(numpy.sum((a - b) ** 2, axis=1))def sqrt_sum_T(data):    a, b = data    return numpy.sqrt(numpy.sum((a - b) ** 2, axis=0))def scipy_distance(data):    a, b = data    return list(map(distance.euclidean, a, b))def sqrt_einsum(data):    a, b = data    a_min_b = a - b    return numpy.sqrt(numpy.einsum("ij,ij->i", a_min_b, a_min_b))def sqrt_einsum_T(data):    a, b = data    a_min_b = a - b    return numpy.sqrt(numpy.einsum("ij,ij->j", a_min_b, a_min_b))def setup(n):    a = numpy.random.rand(n, 3)    b = numpy.random.rand(n, 3)    out0 = numpy.array([a, b])    out1 = numpy.array([a.T, b.T])    return out0, out1perfplot.save(    "norm.png",    setup=setup,    n_range=[2 ** k for k in range(22)],    kernels=[        linalg_norm,        linalg_norm_T,        scipy_distance,        sqrt_sum,        sqrt_sum_T,        sqrt_einsum,        sqrt_einsum_T,    ],    xlabel="len(x), len(y)",)`` 