Efficiently Calculating a Euclidean Distance Matrix Using Numpy

python numpy matrix performance euclidean-distance

You can take advantage of the complex type :

# build a complex array of your cellsz = np.array([complex(c.m_x, c.m_y) for c in cells])

First solution

# mesh this array so that you will have all combinationsm, n = np.meshgrid(z, z)# get the distance via the normout = abs(m-n)

Second solution

Meshing is the main idea. But numpy is clever, so you don't have to generate m & n. Just compute the difference using a transposed version of z. The mesh is done automatically :

out = abs(z[..., np.newaxis] - z)

Third solution

And if z is directly set as a 2-dimensional array, you can use z.T instead of the weird z[..., np.newaxis]. So finally, your code will look like this :

z = np.array([[complex(c.m_x, c.m_y) for c in cells]]) # notice the [[ ... ]]out = abs(z.T-z)

Example

>>> z = np.array([[0.+0.j, 2.+1.j, -1.+4.j]])>>> abs(z.T-z)array([[ 0.        ,  2.23606798,  4.12310563],       [ 2.23606798,  0.        ,  4.24264069],       [ 4.12310563,  4.24264069,  0.        ]])

As a complement, you may want to remove duplicates afterwards, taking the upper triangle :

>>> np.triu(out)array([[ 0.        ,  2.23606798,  4.12310563],       [ 0.        ,  0.        ,  4.24264069],       [ 0.        ,  0.        ,  0.        ]])

Some benchmarks

>>> timeit.timeit('abs(z.T-z)', setup='import numpy as np;z = np.array([[0.+0.j, 2.+1.j, -1.+4.j]])')4.645645342274779>>> timeit.timeit('abs(z[..., np.newaxis] - z)', setup='import numpy as np;z = np.array([0.+0.j, 2.+1.j, -1.+4.j])')5.049334864854522>>> timeit.timeit('m, n = np.meshgrid(z, z); abs(m-n)', setup='import numpy as np;z = np.array([0.+0.j, 2.+1.j, -1.+4.j])')22.489568296184686

python numpy matrix performance euclidean-distance

If you don't need the full distance matrix, you will be better off using kd-tree. Consider scipy.spatial.cKDTree or sklearn.neighbors.KDTree. This is because a kd-tree kan find k-nearnest neighbors in O(n log n) time, and therefore you avoid the O(n**2) complexity of computing all n by n distances.

python numpy matrix performance euclidean-distance

Jake Vanderplas gives this example using broadcasting in Python Data Science Handbook, which is very similar to what @shx2 proposed.

import numpy as nprand = random.RandomState(42)X = rand.rand(3, 2)  dist_sq = np.sum((X[:, np.newaxis, :] - X[np.newaxis, :, :]) ** 2, axis = -1)dist_sqarray([[0.        , 0.18543317, 0.81602495],       [0.18543317, 0.        , 0.22819282],       [0.81602495, 0.22819282, 0.        ]])

CodeHunter

Efficiently Calculating a Euclidean Distance Matrix Using Numpy

First solution

Second solution

Third solution

Example

Some benchmarks

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last