Pairwise operations (distance) on two lists in numpy Pairwise operations (distance) on two lists in numpy numpy numpy

Pairwise operations (distance) on two lists in numpy


Here is a quick performance analysis of the four methods presented so far:

import numpyimport scipyfrom itertools import productfrom scipy.spatial.distance import cdistfrom scipy.spatial import cKDTree as KDTreen = 100l1 = numpy.random.randint(0, 100, size=(n,3))l2 = numpy.random.randint(0, 100, size=(n,3))# by @Phillipdef a(l1,l2):    return min(numpy.linalg.norm(l1_element - l2_element) for l1_element,l2_element in product(l1,l2))# by @Kasradef b(l1,l2):    return numpy.min(numpy.apply_along_axis(        numpy.linalg.norm,        2,        l1[:, None, :] - l2[None, :, :]    ))# minedef c(l1,l2):    return numpy.min(scipy.spatial.distance.cdist(l1,l2))# just checking that numpy.min is indeed faster.def c2(l1,l2):    return min(scipy.spatial.distance.cdist(l1,l2).reshape(-1))# by @BrianLarsendef d(l1,l2):    # make KDTrees for both sets of points    t1 = KDTree(l1)    t2 = KDTree(l2)    # we need a distance to not look beyond, if you have real knowledge use it, otherwise guess    maxD = numpy.linalg.norm(l1[0] - l2[0]) # this could be closest but anyhting further is certainly not    # get a sparce matrix of all the distances    ans = t1.sparse_distance_matrix(t2, maxD)    # get the minimum distance and points involved    minD = min(ans.values())    return minDfor x in (a,b,c,c2,d):    print("Timing variant", x.__name__, ':', flush=True)    print(x(l1,l2), flush=True)    %timeit x(l1,l2)    print(flush=True)

For n=100

Timing variant a :2.236067977510 loops, best of 3: 90.3 ms per loopTiming variant b :2.236067977510 loops, best of 3: 151 ms per loopTiming variant c :2.236067977510000 loops, best of 3: 136 µs per loopTiming variant c2 :2.23606797751000 loops, best of 3: 844 µs per loopTiming variant d :2.2360679775100 loops, best of 3: 3.62 ms per loop

For n=1000

Timing variant a :0.01 loops, best of 3: 9.16 s per loopTiming variant b :0.01 loops, best of 3: 14.9 s per loopTiming variant c :0.0100 loops, best of 3: 11 ms per loopTiming variant c2 :0.010 loops, best of 3: 80.3 ms per loopTiming variant d :0.01 loops, best of 3: 933 ms per loop


Using newaxis and broadcasting, l1[:, None, :] - l2[None, :, :] is an array of the pairwise difference vectors. You can reduce this array to an array of norms using apply_along_axis and then take the min:

numpy.min(numpy.apply_along_axis(    numpy.linalg.norm,    2,    l1[:, None, :] - l2[None, :, :]))

Of course, this only works if l1 and l2 are numpy arrays, so if your lists in the question weren't pseudo-code, you'll have to add l1 = numpy.array(l1); l2 = numpy.array(l2).


You can use itertools.product to get the all combinations the use min :

l1 = [[x,y,z],[x,y,z],[x,y,z],[x,y,z],[x,y,z]]l2 = [[x,y,z],[x,y,z],[x,y,z]]from itertools import productmin(numpy.linalg.norm(l1_element - l2_element) for l1_element,l2_element in product(l1,l2))