
Calculating all distances between one point and a group of points efficiently in R


Rather than looping over the data points, you can express the computation as a matrix operation, so the only remaining loop is over the K centers.

    # Generate some fake data: one point per column.
    n <- 3823
    K <- 10
    d <- 64
    x <- matrix(rnorm(n * d), ncol = n)
    centers <- matrix(rnorm(K * d), ncol = K)

    # For each center, compute the squared distance to every column of x.
    system.time(
      dists <- apply(centers, 2, function(center) {
        colSums((x - center)^2)
      })
    )

On my laptop this runs in:

       user  system elapsed
      0.100   0.008   0.108
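
If you want to drop the loop over K as well, one possible sketch uses the identity ||x - c||^2 = ||x||^2 + ||c||^2 - 2 x.c. This is not part of the timing above; the sizes and the names `dists_apply` and `dists_algebra` are made up for illustration, and the data layout (one point per column) is assumed to match the code above.

```r
set.seed(1)
n <- 200
K <- 4
d <- 8
x <- matrix(rnorm(n * d), ncol = n)        # one data point per column
centers <- matrix(rnorm(K * d), ncol = K)  # one center per column

# Loop over the centers only (same approach as above): n x K squared distances
dists_apply <- apply(centers, 2, function(center) colSums((x - center)^2))

# No explicit loop: ||x||^2 + ||c||^2 - 2 * t(x) %*% centers
dists_algebra <- outer(colSums(x^2), colSums(centers^2), "+") -
  2 * crossprod(x, centers)

# Both approaches should agree up to floating-point error
stopifnot(isTRUE(all.equal(dists_apply, dists_algebra)))
```

Both results hold squared distances. If you need Euclidean distances, take `sqrt()` of the result; with the algebraic version it is probably worth wrapping it in `pmax(..., 0)` first, since rounding can produce tiny negative values.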


rdist() is an R function from the {fields} package that quickly computes the distances between two sets of points given as matrices (one point per row).

https://www.image.ucar.edu/~nychka/Fields/Help/rdist.html

Usage:

    library(fields)

    # Generating fake data
    n <- 5
    m <- 10
    d <- 3
    x <- matrix(rnorm(n * d), ncol = d)
    y <- matrix(rnorm(m * d), ncol = d)

    # Entry [i, j] is the distance between y[i, ] and x[j, ]
    rdist(y, x)

              [,1]     [,2]      [,3]     [,4]     [,5]
     [1,] 1.512383 3.053084 3.1420322 4.942360 3.345619
     [2,] 3.531150 4.593120 1.9895867 4.212358 2.868283
     [3,] 1.925701 2.217248 2.4232672 4.529040 2.243467
     [4,] 2.751179 2.260113 2.2469334 3.674180 1.701388
     [5,] 3.303224 3.888610 0.5091929 4.563767 1.661411
     [6,] 3.188290 3.304657 3.6668867 3.599771 3.453358
     [7,] 2.891969 2.823296 1.6926825 4.845681 1.544732
     [8,] 2.987394 1.553104 2.8849988 4.683407 2.000689
     [9,] 3.199353 2.822421 1.5221291 4.414465 1.078257
    [10,] 2.492993 2.994359 3.3573190 6.498129 3.337441
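
For the case in the question title (one point against a group of points), a minimal sketch might look like the following. The names `pts`, `p`, `d_base` and `d_rdist` are hypothetical. The base R line does not need {fields} at all; the rdist() call is only run if the package is installed, and it assumes rdist() treats each row as a point.

```r
set.seed(42)
pts <- matrix(rnorm(10 * 3), ncol = 3)  # 10 points, one per row
p <- rnorm(3)                           # the single query point

# Base R reference: subtract p from every row, then take row-wise norms
d_base <- sqrt(rowSums(sweep(pts, 2, p)^2))

if (requireNamespace("fields", quietly = TRUE)) {
  # rdist() works on matrices, so the single point becomes a 1 x 3 matrix
  d_rdist <- fields::rdist(matrix(p, nrow = 1), pts)
  stopifnot(isTRUE(all.equal(as.vector(d_rdist), d_base)))
}
```

For a single query point the base R version is likely fast enough already; rdist() becomes more useful when both sets contain many points.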


You may want to have a look into the apply functions.

For instance, this code

    for (j in 1:K) {
      d[j] <- sqrt(sum((centers[j, ] - data[i, ])^2))
    }

can easily be replaced by something like

    dt <- data[i, ]
    # Square the differences first, then sum, then take the root.
    d <- apply(centers, 1, function(x) sqrt(sum((x - dt)^2)))

You can definitely optimise it further, but I hope you get the point.
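
As one possible further optimisation, here is a sketch that drops apply() entirely and checks the result against the loop. The data are made up, and `d_loop`, `d_apply` and `d_vec` are hypothetical names; the layout (one center per row, one data point per row) is assumed from the indexing in the loop above.

```r
set.seed(123)
K <- 5
d <- 4
centers <- matrix(rnorm(K * d), nrow = K)  # one center per row
data <- matrix(rnorm(20 * d), ncol = d)    # one data point per row
i <- 3
dt <- data[i, ]

# Original loop over the centers
d_loop <- numeric(K)
for (j in seq_len(K)) {
  d_loop[j] <- sqrt(sum((centers[j, ] - dt)^2))
}

# apply() version
d_apply <- apply(centers, 1, function(x) sqrt(sum((x - dt)^2)))

# Vectorised version: t(centers) is d x K, so dt is recycled down each column
d_vec <- sqrt(colSums((t(centers) - dt)^2))

stopifnot(
  isTRUE(all.equal(d_loop, d_apply)),
  isTRUE(all.equal(d_loop, d_vec))
)
```

The transpose is there because R recycles a vector down the columns of a matrix, so the coordinates have to run along the rows for the subtraction to line up.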