Using Numpy to find the average distance in a set of points


Well, I don't think there is a super fast way to do this, but this should do it. For n points in d dimensions it takes O(n²d) time, with one vectorized pass per point:

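    # data: NumPy array of shape (n, d), one point per row
    tot = 0.
    for i in range(data.shape[0] - 1):
        # distances from point i to every later point, so each pair is counted once
        tot += ((((data[i+1:] - data[i])**2).sum(1))**.5).sum()
    # divide by the number of distinct pairs, n*(n-1)/2
    avg = tot / ((data.shape[0] - 1) * data.shape[0] / 2.)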
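If memory is not a concern, the same average can probably be computed without the Python loop by broadcasting. This is only a sketch of an alternative, not a faster algorithm: it is still O(n²d) in time and it also builds an (n, n, d) temporary array, so it only suits modest n. The function name `mean_pairwise_distance` is made up for illustration.

```python
import numpy as np

def mean_pairwise_distance(data):
    """Average Euclidean distance over all distinct pairs of rows in `data`.

    `data` is assumed to have shape (n, d), one point per row.
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    if n < 2:
        raise ValueError("need at least two points")
    # diffs[i, j] is data[i] - data[j]; shape (n, n, d)
    diffs = data[:, None, :] - data[None, :, :]
    # Full symmetric distance matrix, shape (n, n)
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Keep only the strict upper triangle so each pair is counted once
    i, j = np.triu_indices(n, k=1)
    return dists[i, j].mean()
```

For example, for the points (0, 0), (3, 4) and (6, 8) the pairwise distances are 5, 10 and 5, so the function should return 20/3.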


Now that you've stated your goal of finding the outliers, you are probably better off computing the sample mean and, with that, the sample variance, since both of those are O(nd) operations. With those, you should be able to find outliers (e.g. by excluding points farther from the mean than some multiple of the std. dev.), and that filtering step can also be done in O(nd) time, for a total of O(nd).
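A minimal sketch of that idea follows. The answer does not say how to define the std. dev. for d-dimensional points, so this assumes the root-mean-square distance to the mean as the scale. The function name `filter_outliers` and the default threshold `k=3.0` are likewise illustrative choices, not part of the original suggestion.

```python
import numpy as np

def filter_outliers(data, k=3.0):
    """Split the rows of `data` (shape (n, d)) into inliers and outliers.

    A point is flagged as an outlier when its distance to the sample mean
    exceeds k times the root-mean-square distance to the mean. Both the
    choice of scale and the default k are assumptions for illustration.
    """
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)                            # O(nd)
    dist = np.sqrt(((data - mean) ** 2).sum(axis=1))    # O(nd)
    scale = np.sqrt((dist ** 2).mean())                 # O(n), plays the role of the std. dev.
    mask = dist <= k * scale                            # O(n)
    return data[mask], data[~mask]
```

Each step touches every coordinate at most a constant number of times, which is where the overall O(nd) comes from. Note that the mean and the scale are themselves pulled toward the outliers, so a very small sample or a very large `k` may flag nothing.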

You might be interested in a refresher on Chebyshev's inequality.
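For reference, here is the inequality in the form that seems most relevant here. It assumes the points are draws of a random vector X with mean μ and finite variance, and it takes σ² to be the expected squared distance to the mean (for d = 1 this is the ordinary variance). The symbols X, μ, σ and k are notation introduced for this note.

```latex
\sigma^2 = \mathbb{E}\,\lVert X - \mu \rVert^2,
\qquad
\Pr\bigl( \lVert X - \mu \rVert \ge k\sigma \bigr) \le \frac{1}{k^2}
\quad \text{for any } k > 0 .
```

So whatever the distribution, at most 1/4 of the probability mass lies 2σ or more from the mean and at most 1/9 lies 3σ or more away, which gives a rough guide for picking the cutoff.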