Find unique elements of floating point array in numpy (with comparison using a delta value)


Another possibility is to just round to the nearest desirable tolerance:

np.unique(a.round(decimals=4))

where a is your original array.
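As a quick illustration (a minimal sketch; the array and the 4-decimal tolerance are made up for the example), values that differ by less than the rounding tolerance collapse to a single entry:

import numpy as np

a = np.array([1.00001, 1.00002, 2.5])
# 1.00001 and 1.00002 both round to 1.0, so unique() keeps only one of them.
print(np.unique(a.round(decimals=4)))   # -> [1.  2.5]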

Edit: Just to note that my solution and @unutbu's are nearly identical speed-wise (mine is maybe 5% faster) according to my timings, so either is a good solution.

Edit #2: This is meant to address Paul's concern. It is definitely slower and there may be some optimizations one can make, but I'm posting it as-is to demonstrate the strategy:

import numpy as np

def eclose(a, b, rtol=1.0000000000000001e-05, atol=1e-08):
    return np.abs(a - b) <= (atol + rtol * np.abs(b))

x = np.array([6.4, 6.500000001, 6.5, 6.51])
y = x.flat.copy()
y.sort()
ci = 0
U = np.empty((0,), dtype=y.dtype)
while ci < y.size:
    ii = eclose(y[ci], y)
    mi = np.max(ii.nonzero())   # index of the last value still within tolerance of y[ci]
    U = np.concatenate((U, [y[mi]]))
    ci = mi + 1
print(U)

This should be decently fast if there are many repeated values within the precision range, but if many of the values are unique, then this is going to be slow. Also, it may be better to set U up as a list and append through the while loop, but that falls under 'further optimization'.
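As a rough sketch of that 'further optimization' (the unique_close helper name and the simplified rtol default are my own, not part of the original answer), collecting the representatives in a Python list and converting once at the end looks like this:

import numpy as np

def eclose(a, b, rtol=1e-05, atol=1e-08):
    return np.abs(a - b) <= (atol + rtol * np.abs(b))

def unique_close(x):
    # List-based variant of the loop above: append each representative to a
    # Python list instead of concatenating arrays on every iteration.
    y = np.sort(x, axis=None)
    reps = []
    ci = 0
    while ci < y.size:
        ii = eclose(y[ci], y)
        mi = np.max(ii.nonzero())   # last index still within tolerance of y[ci]
        reps.append(y[mi])
        ci = mi + 1
    return np.array(reps)

print(unique_close(np.array([6.4, 6.500000001, 6.5, 6.51])))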


Don't floor and round both fail the OP's requirement in some cases?

np.floor([5.99999999, 6.0])     # array([ 5.,  6.])
np.round([6.50000001, 6.5], 0)  # array([ 7.,  6.])

The way I would do it (and this may not be optimal, and is surely slower than the other answers) is something like this:

import numpy as np

TOL = 1.0e-3
a = np.random.random((10, 10))
i = np.argsort(a.flat)
d = np.append(True, np.diff(a.flat[i]))
result = a.flat[i[d > TOL]]

Of course this method keeps only the first member of any run of values that fall within the tolerance of their neighbours, which means a chain of values that are each close to the next collapses to a single representative even though its overall spread (max - min) is larger than the tolerance.
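To make that caveat concrete (a small sketch; the array is made up for illustration), a chain of values that are each within TOL of their neighbour collapses to one entry even though the total spread exceeds TOL:

import numpy as np

TOL = 1.0e-3
a = np.array([1.0000, 1.0008, 1.0016, 1.0024])  # each step < TOL, total spread > TOL
i = np.argsort(a.flat)
d = np.append(True, np.diff(a.flat[i]))
print(a.flat[i[d > TOL]])   # [1.] -- only the first member of the chain survives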

Here is essentially the same algorithm, but easier to understand and should be faster as it avoids an indexing step:

a = np.random.random((10,))
b = a.copy()
b.sort()
d = np.append(True, np.diff(b))
result = b[d > TOL]

The OP may also want to look into scipy.cluster (for a fancy version of this method) or numpy.digitize (for a fancy version of the other two methods); a rough digitize sketch follows.
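This sketch is my own illustration, not something from the answer: it bins the values into TOL-wide bins with np.digitize and keeps one representative per occupied bin (the bin edges and the "keep the first value seen per bin" choice are assumptions):

import numpy as np

TOL = 1.0e-3
a = np.random.random((10, 10))
flat = a.ravel()

# Build TOL-wide bins spanning the data, assign each value to a bin,
# and keep one original value per occupied bin.
bins = np.arange(flat.min(), flat.max() + TOL, TOL)
which_bin = np.digitize(flat, bins)
_, first_idx = np.unique(which_bin, return_index=True)
result = np.sort(flat[first_idx])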


I have just noticed that the accepted answer doesn't work in cases like this one:

a = 1 - np.random.random(20)*0.05
# <20 uniformly chosen values between 0.95 and 1.0>
np.sort(a)
>>>> array([ 0.9514548 ,  0.95172218,  0.95454535,  0.95482343,  0.95599525,
             0.95997008,  0.96385762,  0.96679186,  0.96873524,  0.97016127,
             0.97377579,  0.98407259,  0.98490461,  0.98964753,  0.9896733 ,
             0.99199411,  0.99261766,  0.99317258,  0.99420183,  0.99730928])
TOL = 0.01

This results in:

a.flat[i[d>TOL]]
>>>> array([], dtype=float64)

This is simply because no two consecutive values of the sorted input array are at least `TOL` apart, while the correct result should be:

>>>> array([ 0.9514548 ,  0.96385762,  0.97016127,  0.98407259,  0.99199411])

(although it depends on how you decide which value to take within each `TOL`-wide bin)

You should use the fact that integers don't suffer from such machine precision effects:

np.unique(np.floor(a/TOL).astype(int))*TOL
>>>> array([ 0.95,  0.96,  0.97,  0.98,  0.99])

which runs about 5 times faster than the proposed solution (according to %timeit).

Note that `.astype(int)` is optional, although removing it degrades performance by a factor of about 1.5, since extracting unique values from an array of ints is much faster.

You might want to add half of `TOL` to the resulting unique values, to compensate for the flooring effect:

(np.unique(np.floor(a/TOL).astype(int))+0.5)*TOL
>>>> array([ 0.955,  0.965,  0.975,  0.985,  0.995])
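If you would rather report actual members of the original array instead of bin centres (this touches the "which value to take within `TOL`" remark above; using return_index here is my own suggestion, not part of the answer), one possibility is:

import numpy as np

TOL = 0.01
a = 1 - np.random.random(20)*0.05
bin_ids = np.floor(a/TOL).astype(int)
# return_index gives the position of the first value seen in each bin,
# so the result contains one original element per TOL-wide bin.
_, first_idx = np.unique(bin_ids, return_index=True)
result = np.sort(a[first_idx])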