Find unique rows in numpy.array


As of NumPy 1.13, one can simply choose the axis for selection of unique values in any N-dim array. To get unique rows, one can do:

unique_rows = np.unique(original_array, axis=0)
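As a small illustration of the call above, here is a sketch that also asks for the index of each row's first occurrence and how often each row appears. The sample array and the variable names are only illustrative assumptions; `return_index` and `return_counts` are standard `np.unique` parameters.

```python
import numpy as np

# Illustrative sample data (not from the original question)
a = np.array([[1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 0]])

# Unique rows come back sorted lexicographically, not in input order
unique_rows, first_idx, counts = np.unique(
    a, axis=0, return_index=True, return_counts=True)

# To keep the rows in order of first appearance, index with the sorted positions
in_original_order = a[np.sort(first_idx)]

print(unique_rows)
print(counts)
print(in_original_order)
```

Note that the result is sorted, so if the original row order matters, the `return_index` route shown at the end is one way to recover it.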


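Yet another possible solution is to go through a set of row tuples. The set is wrapped in a list because np.vstack expects a sequence, and the order of the resulting rows is not guaranteed:

np.vstack(list({tuple(row) for row in a}))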
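If the order of first appearance matters, a variation on the same idea is to use a dict instead of a set, since dicts preserve insertion order in Python 3.7+. This is only a sketch: the helper name `unique_rows_ordered` and the sample array are hypothetical, and it assumes a non-empty 2-D input.

```python
import numpy as np

def unique_rows_ordered(a):
    """Return the unique rows of a 2-D array in order of first appearance.

    Hypothetical helper for illustration; assumes `a` is non-empty and 2-D.
    """
    # dict.fromkeys drops duplicate keys while keeping first-seen order
    seen = dict.fromkeys(map(tuple, a))
    return np.array(list(seen))

a = np.array([[1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 0]])

result = unique_rows_ordered(a)
print(result)
```

Like the set version, this builds a Python tuple per row, so it is likely to be slower than the pure NumPy approaches on large arrays.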


An alternative to structured arrays is to take a view with a void dtype that joins each whole row into a single item:

a = np.array([[1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [1, 1, 1, 1, 1, 0]])

b = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
_, idx = np.unique(b, return_index=True)

unique_a = a[idx]

>>> unique_a
array([[0, 1, 1, 1, 0, 0],
       [1, 1, 1, 0, 0, 0],
       [1, 1, 1, 1, 1, 0]])

EDIT: Added np.ascontiguousarray following @seberg's recommendation. This will slow the method down if the array is not already contiguous.

EDIT: The above can be slightly sped up, perhaps at the cost of clarity, by doing:

unique_a = np.unique(b).view(a.dtype).reshape(-1, a.shape[1])
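The void-view approach can be wrapped in a small helper and checked against `np.unique(..., axis=0)`. This is a minimal sketch: the function name `unique_rows_void` and the random test data are assumptions for illustration. Because the void items are compared as raw bytes, the row order may differ from the numerically sorted order, so the check below compares the rows as sets.

```python
import numpy as np

def unique_rows_void(a):
    """Unique rows of a 2-D array via a void view (hypothetical helper).

    Rows are compared byte by byte, so for floats values such as
    0.0 and -0.0 would be treated as different.
    """
    a = np.ascontiguousarray(a)  # the view below needs contiguous rows
    row_dtype = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    b = a.view(row_dtype)        # shape (n_rows, 1), one item per row
    return np.unique(b).view(a.dtype).reshape(-1, a.shape[1])

# Illustrative random data with a fixed seed
rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=(1000, 6))

res = unique_rows_void(a)
ref = np.unique(a, axis=0)

# Compare as sets of rows, since the two orderings are not guaranteed to match
same_rows = {tuple(r) for r in res.tolist()} == {tuple(r) for r in ref.tolist()}
print(res.shape, ref.shape, same_rows)
```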

Also, at least on my system, performance-wise it is on par with, or even better than, the lexsort method:

a = np.random.randint(2, size=(10000, 6))

%timeit np.unique(a.view(np.dtype((np.void, a.dtype.itemsize*a.shape[1])))).view(a.dtype).reshape(-1, a.shape[1])
100 loops, best of 3: 3.17 ms per loop

%timeit ind = np.lexsort(a.T); a[np.concatenate(([True],np.any(a[ind[1:]]!=a[ind[:-1]],axis=1)))]
100 loops, best of 3: 5.93 ms per loop

a = np.random.randint(2, size=(10000, 100))

%timeit np.unique(a.view(np.dtype((np.void, a.dtype.itemsize*a.shape[1])))).view(a.dtype).reshape(-1, a.shape[1])
10 loops, best of 3: 29.9 ms per loop

%timeit ind = np.lexsort(a.T); a[np.concatenate(([True],np.any(a[ind[1:]]!=a[ind[:-1]],axis=1)))]
10 loops, best of 3: 116 ms per loop