How to turn Numpy array to set efficiently?


First flatten your ndarray to obtain a one-dimensional array, then apply set() to it:

set(x.flatten())
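For instance, with a small made-up 2-D array:

import numpy as np

x = np.array([[1, 2, 2], [3, 1, 4]])  # illustrative example array
print(set(x.flatten()))               # {1, 2, 3, 4}

Note that x.flatten() always copies; set(x.ravel()) does the same job and avoids the copy when the array is contiguous.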

Edit: since it seems you want an array of sets rather than a set of the whole array, you can do value = [set(v) for v in x] to obtain a list of sets, one per row.
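For example (again with a made-up array):

import numpy as np

x = np.array([[1, 2, 2], [3, 3, 4]])
value = [set(v) for v in x]  # one set per row
print(value)                 # [{1, 2}, {3, 4}]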


The current state of your question (it can change any time): how can I efficiently remove duplicate elements from each row of a large 2-D array?

import numpy as np

rng = np.random.default_rng()
arr = rng.random((3000, 30000))

out1 = list(map(np.unique, arr))
# or
out2 = [np.unique(subarr) for subarr in arr]

Runtimes in an IPython shell:

>>> %timeit list(map(np.unique, arr))
5.39 s ± 37.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit [np.unique(subarr) for subarr in arr]
5.42 s ± 58.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Update: as @hpaulj pointed out in his comment, my dummy example is biased, since random floating-point numbers will almost certainly all be unique. So here's a more realistic example with integers:

>>> arr = rng.integers(low=1, high=15000, size=(3000, 30000))
>>> %timeit list(map(np.unique, arr))
4.98 s ± 83.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit [np.unique(subarr) for subarr in arr]
4.95 s ± 51.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In this case the elements of the output list have varying lengths, since there are actual duplicates to remove.


A few earlier 'row-wise' unique questions:

vectorize numpy unique for subarrays

Numpy: Row Wise Unique elements

Count unique elements row wise in an ndarray

In a couple of these the count is more interesting than the actual unique values.
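Unlike the values, the counts can be fully vectorized, because every row yields exactly one number. A minimal sketch of one way to do it (my own illustration, not taken from the linked questions): sort each row, then count the positions where the sorted value changes.

import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(low=1, high=10, size=(4, 8))  # small made-up example

srt = np.sort(arr, axis=1)                              # sort within each row
n_unique = 1 + (srt[:, 1:] != srt[:, :-1]).sum(axis=1)  # 1 + number of value changes per row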

If the number of unique values per row differs, then the result cannot be a (2d) array. That's a pretty good indication that the problem cannot be fully vectorized. You need some sort of iteration over the rows.
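That said, most of the work can still be pushed down into NumPy, leaving only a final split at the Python level. A sketch under the same made-up setup as above: sort row-wise, mask each row's first occurrence of every value, and split the flattened result into a ragged list of 1-D arrays.

import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(low=1, high=10, size=(4, 8))  # small made-up example

srt = np.sort(arr, axis=1)                   # sort within each row
keep = np.ones(srt.shape, dtype=bool)
keep[:, 1:] = srt[:, 1:] != srt[:, :-1]      # True at each first occurrence within a row
counts = keep.sum(axis=1)                    # number of uniques per row
uniques = np.split(srt[keep], np.cumsum(counts)[:-1])  # list of 1-D arrays, one per row

np.split returns a plain list here, which is exactly what the varying row lengths force.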