Possible to add numpy arrays to python sets?


Create some data first:

import numpy as np
np.random.seed(1)
list_of_np_1D = np.random.randint(0, 5, size=(500, 6))
np_2D = np.random.randint(0, 5, size=(20, 6))

Run your code:

%%time
fill_set = set()
for i in list_of_np_1D:
    vecs = i + np_2D
    for v in vecs:
        tup = tuple(v)
        fill_set.add(tup)
res1 = np.array(list(fill_set))

output:

CPU times: user 161 ms, sys: 2 ms, total: 163 ms
Wall time: 167 ms
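The loop above converts each row with tuple() because a NumPy array is mutable and therefore not hashable, so it cannot be added to a set directly. A minimal sketch of that behaviour, using a made-up three-element row (the names `row`, `seen` and `added_directly` are only for illustration):

```python
import numpy as np

row = np.array([1, 2, 3])  # hypothetical example row
seen = set()

try:
    seen.add(row)           # ndarrays define no hash, so this raises TypeError
    added_directly = True
except TypeError:
    added_directly = False

# A tuple of plain Python ints is hashable and works as a set element.
seen.add(tuple(row.tolist()))
```

Using `row.tolist()` before `tuple()` stores Python ints rather than NumPy scalars; `tuple(row)` as in the loop above works as well.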

Here is a faster version. It uses broadcasting to compute all the sums at once, then the .view() method to reinterpret each row as a fixed-length byte string. After calling set(), it converts the strings back to an array:

%%time
r = list_of_np_1D[:, None, :] + np_2D[None, :, :]
stype = "S%d" % (r.itemsize * np_2D.shape[1])
fill_set2 = set(r.ravel().view(stype).tolist())
res2 = np.zeros(len(fill_set2), dtype=stype)
res2[:] = list(fill_set2)
res2 = res2.view(r.dtype).reshape(-1, np_2D.shape[1])

output:

CPU times: user 13 ms, sys: 1 ms, total: 14 ms
Wall time: 14.6 ms

To check the result:

np.all(res1[np.lexsort(res1.T), :] == res2[np.lexsort(res2.T), :])
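To see what the .view() trick does, here is a small sketch on toy data (the array `rows` and the other names are made up for illustration). It assumes a C-contiguous integer array, which is why `np.ascontiguousarray` is called first:

```python
import numpy as np

# Hypothetical toy data: three rows, two of them identical.
rows = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [1, 2, 3]], dtype=np.int64)

# One fixed-width byte string per row: bytes per item * number of columns.
row_bytes = rows.dtype.itemsize * rows.shape[1]
stype = "S%d" % row_bytes
as_bytes = np.ascontiguousarray(rows).view(stype).ravel()

# bytes objects are hashable, so a set removes the duplicate row.
unique_bytes = set(as_bytes.tolist())

# Store the strings at full width again and reinterpret them as integers.
packed = np.array(sorted(unique_bytes), dtype=stype)
unique_rows = packed.view(rows.dtype).reshape(-1, rows.shape[1])
```

The "S" dtype drops trailing null bytes when converted to Python bytes, but since every row has the same fixed width, writing the strings back into an array of dtype `stype` pads them again and the original values are recovered.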

You can also use lexsort() to remove duplicate rows:

%%time
r = list_of_np_1D[:, None, :] + np_2D[None, :, :]
r = r.reshape(-1, r.shape[-1])
r = r[np.lexsort(r.T)]
idx = np.where(np.all(np.diff(r, axis=0) == 0, axis=1))[0] + 1
res3 = np.delete(r, idx, axis=0)

output:

CPU times: user 13 ms, sys: 3 ms, total: 16 ms
Wall time: 16.1 ms
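The lexsort() approach works because sorting puts identical rows next to each other, so a row is a duplicate exactly when its difference from the previous row is all zeros. A small sketch on made-up data (the array `rows` is hypothetical) shows each step:

```python
import numpy as np

# Hypothetical toy data with two duplicated rows.
rows = np.array([[2, 1],
                 [0, 3],
                 [2, 1],
                 [0, 3],
                 [1, 1]])

# lexsort treats the LAST key as the primary one, so with rows.T the
# last column is the primary sort key and the first column breaks ties.
sorted_rows = rows[np.lexsort(rows.T)]

# True at position k when sorted row k+1 equals sorted row k.
is_dup = np.all(np.diff(sorted_rows, axis=0) == 0, axis=1)
dup_idx = np.where(is_dup)[0] + 1

unique_rows = np.delete(sorted_rows, dup_idx, axis=0)
print(unique_rows.tolist())  # [[1, 1], [2, 1], [0, 3]]
```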

To check the result:

np.all(res1[np.lexsort(res1.T), :] == res3)
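As a further alternative that is not part of the timings above, NumPy 1.13 and later can remove duplicate rows directly with np.unique(..., axis=0). A minimal sketch, reusing the same data setup (the name `res4` is only for illustration):

```python
import numpy as np

np.random.seed(1)
list_of_np_1D = np.random.randint(0, 5, size=(500, 6))
np_2D = np.random.randint(0, 5, size=(20, 6))

# All pairwise sums via broadcasting, flattened to one row per sum.
r = list_of_np_1D[:, None, :] + np_2D[None, :, :]
r = r.reshape(-1, r.shape[-1])

# Unique rows; requires NumPy >= 1.13 for the axis argument.
res4 = np.unique(r, axis=0)
```

Note that np.unique sorts rows starting from the first column, while lexsort(r.T) uses the last column as the primary key, so the row order of `res4` can differ from `res3`. Sort both the same way (or compare sets of tuples) before checking equality.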