Possible to add numpy arrays to python sets?
Create some data first:
import numpy as np

np.random.seed(1)
list_of_np_1D = np.random.randint(0, 5, size=(500, 6))
np_2D = np.random.randint(0, 5, size=(20, 6))
Run your code:
%%time
fill_set = set()
for i in list_of_np_1D:
    vecs = i + np_2D
    for v in vecs:
        tup = tuple(v)
        fill_set.add(tup)
res1 = np.array(list(fill_set))
Output:
CPU times: user 161 ms, sys: 2 ms, total: 163 ms
Wall time: 167 ms
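The tuple() call in that loop is what makes each row usable as a set member: an ndarray is mutable and not hashable, so set.add() rejects it. A minimal sketch of the difference, using a hypothetical three-element `row` that is not part of the data above:

```python
import numpy as np

row = np.array([1, 2, 3])  # hypothetical example row
seen = set()

# ndarray is unhashable, so adding it to a set raises TypeError.
try:
    seen.add(row)
    added_directly = True
except TypeError:
    added_directly = False

# A tuple of the same values is hashable and compares by value,
# so an equal row is stored only once.
seen.add(tuple(row.tolist()))
seen.add(tuple(np.array([1, 2, 3]).tolist()))

print(added_directly, seen)
```

The per-row tuple conversion runs in Python, which is why the loop is slow for large inputs.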
Here is a faster version. It uses broadcasting to compute all the sums at once, then .view() to reinterpret each row as one fixed-length byte string, which is hashable. After calling set(), it converts the byte strings back to an array:
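%%time
# Broadcast to shape (500, 20, 6): every row of list_of_np_1D plus every row of np_2D
r = list_of_np_1D[:, None, :] + np_2D[None, :, :]
# Byte-string dtype exactly as wide as one row
stype = "S%d" % (r.itemsize * np_2D.shape[1])
fill_set2 = set(r.ravel().view(stype).tolist())
# Copy the unique byte strings into a fixed-width array, then view it as integers again
res2 = np.zeros(len(fill_set2), dtype=stype)
res2[:] = list(fill_set2)
res2 = res2.view(r.dtype).reshape(-1, np_2D.shape[1])
Output:
CPU times: user 13 ms, sys: 1 ms, total: 14 ms
Wall time: 14.6 ms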
To check the result:
np.all(res1[np.lexsort(res1.T), :] == res2[np.lexsort(res2.T), :])
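To see what the byte-string view does, here is the same idea on a small array. This is only a sketch: the array `a` and its int64 dtype are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical 3x2 input; rows 0 and 2 are duplicates.
a = np.array([[1, 2], [3, 4], [1, 2]], dtype=np.int64)

# One row occupies itemsize * ncols bytes (8 * 2 = 16 here).
stype = "S%d" % (a.itemsize * a.shape[1])

# View the flat, contiguous buffer as one byte string per row.
keys = a.ravel().view(stype)          # shape (3,)

# tolist() yields Python bytes objects, which are hashable.
unique_keys = set(keys.tolist())

# Write the keys into a fixed-width array, then reinterpret as int64.
out = np.zeros(len(unique_keys), dtype=stype)
out[:] = list(unique_keys)
out = out.view(a.dtype).reshape(-1, a.shape[1])

print(out)  # the two distinct rows, in arbitrary (set) order
```

NumPy drops trailing null bytes when it converts "S" values to Python bytes, and the fixed-width assignment pads them back, so the round trip preserves every row. The view requires a contiguous array, which the result of the broadcast addition is.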
You can also sort the rows with lexsort(), so that duplicates end up next to each other, and then delete every row that equals the row before it:
%%time
r = list_of_np_1D[:, None, :] + np_2D[None, :, :]
r = r.reshape(-1, r.shape[-1])
# Sort the rows so that duplicates become adjacent
r = r[np.lexsort(r.T)]
# Indices of rows identical to the row before them
idx = np.where(np.all(np.diff(r, axis=0) == 0, axis=1))[0] + 1
res3 = np.delete(r, idx, axis=0)
Output:
CPU times: user 13 ms, sys: 3 ms, total: 16 ms
Wall time: 16.1 ms
To check the result:
np.all(res1[np.lexsort(res1.T), :] == res3)
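Another option, assuming NumPy 1.13 or later, is np.unique with axis=0, which removes duplicate rows in a single call. It is a different technique from the set and lexsort() versions, and it is not timed here. The name `res4` is only for illustration.

```python
import numpy as np

np.random.seed(1)
list_of_np_1D = np.random.randint(0, 5, size=(500, 6))
np_2D = np.random.randint(0, 5, size=(20, 6))

# All pairwise sums, flattened to one candidate row per line.
r = (list_of_np_1D[:, None, :] + np_2D[None, :, :]).reshape(-1, np_2D.shape[1])

# De-duplicate along axis 0, i.e. treat each row as one item.
res4 = np.unique(r, axis=0)

# Reference result from a plain set of tuples, for comparison.
ref = {tuple(v) for v in r.tolist()}
print(len(res4) == len(ref))
```

np.unique returns the rows sorted with the first column as the primary key, whereas np.lexsort(r.T) uses the last column as the primary key, so re-sort both sides the same way before comparing res4 with the other results element by element.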