Constructing a Python set from a Numpy matrix


If you want a set of the elements, here is another, probably faster way:

y = set(x.flatten())

PS: After timing x.flat, x.flatten(), and x.ravel() on a 10x100 array, I found that they all perform at about the same speed. For a 3x3 array, the fastest is the iterator version:

y = set(x.flat)

which I would recommend because it is the least memory-hungry version (x.flat iterates over the elements without first building a flattened copy of the array, so it scales well with the size of the array).
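A small sketch to check the points above, assuming an illustrative 10x100 integer array `x` (not the array from the original benchmark). All three spellings give the same set. `flatten()` always returns a copy, `ravel()` returns a view when it can, and `x.flat` is a `numpy.flatiter` that yields elements one at a time. The `timeit` loop is only a harness for reproducing the timings on your own machine; no numbers are claimed here.

```python
import timeit

import numpy as np

# Illustrative 10x100 integer array with many repeated values (37 distinct).
x = np.arange(1000).reshape(10, 100) % 37

via_flat = set(x.flat)          # iterates over x via numpy.flatiter
via_flatten = set(x.flatten())  # builds a new 1-D copy first
via_ravel = set(x.ravel())      # 1-D view for a contiguous x, copy otherwise

# All three produce the same set of elements.
assert via_flat == via_flatten == via_ravel

# flatten() never shares memory with x; ravel() does for a C-contiguous x.
print(np.shares_memory(x, x.flatten()))  # False
print(np.shares_memory(x, x.ravel()))    # True

# Harness to time the three variants yourself.
for stmt in ("set(x.flat)", "set(x.flatten())", "set(x.ravel())"):
    t = timeit.timeit(stmt, globals={"x": x}, number=1000)
    print(f"{stmt:18} {t:.4f} s")
```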

PPS: There is also a NumPy function that does something similar:

y = numpy.unique(x)

This produces a NumPy array containing the same elements as set(x.flat), sorted and flattened into one dimension. It is very fast (almost 10 times faster), but if you actually need a set, set(numpy.unique(x)) is a bit slower than the other approaches, because building the set carries a large overhead.
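A quick check of what `numpy.unique` returns, assuming a small example array `x`: the result is a sorted 1-D array (the input is flattened when no `axis` is given), holding the same values as `set(x.flat)`. Calling `.tolist()` first converts the NumPy scalars to plain Python ints, which can be handy if the set is passed to code that expects built-in types.

```python
import numpy as np

x = np.array([[3, 2, 3], [4, 4, 4]])

u = np.unique(x)
print(u)  # [2 3 4]  (sorted, flattened)

# Same values as building the set directly from the elements.
assert set(u) == set(x.flat)

# If a Python set of plain ints is needed, convert via tolist().
s = set(u.tolist())
print(s)  # {2, 3, 4}
```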


The immutable counterpart to an array is the tuple, so try converting the array of arrays into an array of tuples:

>>> from numpy import *
>>> x = array([[3,2,3],[4,4,4]])
>>> x_hashable = map(tuple, x)
>>> y = set(x_hashable)
>>> y
set([(3, 2, 3), (4, 4, 4)])
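In Python 3 the same idea works unchanged, since `set()` accepts the iterator returned by `map()`. Converting with `x.tolist()` first gives tuples of plain Python ints rather than NumPy scalars. If you only need the distinct rows and are happy to stay in NumPy, `numpy.unique` with `axis=0` (available since NumPy 1.13) is an alternative that returns the unique rows sorted. The example array below is illustrative.

```python
import numpy as np

x = np.array([[3, 2, 3], [4, 4, 4], [3, 2, 3]])

# Each row becomes a tuple, which is hashable, so the rows can go into a set.
rows = set(map(tuple, x.tolist()))
print(sorted(rows))  # [(3, 2, 3), (4, 4, 4)]

# NumPy-only alternative: the distinct rows as a 2-D array.
unique_rows = np.unique(x, axis=0)
print(unique_rows)
```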


The above answers work if you want to create a set out of the elements contained in an ndarray, but if you want to create a set of ndarray objects – or use ndarray objects as keys in a dictionary – then you'll have to provide a hashable wrapper for them. See the code below for a simple example:

from hashlib import sha1

from numpy import array, array_equal


class hashable(object):
    r'''Hashable wrapper for ndarray objects.

        Instances of ndarray are not hashable, meaning they cannot be added to
        sets, nor used as keys in dictionaries. This is by design: ndarray
        objects are mutable, and therefore cannot reliably implement the
        __hash__() method.

        The hashable class allows a way around this limitation. It implements
        the required methods for hashable objects in terms of an encapsulated
        ndarray object. This can be either a copied instance (which is safer)
        or the original object (which requires the user to be careful enough
        not to modify it).
    '''

    def __init__(self, wrapped, tight=False):
        r'''Creates a new hashable object encapsulating an ndarray.

            wrapped
                The wrapped ndarray.

            tight
                Optional. If True, a copy of the input ndarray is created.
                Defaults to False.
        '''
        self.__tight = tight
        self.__wrapped = array(wrapped) if tight else wrapped
        # Hash the raw bytes of the stored array; tobytes() also works for
        # non-contiguous arrays, which sha1() cannot read directly.
        self.__hash = int(sha1(self.__wrapped.tobytes()).hexdigest(), 16)

    def __eq__(self, other):
        if not isinstance(other, hashable):
            return NotImplemented
        # array_equal returns False on a shape mismatch instead of failing.
        return array_equal(self.__wrapped, other.__wrapped)

    def __hash__(self):
        return self.__hash

    def unwrap(self):
        r'''Returns the encapsulated ndarray.

            If the wrapper is "tight", a copy of the encapsulated ndarray is
            returned. Otherwise, the encapsulated ndarray itself is returned.
        '''
        if self.__tight:
            return array(self.__wrapped)
        return self.__wrapped

Using the wrapper class is simple enough:

>>> from numpy import arange
>>> a = arange(0, 1024)
>>> d = {}
>>> d[a] = 'foo'
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'
>>> b = hashable(a)
>>> d[b] = 'bar'
>>> d[b]
'bar'
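If a wrapper class feels heavy, a lighter alternative (a different technique from the `hashable` class, not part of the original answer) is to key each array by an immutable summary of it: its shape, dtype, and raw bytes. The helper name `array_key` is hypothetical. Including shape and dtype keeps, for example, a 1-D array and a 2x2 array with identical bytes from colliding. The trade-offs are that each key holds a full copy of the data, and arrays that compare equal but have different dtypes (say `int64` and `float64`) get different keys.

```python
import numpy as np


def array_key(a):
    """Hypothetical helper: build a hashable key from an ndarray.

    The key is a tuple of the shape, the dtype string and the raw bytes in
    C order, so equal-valued arrays of the same shape and dtype map to the
    same key.
    """
    a = np.asarray(a)
    return (a.shape, a.dtype.str, a.tobytes())


arrays = [np.arange(4), np.arange(4), np.arange(4).reshape(2, 2)]

# A dict from key to array deduplicates while keeping one array per key.
unique = {}
for arr in arrays:
    unique.setdefault(array_key(arr), arr)

print(len(unique))  # 2: the two 1-D arrays collapse, the 2x2 one stays apart

# The keys can also be used directly as dictionary keys.
d = {array_key(np.arange(1024)): "bar"}
print(d[array_key(np.arange(1024))])  # bar
```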