
numpy ndarray hashability


I get the same results in Python 2.6.6 and numpy 1.3.0. According to the Python glossary, an object should be hashable if __hash__ is defined (and is not None), and either __eq__ or __cmp__ is defined. ndarray.__eq__ and ndarray.__hash__ are both defined and return something meaningful, so I don't see why hash should fail. After a quick google, I found this post on the python.scientific.devel mailing list, which states that arrays have never been intended to be hashable - so why ndarray.__hash__ is defined, I have no idea. Note that isinstance(nparray, collections.Hashable) returns True.
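The mismatch noted above, where the isinstance check says "hashable" but hash() still fails, can be reproduced without numpy. The following is a minimal sketch in current Python 3, where the ABC lives at collections.abc.Hashable rather than collections.Hashable; the helper name can_hash is made up for illustration. The isinstance check only looks for a usable __hash__ attribute on the type, it does not try to hash the object.

```python
from collections.abc import Hashable


def can_hash(obj):
    """Return True only if calling hash(obj) actually succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# A tuple defines __hash__, so the isinstance check passes...
nested = ([1, 2], 3)
print(isinstance(nested, Hashable))  # True

# ...but hashing it fails because the list inside is unhashable.
print(can_hash(nested))  # False
```

So a True result from the isinstance check is only a hint; actually calling hash() is the reliable test.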

EDIT: Note that nparray.__hash__() returns the same as id(nparray), so this is just the default implementation. Maybe it was difficult or impossible to remove the implementation of __hash__ in earlier versions of python (the __hash__ = None technique was apparently introduced in 2.6), so they used some kind of C API magic to achieve this in a way that wouldn't propagate to subclasses, and wouldn't stop you from calling ndarray.__hash__ explicitly?

Things are different in Python 3.2.2 and the current numpy 2.0.0 from the repo. The __cmp__ method no longer exists, so hashability now requires __hash__ and __eq__ (see Python 3 glossary). In this version of numpy, ndarray.__hash__ is defined, but it is just None, so it cannot be called. hash(nparray) fails and isinstance(nparray, collections.Hashable) returns False as expected. hash(vector) also fails.
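The Python 3 behavior described above can be imitated with a plain class. This is only a sketch of the __hash__ = None technique, not numpy's actual implementation; the class name Unhashable is hypothetical, and collections.abc.Hashable is used because that is where the ABC lives in current Python 3.

```python
from collections.abc import Hashable


class Unhashable:
    """Hypothetical class that opts out of hashing via __hash__ = None."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        return isinstance(other, Unhashable) and self.values == other.values

    # Setting __hash__ to None marks instances as unhashable.
    __hash__ = None


obj = Unhashable([1, 2, 3])
print(isinstance(obj, Hashable))  # False

try:
    hash(obj)
except TypeError as exc:
    print("hash failed:", exc)
```

In Python 3, a class that defines __eq__ without defining __hash__ gets __hash__ = None implicitly, so the explicit assignment here is only for clarity.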


This is not a definitive answer, but here are some leads to follow to understand this behavior.

I refer here to the numpy code of the 1.6.1 release.

According to the numpy.ndarray object implementation (see numpy/core/src/multiarray/arrayobject.c), the hash method is set to NULL.

    NPY_NO_EXPORT PyTypeObject PyArray_Type = {
    #if defined(NPY_PY3K)
        PyVarObject_HEAD_INIT(NULL, 0)
    #else
        PyObject_HEAD_INIT(NULL)
        0,                                          /* ob_size */
    #endif
        "numpy.ndarray",                            /* tp_name */
        sizeof(PyArrayObject),                      /* tp_basicsize */
        &array_as_mapping,                          /* tp_as_mapping */
        (hashfunc)0,                                /* tp_hash */

This tp_hash slot seems to be overridden in numpy/core/src/multiarray/multiarraymodule.c. See DUAL_INHERIT, DUAL_INHERIT2 and the initmultiarray function, where the tp_hash attribute is modified.

Ex: PyArrayDescr_Type.tp_hash = PyArray_DescrHash

According to hashdescr.c, the hash is implemented as follows:

     * How does this work ? The hash is computed from a list which contains all the
     * information specific to a type. The hard work is to build the list
     * (_array_descr_walk). The list is built as follows:
     *      * If the dtype is builtin (no fields, no subarray), then the list
     *      contains 6 items which uniquely define one dtype (_array_descr_builtin)
     *      * If the dtype is a compound array, one walk on each field. For each
     *      field, we append title, names, offset to the final list used for
     *      hashing, and then append the list recursively built for each
     *      corresponding dtype (_array_descr_walk_fields)
     *      * If the dtype is a subarray, one adds the shape tuple to the list, and
     *      then append the list recursively built for each corresponding type
     *      (_array_descr_walk_subarray)
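The walk described in that comment can be sketched in pure Python. This is an illustration of the idea only, not a port of hashdescr.c. The Descr class, its field names, and the six placeholder values used for the builtin case are all hypothetical stand-ins, since the comment does not say which six items numpy uses.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Descr:
    """Hypothetical stand-in for a dtype descriptor (not numpy's real struct)."""
    builtin_items: Tuple = ()         # identifying items for a builtin dtype
    fields: Tuple = ()                # tuple of (title, name, offset, Descr)
    subarray: Optional[Tuple] = None  # (shape, base Descr)


def walk(descr, out):
    """Append everything that identifies descr to the list out."""
    if descr.fields:
        # Compound dtype: record each field, then recurse into its dtype.
        for title, name, offset, sub in descr.fields:
            out.extend([title, name, offset])
            walk(sub, out)
    elif descr.subarray is not None:
        # Subarray dtype: record the shape, then recurse into the base dtype.
        shape, base = descr.subarray
        out.append(shape)
        walk(base, out)
    else:
        # Builtin dtype: the identifying items go in directly.
        out.extend(descr.builtin_items)


def descr_hash(descr):
    """Hash the flattened list of identifying items."""
    items = []
    walk(descr, items)
    return hash(tuple(items))


# Placeholder values, chosen only to make the example concrete.
f8 = Descr(builtin_items=("f", "<", 0, 8, 8, 12))
rec_a = Descr(fields=((None, "x", 0, f8), (None, "y", 8, f8)))
rec_b = Descr(fields=((None, "x", 0, f8), (None, "y", 8, f8)))
print(descr_hash(rec_a) == descr_hash(rec_b))  # True
```

Two descriptors that are built separately but describe the same layout flatten to the same list, and therefore get the same hash.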