Checking for NaN presence in a container


Question #1: why is NaN found in a container when it is the identical object?

From the documentation:

For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y).

This is precisely what I observe with NaN, so everything is fine. Why this rule? I suspect it's because a dict/set wants to honestly report that it contains a certain object if that object is actually in it (even if __eq__() for whatever reason chooses to report that the object is not equal to itself).
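To make the rule concrete, here is a minimal sketch that spells out the documented expression and compares it with the built-in in operator. The helper name contains and the sample list are only for illustration.

```python
nan = float('nan')
other_nan = float('nan')  # a second, distinct NaN object


def contains(x, y):
    # Spelled-out form of the documented rule for built-in containers.
    return any(x is e or x == e for e in y)


items = [1.0, nan]

print(nan == nan)                  # False: NaN never compares equal, even to itself
print(nan in items)                # True: found by identity
print(other_nan in items)          # False: different object, and == fails
print(contains(nan, items))        # True
print(contains(other_nan, items))  # False
```

The identity check is the only reason the first lookup succeeds; the equality check alone would never find a NaN.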

Question #2: why is the hash value for NaN the same as for 0?

From the documentation:

Called by built-in function hash() and for operations on members of hashed collections including set, frozenset, and dict. hash() should return an integer. The only required property is that objects which compare equal have the same hash value; it is advised to somehow mix together (e.g. using exclusive or) the hash values for the components of the object that also play a part in comparison of objects.

Note that the requirement is only in one direction; objects that have the same hash do not have to be equal! At first I thought this was a typo, but then I realized it is not. Hash collisions happen anyway, even with the default __hash__() (see an excellent explanation here). The containers handle collisions without any problem. They do, of course, ultimately compare the elements themselves (identity first, then the == operator), hence they can easily end up with multiple values of NaN, as long as those values are not the identical object! Try this:

>>> nan1 = float('nan')
>>> nan2 = float('nan')
>>> d = {}
>>> d[nan1] = 1
>>> d[nan2] = 2
>>> d[nan1]
1
>>> d[nan2]
2
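The point about collisions can also be seen without NaN. The sketch below uses a hypothetical Key class (not from the documentation) whose instances all return the same hash, so every key collides, yet the dict still keeps unequal keys apart.

```python
class Key:
    """Hypothetical key type whose instances all hash to the same value."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 0  # force every instance into a collision

    def __eq__(self, other):
        return isinstance(other, Key) and self.name == other.name


d = {Key('a'): 1, Key('b'): 2}

print(len(d))       # 2: colliding but unequal keys are both kept
print(d[Key('a')])  # 1: an equal key finds the right entry
print(d[Key('b')])  # 2
```

So an equal hash only gets two keys into the same bucket; whether they are treated as the same key is decided by the identity and equality checks, which is exactly where two distinct NaN objects part ways.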

So everything works as documented. But it is very dangerous! How many people know that multiple values of NaN can live alongside each other in a dict? How many people would find this easy to debug?

I would recommend making NaN an instance of a subclass of float that doesn't support hashing and hence cannot be accidentally added to a set/dict. I'll submit this to python-ideas.
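A rough sketch of what such a type could look like, purely as an illustration of the idea: the class name UnhashableNaN is made up here, and nothing like it exists in the standard library. It relies on the documented behaviour that setting __hash__ to None makes instances unhashable.

```python
import math


class UnhashableNaN(float):
    """Hypothetical NaN type that refuses to be hashed."""

    __hash__ = None  # setting __hash__ to None marks the class as unhashable

    def __new__(cls):
        return super().__new__(cls, 'nan')


nan = UnhashableNaN()

print(math.isnan(nan))  # True: still a NaN as far as float operations go
print(nan != nan)       # True: comparisons behave like any other NaN

rejected = False
try:
    {nan}               # building a set needs hash(nan)
except TypeError as exc:
    rejected = True
    print('rejected:', exc)
```

Lists and tuples would still accept such a value, since they never hash their elements; only sets and dict keys would reject it.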

Finally, I found a mistake in the documentation here:

For user-defined classes which do not define __contains__() but do define __iter__(), x in y is true if some value z with x == z is produced while iterating over y. If an exception is raised during the iteration, it is as if in raised that exception.

Lastly, the old-style iteration protocol is tried: if a class defines __getitem__(), x in y is true if and only if there is a non-negative integer index i such that x == y[i], and all lower integer indices do not raise IndexError exception. (If any other exception is raised, it is as if in raised that exception).

You may notice that there is no mention of is here, unlike with built-in containers. I was surprised by this, so I tried:

>>> nan1 = float('nan')
>>> nan2 = float('nan')
>>> class Cont:
...     def __iter__(self):
...         yield nan1
...
>>> c = Cont()
>>> nan1 in c
True
>>> nan2 in c
False

As you can see, the identity is checked first, before == - consistent with the built-in containers. I'll submit a report to fix the docs.
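The same experiment can be repeated for the old-style __getitem__ protocol from the second quoted paragraph. This is a sketch with a hypothetical Seq class, and the result shown is what I would expect on CPython, where the fallback search appears to apply the same identity shortcut.

```python
nan1 = float('nan')
nan2 = float('nan')


class Seq:
    """Hypothetical container exposing only the old-style __getitem__ protocol."""

    def __init__(self, *items):
        self._items = items

    def __getitem__(self, index):
        # Raises IndexError past the end, which terminates the search.
        return self._items[index]


s = Seq(0.0, nan1)

print(nan1 in s)  # True: matched by identity, although nan1 == nan1 is False
print(nan2 in s)  # False: a different NaN object
```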


I can't reproduce your tuple/set cases using float('nan') instead of NaN.

So I assume that it worked only because id(NaN) == id(NaN), i.e. there is no interning for NaN objects:

>>> NaN = float('NaN')
>>> id(NaN)
34373956456
>>> id(float('NaN'))
34373956480

And

>>> NaN is NaN
True
>>> NaN is float('NaN')
False

I believe tuple/set lookups have some optimization that checks whether the two objects are the same object before comparing their values.

Answering your question: it seems to be unsafe to rely on the in operator when checking for the presence of NaN. I'd recommend using None, if possible.
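If switching to None is not an option, one alternative (my suggestion, not something from the question) is to test each element with math.isnan instead of relying on in. The helper name contains_nan is hypothetical, and the isinstance guard is there only so that non-float elements do not make math.isnan raise.

```python
import math


def contains_nan(items):
    """Return True if any element of items is a float NaN."""
    return any(isinstance(x, float) and math.isnan(x) for x in items)


values = (1.0, float('nan'), 'text', None)

print(float('nan') in values)    # False: a fresh NaN is neither identical nor equal
print(contains_nan(values))      # True
print(contains_nan([1.0, 2.0]))  # False
```

This sketch only covers Python floats; other NaN-capable types such as decimal.Decimal would need their own check.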


Just a comment. __eq__ has nothing to do with the is operator, and during lookups the comparison of the objects' ids seems to happen prior to any value comparison:

>>> class A(object):
...     def __eq__(*args):
...         print('__eq__')
...
>>> A() == A()
__eq__          # as expected
>>> A() is A()
False           # `is` checks only ids
>>> A() in [A()]
__eq__          # as expected
False
>>> a = A()
>>> a in [a]
True            # surprise!