What hash algorithm does Python's dictionary mapping use? What hash algorithm does Python's dictionary mapping use? python python

What hash algorithm does Python's dictionary mapping use?


The hash that it uses depends on the object being used as a key -- each class can define its own __hash__() method, and the value that it returns for a particular instance is what is used for the dictionary.

Python itself provides the hash implementation for str and tuple types. A quick look at the source should reveal the exact algorithm for those.

The hash of a tuple is based on the hashes of its contents. The algorithm is essentially this (simplified slightly):

def hash(tuple):    mult = 1000003    x = 0x345678    for index, item in enumerate(tuple):        x = ((x ^ hash(item)) * mult) & (1<<32)        mult += (82520 + (len(tuple)-index)*2)    return x + 97531

For strings, the interpreter also iterates over every character, combining them with this (again, slightly simplified) algorithm:

def hash(string):    x = string[0] << 7    for chr in string[1:]:        x = ((1000003 * x) ^ chr) & (1<<32)    return x

A bigger issue to worry about is avoiding hash collisions. Colliding hash keys will cause a linear search as the dictionary tries to find a place to store the new object (this is now being recognized as a security issue, and tha behavior may be changing in upcoming python versions)