hash function for string hash function for string c c

hash function for string


I've had nice results with djb2 by Dan Bernstein.

unsigned longhash(unsigned char *str){    unsigned long hash = 5381;    int c;    while (c = *str++)        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */    return hash;}


First, you generally do not want to use a cryptographic hash for a hash table. An algorithm that's very fast by cryptographic standards is still excruciatingly slow by hash table standards.

Second, you want to ensure that every bit of the input can/will affect the result. One easy way to do that is to rotate the current result by some number of bits, then XOR the current hash code with the current byte. Repeat until you reach the end of the string. Note that you generally do not want the rotation to be an even multiple of the byte size either.

For example, assuming the common case of 8 bit bytes, you might rotate by 5 bits:

int hash(char const *input) {     int result = 0x55555555;    while (*input) {         result ^= *input++;        result = rol(result, 5);    }}

Edit: Also note that 10000 slots is rarely a good choice for a hash table size. You usually want one of two things: you either want a prime number as the size (required to ensure correctness with some types of hash resolution) or else a power of 2 (so reducing the value to the correct range can be done with a simple bit-mask).


There are a number of existing hashtable implementations for C, from the C standard library hcreate/hdestroy/hsearch, to those in the APR and glib, which also provide prebuilt hash functions. I'd highly recommend using those rather than inventing your own hashtable or hash function; they've been optimized heavily for common use-cases.

If your dataset is static, however, your best solution is probably to use a perfect hash. gperf will generate a perfect hash for you for a given dataset.