
Very fast hash function for hashing 8-16 byte strings


My first thought was: why doesn't he just use a simple md5 function?

Trying to write a hash myself

One of the most frequently referenced simple hash functions is Bernstein's function, also referred to as Times 33 with Addition. It is used in PHP by Zend to hash the keys of associative arrays. In PHP it could be implemented as follows:

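```php
function djb2($s) {
    $word = str_split($s);
    $length = count($word);
    $hashAddress = 5381;
    for ($counter = 0; $counter < $length; $counter++) {
        // hash * 33 + byte value; ord() converts the character to its byte value,
        // and the mask keeps the result within 32 bits to avoid integer overflow
        $hashAddress = ((($hashAddress << 5) + $hashAddress) + ord($word[$counter])) & 0xFFFFFFFF;
    }
    return $hashAddress;
}

echo djb2("stackoverflow");
```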

The problem is that, implemented this way, it is rather slow. Tests show that it is about 3 times slower than md5. So we have to find the fastest built-in implementation of a hash function.
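A rough way to reproduce that comparison is to time both functions over the same number of iterations. The sketch below is illustrative only: `djb2_masked()` and `time_calls()` are hypothetical helper names, the 32-bit mask is an assumption added to keep the integer arithmetic bounded on 64-bit PHP, and the measured ratio will vary with PHP version and hardware.

```php
<?php
// Hypothetical userland djb2 variant (hash * 33 + byte), masked to 32 bits.
function djb2_masked(string $s): int {
    $hash = 5381;
    $length = strlen($s);
    for ($i = 0; $i < $length; $i++) {
        $hash = ((($hash << 5) + $hash) + ord($s[$i])) & 0xFFFFFFFF;
    }
    return $hash;
}

// Hypothetical helper: seconds spent calling $fn($str) $iterations times.
function time_calls(callable $fn, string $str, int $iterations): float {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($str);
    }
    return microtime(true) - $start;
}

$iterations = 100000;
$userland = time_calls('djb2_masked', 'stackoverflow', $iterations);
$builtin  = time_calls('md5', 'stackoverflow', $iterations);

printf("djb2 (userland): %.3f sec\n", $userland);
printf("md5 (built-in):  %.3f sec\n", $builtin);
```

The userland loop pays interpreter overhead on every byte, whereas `md5()` runs entirely in compiled C, which is why a much simpler algorithm can still lose.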

Finding the best internal hash

Just take every available algorithm and measure the time it needs to hash a string one million times.

```php
function testing($algo, $str) {
    $start = microtime(true);
    for ($ax = 0; $ax < 1000000; $ax++) {
        hash($algo, $str);
    }
    $end = microtime(true);
    return ($end - $start);
}

$algos = hash_algos();
$times = [];
foreach ($algos as $algo) {
    $times[$algo] = testing($algo, "stackoverflow");
}

// sort by time ASC
asort($times);
foreach ($times as $algo => $time) {
    echo "$algo -> " . round($time, 2) . "sec\n";
}
```

My results were:

```
fnv1a32 -> 0.29sec
fnv132 -> 0.3sec
crc32b -> 0.3sec
adler32 -> 0.3sec
crc32 -> 0.31sec
joaat -> 0.31sec
fnv1a64 -> 0.31sec
fnv164 -> 0.31sec
md4 -> 0.46sec
md5 -> 0.54sec
...
md2 -> 6.32sec
```

The results change slightly from run to run: the first 8 algorithms swap places because their speeds are so close and depend on the server load.
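Because single runs are noisy, one option is to repeat each measurement and keep the fastest run, which is less sensitive to momentary server load. The following is a minimal sketch under that assumption; `bestOf()` is a hypothetical helper, the algorithm list is just a sample, and `hrtime()` requires PHP 7.3 or later.

```php
<?php
// Hypothetical helper: run the benchmark $runs times and keep the fastest
// run, which is less affected by transient load than a single measurement.
function bestOf(string $algo, string $str, int $runs = 5, int $iterations = 100000): float {
    $best = INF;
    for ($r = 0; $r < $runs; $r++) {
        $start = hrtime(true);                     // monotonic clock, nanoseconds
        for ($i = 0; $i < $iterations; $i++) {
            hash($algo, $str);
        }
        $elapsed = (hrtime(true) - $start) / 1e9;  // convert to seconds
        $best = min($best, $elapsed);
    }
    return $best;
}

foreach (['fnv1a32', 'crc32b', 'adler32', 'md5'] as $algo) {
    printf("%s -> %.4f sec\n", $algo, bestOf($algo, 'stackoverflow'));
}
```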

What should be chosen?

You can take any of the top 8 functions above, for example: $hash = hash('crc32', $string);. In fact, the widely used md5 function is only about 1.7 to 1.9 times slower than the leaders.
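When picking among these, it may also matter how long the resulting digest is. As an illustration, the snippet below prints the hex digest length for a few of the algorithms listed above; the sample string is arbitrary.

```php
<?php
$string = 'stackoverflow';

// hash() returns a lowercase hex string by default;
// two hex characters encode one byte of digest.
foreach (['fnv1a32', 'crc32b', 'adler32', 'joaat', 'fnv1a64', 'md5'] as $algo) {
    $hash = hash($algo, $string);
    printf("%-8s %2d hex chars  %s\n", $algo, strlen($hash), $hash);
}
```

The 32-bit algorithms yield 8 hex characters, fnv1a64 yields 16, and md5 yields 32. A shorter digest means collisions become likely with far fewer distinct inputs, so speed is not the only criterion.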

Bonus

There are other functions, such as SuperFastHash, that are not built into PHP but are reportedly about 4x faster than crc32.


Use xxHash; it is also used by PrestoDB. A PHP implementation is available on GitHub.
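On newer PHP versions a separate library may not be needed: as far as I know, PHP 8.1 added xxHash variants (`xxh32`, `xxh64`, `xxh3`, `xxh128`) to the built-in `hash()` function. Here is a minimal sketch that uses one when available and otherwise falls back to `crc32b`; `fastHash()` is a hypothetical helper name and the fallback choice is an assumption.

```php
<?php
// Hypothetical helper: prefer built-in xxHash when the running PHP provides it.
function fastHash(string $data): string {
    static $algo = null;
    if ($algo === null) {
        // hash_algos() lists every algorithm this PHP build supports.
        $algo = in_array('xxh3', hash_algos(), true) ? 'xxh3' : 'crc32b';
    }
    return hash($algo, $data);
}

echo fastHash('stackoverflow'), "\n";
```

Note that the digest length differs between the two branches (16 hex characters for `xxh3`, 8 for `crc32b`), so hashes should not be mixed across servers running different PHP versions.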


The processing time of a hashing function can be considered negligible in most cases. If you need a short hash (8 hexadecimal characters), you can simply use the crc32 algorithm.

```php
<?php
$hash = hash('crc32', 'WhatDoYouWant');
?>
```
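The algorithm names can be confusing here. To my knowledge, the standalone `crc32()` function corresponds to the `crc32b` algorithm of `hash()`, while `hash('crc32', ...)` uses a different CRC variant, so the two generally produce different values. A small sketch to check this on your own installation (the input string is arbitrary):

```php
<?php
$input = 'WhatDoYouWant';

// crc32() returns an integer; format it as 8 zero-padded hex characters.
$fromFunction = sprintf('%08x', crc32($input));

$fromCrc32b = hash('crc32b', $input);
$fromCrc32  = hash('crc32', $input);

echo "crc32():        $fromFunction\n";
echo "hash('crc32b'): $fromCrc32b\n";
echo "hash('crc32'):  $fromCrc32\n";
```

This matters if the hash has to match a checksum computed by another tool: pick the variant that tool uses.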

You can also combine hash with uniqid to create a different hash on every call.

```php
<?php
$hash = hash('crc32', uniqid());
?>
```
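Keep in mind that `uniqid()` is derived from the current time, so its output is predictable. If unpredictability matters, a different technique (not the one above) is to skip hashing and take random bytes directly. A minimal sketch, assuming PHP 7 or later for `random_bytes()`:

```php
<?php
// 4 random bytes encode to 8 hex characters, the same length as a crc32 digest.
$token = bin2hex(random_bytes(4));

echo $token, "\n";
```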