
Good Hash Function for Strings


Hash functions usually don't just sum the character values; otherwise "stop" and "pots" would have the same hash.
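To make the problem concrete, here is a minimal sketch of such an additive hash (the `sumHash` name is hypothetical). Addition ignores the order of the characters, so any two anagrams collide:

```java
class SumHashDemo {
    // Naive additive hash: sums the char values, so character order is ignored
    static int sumHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h += s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // 's' + 't' + 'o' + 'p' = 115 + 116 + 111 + 112 for both strings
        System.out.println(sumHash("stop")); // 454
        System.out.println(sumHash("pots")); // 454
    }
}
```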

Nor would you limit the hash to the first n characters, because then (for any n of 5 or less) "house" and "houses" would have the same hash.
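Likewise, a minimal sketch of a hash that reads only the first n characters shows the collision. `prefixHash` and the choice n = 5 are illustrative assumptions; `String.hashCode()`, which reads every character, tells the two strings apart:

```java
class PrefixHashDemo {
    // Hypothetical "first n characters only" hash: anything past index n-1 is ignored
    static int prefixHash(String s, int n) {
        int h = 0;
        int limit = Math.min(n, s.length());
        for (int i = 0; i < limit; i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // With n = 5, "houses" is effectively truncated to "house"
        System.out.println(prefixHash("house", 5) == prefixHash("houses", 5)); // true
        // String.hashCode() includes the trailing 's', so these differ
        System.out.println("house".hashCode() == "houses".hashCode());         // false
    }
}
```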

Hash functions generally multiply the running value by a prime number at each step (which makes distinct strings more likely to get distinct hashes), so you could do something like:

// s is the String being hashed
int hash = 7;
for (int i = 0; i < s.length(); i++) {
    hash = hash * 31 + s.charAt(i);
}
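A self-contained version of that loop, wrapped in a hypothetical `polyHash` method that takes the seed as a parameter. Because each character ends up multiplied by a different power of 31, order now matters. With a seed of 0 it computes the formula documented for `String.hashCode()`, s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], so the two can be compared directly:

```java
class PolyHash {
    // Polynomial hash: start from a seed, then multiply the running value
    // by 31 before adding each char. int overflow wraps (two's complement).
    static int polyHash(String s, int seed) {
        int hash = seed;
        for (int i = 0; i < s.length(); i++) {
            hash = hash * 31 + s.charAt(i);
        }
        return hash;
    }

    public static void main(String[] args) {
        // Order matters now, so the anagrams no longer collide
        System.out.println(polyHash("stop", 7) == polyHash("pots", 7)); // false
        // Seed 0 reproduces String.hashCode()
        System.out.println(polyHash("stop", 0) == "stop".hashCode());   // true
    }
}
```

One reason 31 works well: it is odd, so multiplying by it modulo 2^32 never discards information, and `31 * h` can be computed cheaply as `(h << 5) - h`.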


If this is for security purposes, you could use Java's built-in crypto support (MessageDigest):

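import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat; // Java 17+

// getInstance throws the checked NoSuchAlgorithmException
MessageDigest messageDigest = MessageDigest.getInstance("SHA-256");
byte[] digest = messageDigest.digest(stringToHash.getBytes(StandardCharsets.UTF_8));
// Hex-encode the raw digest bytes instead of decoding them as text
String stringHash = HexFormat.of().formatHex(digest);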
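A self-contained sketch of the same idea as a helper method (`sha256Hex` is a hypothetical name). It encodes the string with an explicit charset and hex-encodes the digest, since raw digest bytes are generally not valid text. A manual hex loop is used so it does not depend on any particular Java version's hex utilities:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Sha256Demo {
    // Hash a string with SHA-256 and return the digest as lowercase hex
    static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes don't sign-extend
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Well-known "abc" test vector from the SHA-2 standard
        System.out.println(sha256Hex("abc"));
        // prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
    }
}
```

A SHA-256 digest is much more expensive to compute than `String.hashCode()`; it suits fingerprints and integrity checks rather than choosing hash-map buckets.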


You should probably use String.hashCode().

If you really want to implement hashCode yourself:

"Do not be tempted to exclude significant parts of an object from the hash code computation to improve performance." (Joshua Bloch, Effective Java)
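A minimal sketch of following that advice in a class of your own. `FullName` is a hypothetical value class; its `hashCode()` folds in every field that `equals()` compares, combining them with the same multiply-by-31-and-add step used above:

```java
import java.util.Objects;

// Hypothetical value class, used only to illustrate the recipe
final class FullName {
    private final String first;
    private final String last;

    FullName(String first, String last) {
        this.first = Objects.requireNonNull(first);
        this.last = Objects.requireNonNull(last);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FullName)) return false;
        FullName other = (FullName) o;
        return first.equals(other.first) && last.equals(other.last);
    }

    @Override
    public int hashCode() {
        // Include every field that equals() uses, in a fixed order
        int result = first.hashCode();
        result = 31 * result + last.hashCode();
        return result;
    }
}
```

Equal objects get equal hash codes, and because the fields are combined in order, swapping first and last name changes the hash.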

Using only the first five characters is a bad idea. Think about hierarchical names such as URLs: they would all have the same hash code (because they all start with "http://"), which means they would all be stored in the same bucket of a hash map, giving terrible performance.
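A small experiment makes the bucket problem visible. The URLs and the `firstFiveHash` and `distinctHashes` helpers are illustrative assumptions. Keys with identical hash codes always end up in the same hash-map bucket, so a single distinct value means every key shares one bucket:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToIntFunction;

class UrlBucketDemo {
    // Hypothetical hash that only looks at the first five characters
    static int firstFiveHash(String s) {
        return s.substring(0, Math.min(5, s.length())).hashCode();
    }

    // Counts distinct hash values; fewer values means more crowded buckets
    static int distinctHashes(List<String> keys, ToIntFunction<String> hash) {
        Set<Integer> seen = new HashSet<>();
        for (String key : keys) {
            seen.add(hash.applyAsInt(key));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Made-up example URLs, all starting with "http:"
        List<String> urls = List.of(
                "http://example.com/a",
                "http://example.com/b",
                "http://example.org/docs",
                "http://example.net/index.html");

        // Every key shares its first five characters, so this prints 1
        System.out.println(distinctHashes(urls, UrlBucketDemo::firstFiveHash));
        // String.hashCode() takes every character into account
        System.out.println(distinctHashes(urls, String::hashCode));
    }
}
```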

Here's a war story about String.hashCode(), paraphrased from "Effective Java":

The String hash function implemented in all releases prior to 1.2 examined at most sixteen characters, evenly spaced throughout the string, starting with the first character. For large collections of hierarchical names, such as URLs, this hash function displayed terrible behavior.
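The pre-1.2 code itself isn't reproduced here. The following is a hypothetical sketch that, like the function described, samples at most sixteen evenly spaced characters starting with the first, to show why that fails on URLs: two strings that differ only at an unsampled position always collide.

```java
class SampledHashDemo {
    // Hypothetical sampling hash, NOT the real pre-1.2 JDK implementation:
    // it reads at most 16 characters, evenly spaced, starting at index 0.
    static int sampledHash(String s) {
        int n = s.length();
        // step = ceil(n / 16), so at most 16 positions are visited
        int step = Math.max(1, (n + 15) / 16);
        int h = 0;
        for (int i = 0; i < n; i += step) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // Both strings are 34 chars, so step = 3 and only indices 0, 3, ..., 33
        // are read; the strings differ only at index 28, which is skipped.
        String a = "http://example.com/docs/page1.html";
        String b = "http://example.com/docs/page2.html";
        System.out.println(sampledHash(a) == sampledHash(b)); // true
        System.out.println(a.hashCode() == b.hashCode());     // false
    }
}
```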