How do I create a HashCode in .net (c#) for a string that is safe to store in a database?

How do I create a HashCode in .net (c#) for a string that is safe to store in a database?


It depends what properties you want that hash to have. For example, you could just write something like this:

    public int HashString(string text)
    {
        // TODO: Determine nullity policy.
        unchecked
        {
            int hash = 23;
            foreach (char c in text)
            {
                hash = hash * 31 + c;
            }
            return hash;
        }
    }

So long as you document that that is how the hash is computed, that's valid. It's in no way cryptographically secure or anything like that, but you can persist it with no problems. Two strings which are absolutely equal in the ordinal sense (i.e. with no cultural equality etc applied, exactly character-by-character the same) will produce the same hash with this code.
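As a quick worked check of that claim, the arithmetic can be traced by hand for a short input. The sketch below repeats the same loop inside a hypothetical StableHash class so that it compiles on its own; the class name and the decision to throw on null are assumptions for illustration, not part of the answer.

```csharp
using System;

public static class StableHash
{
    // Same algorithm as HashString above: start at 23, then multiply by 31
    // and add each UTF-16 code unit, letting int arithmetic wrap around.
    public static int HashString(string text)
    {
        // Assumption: null is rejected; the original leaves this policy open.
        if (text == null) throw new ArgumentNullException(nameof(text));

        unchecked
        {
            int hash = 23;
            foreach (char c in text)
            {
                hash = hash * 31 + c;
            }
            return hash;
        }
    }
}
```

StableHash.HashString("ab") works out as (23 * 31 + 97) * 31 + 98 = 25208, and the empty string hashes to the seed, 23. Nothing in the computation depends on the runtime, so a value written to the database today should match one recomputed later on a different .NET version.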

The problems come when you rely on undocumented hashing - i.e. something which obeys GetHashCode() but is in no way guaranteed to remain the same from version to version... like string.GetHashCode().

Writing and documenting your own hash like this is a bit like saying, "This sensitive information is hashed with MD5 (or whatever)". So long as it's a well-defined hash, that's fine.

EDIT: Other answers have suggested using cryptographic hashes such as SHA-1 or MD5. I would say that until we know there's a requirement for cryptographic security rather than just stability, there's no point in going through the rigmarole of converting the string to a byte array and hashing that. Of course if the hash is meant to be used for anything security-related, an industry-standard hash is exactly what you should be reaching for. But that wasn't mentioned anywhere in the question.


Here is a reimplementation of the way .NET currently calculates its string hash code on 64-bit systems. It does not use pointers like the real GetHashCode() does, so it will be slightly slower, but that also makes it more resilient to internal changes to string. It gives a more evenly distributed hash code than Jon Skeet's version, which may result in better lookup times in dictionaries.

    public static class StringExtensionMethods
    {
        public static int GetStableHashCode(this string str)
        {
            unchecked
            {
                int hash1 = 5381;
                int hash2 = hash1;

                for (int i = 0; i < str.Length && str[i] != '\0'; i += 2)
                {
                    hash1 = ((hash1 << 5) + hash1) ^ str[i];
                    if (i == str.Length - 1 || str[i + 1] == '\0')
                        break;
                    hash2 = ((hash2 << 5) + hash2) ^ str[i + 1];
                }

                return hash1 + (hash2 * 1566083941);
            }
        }
    }
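One property of this loop that is easy to miss: it stops at the first '\0' character, so anything after an embedded null does not contribute to the hash. The sketch below copies the method from above so that it compiles on its own, and adds a hypothetical StableHashDemo.SameHash helper (my name, not from the answer) to make the effect visible.

```csharp
using System;

public static class StringExtensionMethods
{
    // Copied from the answer above.
    public static int GetStableHashCode(this string str)
    {
        unchecked
        {
            int hash1 = 5381;
            int hash2 = hash1;

            for (int i = 0; i < str.Length && str[i] != '\0'; i += 2)
            {
                hash1 = ((hash1 << 5) + hash1) ^ str[i];
                if (i == str.Length - 1 || str[i + 1] == '\0')
                    break;
                hash2 = ((hash2 << 5) + hash2) ^ str[i + 1];
            }

            return hash1 + (hash2 * 1566083941);
        }
    }
}

public static class StableHashDemo
{
    // Hypothetical helper: true when both strings produce the same stable hash.
    public static bool SameHash(string a, string b)
    {
        return a.GetStableHashCode() == b.GetStableHashCode();
    }
}
```

StableHashDemo.SameHash("abc", "abc\0def") is true, because both loops stop after the 'c', while StableHashDemo.SameHash("abc", "abd") is false. That is harmless for ordinary text, but worth documenting next to the stored column if your strings can contain null characters.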


The answer is to just write your own hashing function. You can find source for some by following links in the comments to the article you posted. Or you can use a built-in hash function that's originally intended for cryptography (MD5, SHA1, etc.) and just not use all of the bits.
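A minimal sketch of the "don't use all of the bits" option follows. It hashes the string with SHA-256 (swapped in here for the MD5/SHA-1 mentioned above; either would work the same way) and keeps only the first four bytes of the digest. The TruncatedHash class, the HashToInt32 name, the UTF-8 encoding and the big-endian byte order are all assumptions for illustration; whichever choices you make become part of the hash's definition and need to be documented with it.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class TruncatedHash
{
    // Hypothetical helper: a 32-bit value taken from the start of a SHA-256 digest.
    public static int HashToInt32(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        // Assumption: strings are encoded as UTF-8 before hashing.
        byte[] bytes = Encoding.UTF8.GetBytes(text);

        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(bytes);

            // Combine the first four bytes explicitly (big-endian) so the result
            // does not depend on the byte order of the machine running the code.
            return (digest[0] << 24) | (digest[1] << 16) | (digest[2] << 8) | digest[3];
        }
    }
}
```

For "abc", whose SHA-256 digest begins ba 78 16 bf, HashToInt32 returns unchecked((int)0xBA7816BF). Truncating to 32 bits gives up the collision resistance of the full digest, so this is a stable hash, not a secure one.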