Creating an md5 hash of a number, string, array, or hash in Ruby


I coded up the following pretty quickly and don't have time to really test it here at work, but it ought to do the job. Let me know if you find any issues with it and I'll take a look.

This should properly flatten out and sort the arrays and hashes, and you'd need some pretty strange-looking strings for there to be any collisions.

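    require 'digest/md5'

    def createsig(body)
      Digest::MD5.hexdigest(sigflat(body))
    end

    def sigflat(body)
      if body.is_a?(Hash)
        # Turn each pair into a "key=>value" string; the Array branch sorts them.
        body = body.map { |key, value| "#{sigflat key}=>#{sigflat value}" }
      end
      if body.is_a?(Array)
        # Flatten each element, sort, and concatenate. Non-destructive map/sort
        # leave the caller's array untouched (map!/sort! would modify it).
        body = body.map { |value| sigflat value }.sort.join
      end
      unless body.is_a?(String)
        # Append the class name so that e.g. 1 and "1" flatten differently.
        # (+ rather than << because nil.to_s is frozen on newer Rubies.)
        body = body.to_s + body.class.to_s
      end
      body
    end

    > sigflat({:a => {:b => 'b', :c => 'c'}, :d => 'd'}) == sigflat({:d => 'd', :a => {:c => 'c', :b => 'b'}})
    => true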
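
Because the flattened pieces are concatenated with no separator and plain strings carry no type tag, a few different inputs can still flatten to the same string; :a and "aSymbol" are one such pair. A minimal sketch of one way to close that gap, which is my own assumption rather than part of the answer above, is to length-prefix every piece and tag each value with its kind. The names frame, sigflat_framed and createsig_framed are hypothetical:

```ruby
require 'digest/md5'

# Hypothetical helper: prefix a string with its length so the boundary
# between concatenated pieces can't be confused.
def frame(str)
  "#{str.length}:#{str}"
end

# Hypothetical variant of sigflat that encodes both structure and type.
def sigflat_framed(body)
  case body
  when Hash
    # Each key/value pair becomes one framed piece; sorting the pieces keeps
    # the result independent of insertion order.
    pairs = body.map do |key, value|
      frame(frame(sigflat_framed(key)) + frame(sigflat_framed(value)))
    end
    'H' + pairs.sort.join
  when Array
    # Like the original, element order is ignored (the pieces are sorted).
    'A' + body.map { |value| frame(sigflat_framed(value)) }.sort.join
  else
    # Scalars carry their class name, framed separately from their text.
    'S' + frame(body.class.to_s) + frame(body.to_s)
  end
end

def createsig_framed(body)
  Digest::MD5.hexdigest(sigflat_framed(body))
end
```

Like the original, this ignores element order inside arrays; drop the .sort in the Array branch if [1, 2] and [2, 1] should hash differently.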


If you could get a string representation of body without a Ruby 1.8 hash coming back in a different order from one call to the next, you could reliably hash that string. Let's get our hands dirty with some monkey patches:

    require 'digest/md5'

    class Object
      def md5key
        to_s
      end
    end

    class Array
      def md5key
        map(&:md5key).join
      end
    end

    class Hash
      def md5key
        sort.map(&:md5key).join
      end
    end
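
The sort in Hash#md5key is what takes care of the ordering problem. Ruby 1.9 and later keep hashes in insertion order instead of an arbitrary one, but two equal hashes built in different orders still print differently, as this quick check shows (a and b are just illustrative names):

```ruby
# Two hashes with the same contents, built in different orders.
a = { 'x' => 1, 'y' => 2 }
b = { 'y' => 2, 'x' => 1 }

puts a == b            # true: Hash equality ignores order
puts a.to_s == b.to_s  # false on Ruby 1.9+: to_s follows insertion order
puts a.sort == b.sort  # true: the sorted [key, value] pairs line up
```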

Now any object (of the types mentioned in the question) responds to md5key by returning a reliable key to use for creating a checksum, so:

    def createsig(o)
      Digest::MD5.hexdigest(o.md5key)
    end

Example:

    body = [
      {
        'bar' => [
          345,
          "baz",
        ],
        'qux' => 7,
      },
      "foo",
      123,
    ]
    p body.md5key        # => "bar345bazqux7foo123"
    p createsig(body)    # => "3a92036374de88118faf19483fe2572e"

Note: this key does not encode the structure or the types, only the concatenation of the values. Therefore ["a", "b", "c"] will hash the same as ["abc"], and 123 the same as "123".
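
One way to make the key reflect structure, sketched here as an assumption and not part of this answer, is to use inspect for scalars and wrap arrays and hashes in brackets with separators. String#inspect quotes and escapes the text, so a string's contents can't imitate a delimiter, and 123 no longer matches "123". The method names md5key_structured and createsig_structured are hypothetical:

```ruby
require 'digest/md5'

# Hypothetical variant of the md5key patches that keeps structure.
class Object
  def md5key_structured
    # Strings come back quoted and escaped, so "1" and 1 differ.
    inspect
  end
end

class Array
  def md5key_structured
    # Brackets and commas mark where each element starts and ends.
    '[' + map(&:md5key_structured).join(',') + ']'
  end
end

class Hash
  def md5key_structured
    # Sorting the [key, value] pairs keeps the key order-independent.
    pairs = sort.map { |key, value| key.md5key_structured + '=>' + value.md5key_structured }
    '{' + pairs.join(',') + '}'
  end
end

def createsig_structured(o)
  Digest::MD5.hexdigest(o.md5key_structured)
end
```

inspect is meant for debugging rather than as a serialization format, so this is only reasonable for plain data (strings, symbols, numbers, nil, booleans); arbitrary objects inspect to text that can include memory addresses.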


Here's my solution. I walk the data structure and build up a list of pieces that get joined into a single string. To make sure the class types seen affect the hash, I inject a single unicode character that encodes basic type information along the way. (For example, we want ["1", "2", "3"].objsum != [1,2,3].objsum.)

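I did this as a refinement on Object; it's easily ported to a monkey patch. To use it, require the file and run "using ObjSum" in the file that calls objsum (a refinement is only active for the rest of the file, or the class or module body, where using appears).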

    require 'digest/md5'
    require 'set' # Set needs an explicit require on older Rubies

    module ObjSum
      refine Object do
        def objsum
          parts = []
          queue = [self]
          while queue.size > 0
            item = queue.shift
            if item.kind_of?(Hash)
              parts << "\000"
              # Sort keys so insertion order doesn't affect the sum.
              item.keys.sort.each do |k|
                queue << k
                queue << item[k]
              end
            elsif item.kind_of?(Set)
              parts << "\001"
              item.to_a.sort.each { |i| queue << i }
            elsif item.kind_of?(Enumerable)
              parts << "\002"
              item.each { |i| queue << i }
            elsif item.kind_of?(Integer) # Fixnum was removed in Ruby 3.2
              parts << "\003"
              parts << item.to_s
            elsif item.kind_of?(Float)
              parts << "\004"
              parts << item.to_s
            else
              parts << item.to_s
            end
          end
          Digest::MD5.hexdigest(parts.join)
        end
      end
    end
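
Both Hash#md5key and objsum sort hash keys with <=>, so a hash that mixes key types, such as { a: 1, 'b' => 2 }, raises ArgumentError (a Symbol can't be compared with a String). A possible workaround, assumed here rather than taken from either answer, is to sort by something that is always comparable, such as the key's class name followed by its to_s. stable_key_order is a hypothetical helper name:

```ruby
# Hypothetical helper: order keys of mixed types deterministically.
# Sorting by [class name, to_s] only ever compares Strings with Strings.
def stable_key_order(keys)
  keys.sort_by { |key| [key.class.to_s, key.to_s] }
end

mixed = { a: 1, 'b' => 2, 3 => 'c' }

begin
  mixed.keys.sort
  puts 'plain sort worked'
rescue ArgumentError => e
  puts "plain sort failed: #{e.message}"
end

p stable_key_order(mixed.keys) # => [3, "b", :a]
```

In objsum this would replace item.keys.sort with stable_key_order(item.keys); the Set branch's item.to_a.sort has the same issue. Two keys of the same class with the same to_s would still tie, which isn't a concern for the usual String, Symbol and Integer keys.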