Creating an md5 hash of a number, string, array, or hash in Ruby
I coded up the following pretty quickly and don't have time to really test it here at work, but it ought to do the job. Let me know if you find any issues with it and I'll take a look.
This should properly flatten out and sort the arrays and hashes, and you'd need some pretty strange-looking strings for there to be any collisions.
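    require 'digest/md5'

    def createsig(body)
      Digest::MD5.hexdigest(sigflat(body))
    end

    # Reduce body to a canonical string: a hash becomes a sorted list of
    # "key=>value" strings, and an array has its elements flattened and sorted.
    def sigflat(body)
      if body.class == Hash
        arr = []
        body.each do |key, value|
          arr << "#{sigflat key}=>#{sigflat value}"
        end
        body = arr
      end
      if body.class == Array
        # map and sort return new arrays, so the caller's data is not modified
        body = body.map { |value| sigflat value }.sort
      end
      if body.class != String
        # append the class name so that, for example, 1 and "1" differ
        body = body.to_s + body.class.to_s
      end
      body
    end

    > sigflat({:a => {:b => 'b', :c => 'c'}, :d => 'd'}) == sigflat({:d => 'd', :a => {:c => 'c', :b => 'b'}})
    => true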
If you could only get a string representation of body and not have the Ruby 1.8 hash come back in a different order from one time to the next, you could reliably hash that string representation. Let's get our hands dirty with some monkey patches:
    require 'digest/md5'

    class Object
      def md5key
        to_s
      end
    end

    class Array
      def md5key
        map(&:md5key).join
      end
    end

    class Hash
      def md5key
        sort.map(&:md5key).join
      end
    end
Now any object (of the types mentioned in the question) responds to md5key by returning a reliable key to use for creating a checksum, so:
    def createsig(o)
      Digest::MD5.hexdigest(o.md5key)
    end
Example:
    body = [
      {
        'bar' => [
          345,
          "baz",
        ],
        'qux' => 7,
      },
      "foo",
      123,
    ]

    p body.md5key       # => "bar345bazqux7foo123"
    p createsig(body)   # => "3a92036374de88118faf19483fe2572e"
Note: This hash representation does not encode the structure, only the concatenation of the values. Therefore ["a", "b", "c"] will hash the same as ["abc"].
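The collision mentioned in the note is easy to reproduce. The sketch below repeats the md5key patches so that it runs on its own, then shows one possible way around the problem. The structured_key helper is a hypothetical name and not part of the answer above: it assumes that wrapping containers in brackets and calling inspect on the leaves is enough to keep the nesting visible in the key.

```ruby
require 'digest/md5'

# The md5key patches, repeated so this example is self-contained.
class Object
  def md5key
    to_s
  end
end

class Array
  def md5key
    map(&:md5key).join
  end
end

class Hash
  def md5key
    sort.map(&:md5key).join
  end
end

def createsig(o)
  Digest::MD5.hexdigest(o.md5key)
end

# Different structures, same concatenated key, same checksum.
p(["a", "b", "c"].md5key)                            # => "abc"
p(["abc"].md5key)                                    # => "abc"
p(createsig(["a", "b", "c"]) == createsig(["abc"]))  # => true

# Hypothetical alternative (an assumption, not from the answer):
# keep brackets and separators, and use inspect for leaf values so
# that strings stay quoted and element boundaries remain visible.
def structured_key(o)
  case o
  when Hash
    pairs = o.sort.map { |k, v| structured_key(k) + "=>" + structured_key(v) }
    "{" + pairs.join(",") + "}"
  when Array
    "[" + o.map { |v| structured_key(v) }.join(",") + "]"
  else
    o.inspect
  end
end

p(structured_key(["a", "b", "c"]) == structured_key(["abc"]))  # => false
```

Like Hash#md5key, this relies on Hash#sort, so it only works when the keys of a given hash can be compared with each other.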
Here's my solution. I walk the data structure and build up a list of pieces that get joined into a single string. To ensure that the class types seen affect the hash, I inject a single Unicode character that encodes basic type information along the way. (For example, we want ["1", "2", "3"].objsum != [1,2,3].objsum.)
I did this as a refinement on Object; it's easily ported to a monkey patch. To use it, just require the file and run "using ObjSum".
    require 'digest/md5'
    require 'set'

    module ObjSum
      refine Object do
        def objsum
          parts = []
          queue = [self]
          while queue.size > 0
            item = queue.shift
            if item.kind_of?(Hash)
              parts << "\000"
              # visit keys in sorted order so insertion order does not matter
              item.keys.sort.each do |k|
                queue << k
                queue << item[k]
              end
            elsif item.kind_of?(Set)
              parts << "\001"
              item.to_a.sort.each { |i| queue << i }
            elsif item.kind_of?(Enumerable)
              parts << "\002"
              item.each { |i| queue << i }
            elsif item.kind_of?(Integer)
              # tag integers so that 1 and "1" produce different pieces
              parts << "\003"
              parts << item.to_s
            elsif item.kind_of?(Float)
              parts << "\004"
              parts << item.to_s
            else
              parts << item.to_s
            end
          end
          Digest::MD5.hexdigest(parts.join)
        end
      end
    end
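Here is a usage sketch for the refinement. It repeats the module in condensed form so that it runs on its own, written with Integer rather than Fixnum since recent Ruby versions no longer define Fixnum. The objsum_of helper is a hypothetical name added only for illustration. It is defined after "using ObjSum", so the refined method is visible inside it.

```ruby
require 'digest/md5'
require 'set'

module ObjSum
  refine Object do
    def objsum
      parts = []
      queue = [self]
      until queue.empty?
        item = queue.shift
        case item
        when Hash
          parts << "\000"
          item.keys.sort.each { |k| queue << k << item[k] }
        when Set
          parts << "\001"
          item.to_a.sort.each { |i| queue << i }
        when Enumerable
          parts << "\002"
          item.each { |i| queue << i }
        when Integer
          parts << "\003" << item.to_s
        when Float
          parts << "\004" << item.to_s
        else
          parts << item.to_s
        end
      end
      Digest::MD5.hexdigest(parts.join)
    end
  end
end

# Activate the refinement for the rest of this file.
using ObjSum

# Hypothetical helper (not in the original answer): wraps the call so
# it can be reused from code where the refinement is not active.
def objsum_of(value)
  value.objsum
end

# The type tag keeps strings and integers apart.
p(objsum_of(["1", "2", "3"]) == objsum_of([1, 2, 3]))      # => false

# Hash keys are visited in sorted order, so insertion order is ignored.
p(objsum_of({ b: 1, a: 2 }) == objsum_of({ a: 2, b: 1 }))  # => true
```

Outside a file that calls "using ObjSum", objects do not respond to objsum at all, which is the main difference from the monkey-patch approach.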