Performance of Arrays and Hashes in Ruby
Hashes are much faster for lookups:
require 'benchmark'Document = Struct.new(:id,:a,:b,:c)documents_a = []documents_h = {}1.upto(10_000) do |n| d = Document.new(n) documents_a << d documents_h[d.id] = dendsearchlist = Array.new(1000){ rand(10_000)+1 }Benchmark.bm(10) do |x| x.report('array'){searchlist.each{|el| documents_a.any?{|d| d.id == el}} } x.report('hash'){searchlist.each{|el| documents_h.has_key?(el)} }end# user system total real#array 2.240000 0.020000 2.260000 ( 2.370452)#hash 0.000000 0.000000 0.000000 ( 0.000695)
When using unique values, you can use the Ruby Set which has been previously mentioned. Here are benchmark results. It's slightly slower than the hash though.
user system total realarray 0.460000 0.000000 0.460000 ( 0.460666)hash 0.000000 0.000000 0.000000 ( 0.000219)set 0.000000 0.000000 0.000000 ( 0.000273)
I simply added to @steenslag's code which can be found here https://gist.github.com/rsiddle/a87df54191b6b9dfe7c9.
I used ruby 2.1.1p76 for this test.
Ruby has a set class in its standard library, have you considering keeping an (additional) set of IDs only?
http://stdlib.rubyonrails.org/libdoc/set/rdoc/index.html
To quote the docs: "This is a hybrid of Array’s intuitive inter-operation facilities and Hash’s fast lookup".