Advantages of Set in ruby Advantages of Set in ruby ruby ruby

Advantages of Set in ruby


Sure, whatever you can do with Set, there is a way to do it with Array. The advantage of using a Set is that, since it is implemented based on Hash, most operations on it are O(1) complexity, while doing it with Array can be O(n).

Examples are:

Set.new([1, 2, 3]).include?(2) # O(1) complexity[1, 2, 3].include?(2) # O(n) complexity


These two classes defines different data structures:

Arrays

  • can have duplicated elements
  • maintains orders
  • can be iterated in order
  • searching for element is slow, appending element and getting element from position is fast
  • maintaining uniqueness of elements is slow

Sets

Sets are actually taken from math concept: https://en.wikipedia.org/wiki/Set_(mathematics)

Internally inside Ruby set use hash for storage, as said in documentation:

Set uses Hash as storage, so you must note the following points:

Equality of elements is determined according to Object#eql? and Object#hash. Set assumes that the identity of each element does not change while it is stored. Modifying an element of a set will render the set to an unreliable state. When a string is to be stored, a frozen copy of the string is stored instead unless the original string is already frozen.

When you look into the code, it's internally stored as hash with user given objects as keys and boolean as values (to be precise: true when object is added).

Why one should use set? If you want to enforce uniqueness and you don't need any ordering - sets are your best choice. When you don't really care about uniqueness and ordering is important - Array is your choice.

Otherwise - you need to decide arbitrally ;)


For the obvious reasons, see the other answers here.For performance reasons, see the result of this little benchmark in MRI Ruby 1.9.3:

require 'benchmark' require 'set' array = (1..100000).to_a set = array.to_set #hash = Hash[array.map {|x| [x, nil]}] #beter voor heel grote volumes mar tragerhash = Hash[*array]Benchmark.bmbm do |x|   x.report("Set.include?")   { 10000.times { set.include?(99999) } }  x.report("Array.include?") { 10000.times { array.include?(99999) } }   x.report("Hash.include?")  { 10000.times { hash.include?(99999) } } end 

which gives

Rehearsal --------------------------------------------------Set.include?     0.000000   0.000000   0.000000 (  0.015604)Array.include?  37.940000   0.000000  37.940000 ( 38.651992)Hash.include?    0.000000   0.000000   0.000000 (  0.001000)---------------------------------------- total: 37.940000sec                     user     system      total        realSet.include?     0.000000   0.000000   0.000000 (  0.002001)Array.include?  38.157000   0.000000  38.157000 ( 38.730615)Hash.include?    0.000000   0.000000   0.000000 (  0.001001)

Enough reason to use Set or Hash when possible.