Advantages of Set in ruby
Sure, whatever you can do with Set
, there is a way to do it with Array
. The advantage of using a Set
is that, since it is implemented based on Hash
, most operations on it are O(1) complexity, while doing it with Array
can be O(n).
Examples are:
Set.new([1, 2, 3]).include?(2) # O(1) complexity[1, 2, 3].include?(2) # O(n) complexity
These two classes defines different data structures:
Arrays
- can have duplicated elements
- maintains orders
- can be iterated in order
- searching for element is slow, appending element and getting element from position is fast
- maintaining uniqueness of elements is slow
Sets
- can't have duplicated elements
- don't have ordering (theoretical background: https://en.wikipedia.org/wiki/Partially_ordered_set)
- searching for existence of element is fast, appending element is fast
- uniqueness is given by design
Sets are actually taken from math concept: https://en.wikipedia.org/wiki/Set_(mathematics)
Internally inside Ruby set use hash for storage, as said in documentation:
Set uses Hash as storage, so you must note the following points:
Equality of elements is determined according to Object#eql? and Object#hash. Set assumes that the identity of each element does not change while it is stored. Modifying an element of a set will render the set to an unreliable state. When a string is to be stored, a frozen copy of the string is stored instead unless the original string is already frozen.
When you look into the code, it's internally stored as hash with user given objects as keys and boolean as values (to be precise: true when object is added).
Why one should use set? If you want to enforce uniqueness and you don't need any ordering - sets are your best choice. When you don't really care about uniqueness and ordering is important - Array is your choice.
Otherwise - you need to decide arbitrally ;)
For the obvious reasons, see the other answers here.For performance reasons, see the result of this little benchmark in MRI Ruby 1.9.3:
require 'benchmark' require 'set' array = (1..100000).to_a set = array.to_set #hash = Hash[array.map {|x| [x, nil]}] #beter voor heel grote volumes mar tragerhash = Hash[*array]Benchmark.bmbm do |x| x.report("Set.include?") { 10000.times { set.include?(99999) } } x.report("Array.include?") { 10000.times { array.include?(99999) } } x.report("Hash.include?") { 10000.times { hash.include?(99999) } } end
which gives
Rehearsal --------------------------------------------------Set.include? 0.000000 0.000000 0.000000 ( 0.015604)Array.include? 37.940000 0.000000 37.940000 ( 38.651992)Hash.include? 0.000000 0.000000 0.000000 ( 0.001000)---------------------------------------- total: 37.940000sec user system total realSet.include? 0.000000 0.000000 0.000000 ( 0.002001)Array.include? 38.157000 0.000000 38.157000 ( 38.730615)Hash.include? 0.000000 0.000000 0.000000 ( 0.001001)
Enough reason to use Set
or Hash
when possible.