Hashset vs Treeset Hashset vs Treeset java java

Hashset vs Treeset


HashSet is much faster than TreeSet (constant-time versus log-time for most operations like add, remove and contains) but offers no ordering guarantees like TreeSet.

HashSet

  • the class offers constant time performance for the basic operations (add, remove, contains and size).
  • it does not guarantee that the order of elements will remain constant over time
  • iteration performance depends on the initial capacity and the load factor of the HashSet.
    • It's quite safe to accept default load factor but you may want to specify an initial capacity that's about twice the size to which you expect the set to grow.

TreeSet

  • guarantees log(n) time cost for the basic operations (add, remove and contains)
  • guarantees that elements of set will be sorted (ascending, natural, or the one specified by you via its constructor) (implements SortedSet)
  • doesn't offer any tuning parameters for iteration performance
  • offers a few handy methods to deal with the ordered set like first(), last(), headSet(), and tailSet() etc

Important points:

  • Both guarantee duplicate-free collection of elements
  • It is generally faster to add elements to the HashSet and then convert the collection to a TreeSet for a duplicate-free sorted traversal.
  • None of these implementations are synchronized. That is if multiple threads access a set concurrently, and at least one of the threads modifies the set, it must be synchronized externally.
  • LinkedHashSet is in some sense intermediate between HashSet and TreeSet. Implemented as a hash table with a linked list running through it, however,it provides insertion-ordered iteration which is not same as sorted traversal guaranteed by TreeSet.

So a choice of usage depends entirely on your needs but I feel that even if you need an ordered collection then you should still prefer HashSet to create the Set and then convert it into TreeSet.

  • e.g. SortedSet<String> s = new TreeSet<String>(hashSet);


One advantage not yet mentioned of a TreeSet is that its has greater "locality", which is shorthand for saying (1) if two entries are nearby in the order, a TreeSet places them near each other in the data structure, and hence in memory; and (2) this placement takes advantage of the principle of locality, which says that similar data is often accessed by an application with similar frequency.

This is in contrast to a HashSet, which spreads the entries all over memory, no matter what their keys are.

When the latency cost of reading from a hard drive is thousands of times the cost of reading from cache or RAM, and when the data really is accessed with locality, the TreeSet can be a much better choice.


Basing on lovely visual answer on Maps by @shevchyk here is my take:

╔══════════════╦═════════════════════╦═══════════════════╦═════════════════════╗║   Property   ║       HashSet       ║      TreeSet      ║     LinkedHashSet   ║╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣║              ║  no guarantee order ║ sorted according  ║                     ║║   Order      ║ will remain constant║ to the natural    ║    insertion-order  ║║              ║      over time      ║    ordering       ║                     ║╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣║ Add/remove   ║        O(1)         ║     O(log(n))     ║        O(1)         ║╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣║              ║                     ║   NavigableSet    ║                     ║║  Interfaces  ║         Set         ║       Set         ║         Set         ║║              ║                     ║    SortedSet      ║                     ║╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣║              ║                     ║    not allowed    ║                     ║║  Null values ║       allowed       ║ 1st element only  ║      allowed        ║║              ║                     ║     in Java 7     ║                     ║╠══════════════╬═════════════════════╩═══════════════════╩═════════════════════╣║              ║   Fail-fast behavior of an iterator cannot be guaranteed      ║║   Fail-fast  ║ impossible to make any hard guarantees in the presence of     ║║   behavior   ║           unsynchronized concurrent modification              ║╠══════════════╬═══════════════════════════════════════════════════════════════╣║      Is      ║                                                               ║║ synchronized ║              implementation is not synchronized               ║╚══════════════╩═══════════════════════════════════════════════════════════════╝