Hashset vs Treeset
HashSet is much faster than TreeSet (constant-time versus log-time for most operations like add, remove and contains) but offers no ordering guarantees like TreeSet.
HashSet
- the class offers constant time performance for the basic operations (add, remove, contains and size).
- it does not guarantee that the order of elements will remain constant over time
- iteration performance depends on the initial capacity and the load factor of the HashSet.
- It's quite safe to accept default load factor but you may want to specify an initial capacity that's about twice the size to which you expect the set to grow.
TreeSet
- guarantees log(n) time cost for the basic operations (add, remove and contains)
- guarantees that elements of set will be sorted (ascending, natural, or the one specified by you via its constructor) (implements
SortedSet
) - doesn't offer any tuning parameters for iteration performance
- offers a few handy methods to deal with the ordered set like
first()
,last()
,headSet()
, andtailSet()
etc
Important points:
- Both guarantee duplicate-free collection of elements
- It is generally faster to add elements to the HashSet and then convert the collection to a TreeSet for a duplicate-free sorted traversal.
- None of these implementations are synchronized. That is if multiple threads access a set concurrently, and at least one of the threads modifies the set, it must be synchronized externally.
- LinkedHashSet is in some sense intermediate between
HashSet
andTreeSet
. Implemented as a hash table with a linked list running through it, however,it provides insertion-ordered iteration which is not same as sorted traversal guaranteed by TreeSet.
So a choice of usage depends entirely on your needs but I feel that even if you need an ordered collection then you should still prefer HashSet to create the Set and then convert it into TreeSet.
- e.g.
SortedSet<String> s = new TreeSet<String>(hashSet);
One advantage not yet mentioned of a TreeSet
is that its has greater "locality", which is shorthand for saying (1) if two entries are nearby in the order, a TreeSet
places them near each other in the data structure, and hence in memory; and (2) this placement takes advantage of the principle of locality, which says that similar data is often accessed by an application with similar frequency.
This is in contrast to a HashSet
, which spreads the entries all over memory, no matter what their keys are.
When the latency cost of reading from a hard drive is thousands of times the cost of reading from cache or RAM, and when the data really is accessed with locality, the TreeSet
can be a much better choice.
Basing on lovely visual answer on Maps by @shevchyk here is my take:
╔══════════════╦═════════════════════╦═══════════════════╦═════════════════════╗║ Property ║ HashSet ║ TreeSet ║ LinkedHashSet ║╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣║ ║ no guarantee order ║ sorted according ║ ║║ Order ║ will remain constant║ to the natural ║ insertion-order ║║ ║ over time ║ ordering ║ ║╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣║ Add/remove ║ O(1) ║ O(log(n)) ║ O(1) ║╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣║ ║ ║ NavigableSet ║ ║║ Interfaces ║ Set ║ Set ║ Set ║║ ║ ║ SortedSet ║ ║╠══════════════╬═════════════════════╬═══════════════════╬═════════════════════╣║ ║ ║ not allowed ║ ║║ Null values ║ allowed ║ 1st element only ║ allowed ║║ ║ ║ in Java 7 ║ ║╠══════════════╬═════════════════════╩═══════════════════╩═════════════════════╣║ ║ Fail-fast behavior of an iterator cannot be guaranteed ║║ Fail-fast ║ impossible to make any hard guarantees in the presence of ║║ behavior ║ unsynchronized concurrent modification ║╠══════════════╬═══════════════════════════════════════════════════════════════╣║ Is ║ ║║ synchronized ║ implementation is not synchronized ║╚══════════════╩═══════════════════════════════════════════════════════════════╝