Which Java Collection should I use?


Since I couldn't find a similar flowchart I decided to make one myself.

This flowchart does not try to cover things like synchronized access, thread safety, etc., or the legacy collections, but it does cover the 3 standard Sets, 3 standard Maps and 2 standard Lists.

[Image: flowchart for choosing a Java collection]

This image was created for this answer and is licensed under a Creative Commons Attribution 4.0 International License. The simplest attribution is by linking to either this question or this answer.

Other resources

Probably the most useful other reference is the following page from the Oracle documentation, which describes each Collection.

HashSet vs TreeSet

There is a detailed discussion of when to use HashSet or TreeSet here: HashSet vs TreeSet
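
As a quick illustration of the practical difference (a minimal sketch of my own, not taken from the linked discussion): both sets drop the duplicate, but only the TreeSet promises an iteration order. The class name SetOrderDemo and the helper iterationOrder are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetOrderDemo {
    // Hypothetical helper: copies a set's elements into a list,
    // in whatever order the set itself iterates.
    static List<String> iterationOrder(Set<String> set) {
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("pear", "apple", "fig", "apple");

        Set<String> hashed = new HashSet<>(input); // no ordering guarantee
        Set<String> sorted = new TreeSet<>(input); // natural (alphabetical) order

        System.out.println(hashed.size());          // 3, the duplicate is dropped
        System.out.println(iterationOrder(hashed)); // some order, unspecified
        System.out.println(iterationOrder(sorted)); // [apple, fig, pear]
    }
}
```

If the order in which elements come back does not matter, the HashSet is usually the cheaper choice.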

ArrayList vs LinkedList

Detailed discussion: When to use LinkedList over ArrayList?


Summary of the major non-concurrent, non-synchronized collections

Collection: An interface representing an unordered "bag" of items, called "elements". The "next" element is undefined (no particular order is guaranteed).

  • Set: An interface representing a Collection with no duplicates.
    • HashSet: A Set backed by a hash table (internally a HashMap). Fastest and smallest memory usage, when ordering is unimportant.
    • LinkedHashSet: A HashSet with the addition of a linked list to associate elements in insertion order. The "next" element is the next-most-recently inserted element.
    • TreeSet: A Set where elements are ordered by a Comparator (typically natural ordering). Slowest and largest memory usage, but necessary for comparator-based ordering.
    • EnumSet: An extremely fast and efficient Set customized for a single enum type.
  • List: An interface representing a Collection whose elements are ordered and each has a numeric index representing its position, where zero is the first element, and (size - 1) is the last.
    • ArrayList: A List backed by an array, where the array has a length (called "capacity") that is at least as large as the number of elements (the list's "size"). When size would exceed capacity (when the (capacity + 1)-th element is added), a larger array is allocated and the elements are copied into it (in OpenJDK the new capacity is about 1.5 times the old one)--this copy is fast, since it uses System.arraycopy(). Deleting or inserting an element anywhere but the end requires all following elements (to the right) to be shifted into or out of that space. Accessing any element is fast, as it only requires the calculation (element-zero-address + desired-index * element-size) to find its location. In most situations, an ArrayList is preferred over a LinkedList.
    • LinkedList: A List backed by a chain of node objects, each linked to its "previous" and "next" neighbors. A LinkedList is also a Queue and Deque. Accessing elements is done starting at the first or last element, and traversing until the desired index is reached. Insertion and deletion, once the desired index is reached via traversal, are a trivial matter of re-mapping only the immediate-neighbor links to point to the new element or bypass the now-deleted element.
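  • Map: An interface representing a collection where each element has an identifying "key"--each element is a key-value pair. (Strictly speaking, Map is part of the Collections Framework but does not extend the Collection interface.)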
    • HashMap: A Map where keys are unordered, backed by a hash table.
    • LinkedHashMap: A HashMap whose keys are kept in insertion order.
    • TreeMap: A Map where keys are ordered by a Comparator (typically natural ordering).
  • Queue: An interface that represents a Collection where elements are, typically, added to one end, and removed from the other (FIFO: first-in, first-out).
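  • Stack: A legacy class (not an interface) that represents a Collection where elements are, typically, both added (pushed) and removed (popped) from the same end (LIFO: last-in, first-out). For new code, a Deque such as ArrayDeque is the recommended way to get stack behavior.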
  • Deque: Short for "double ended queue", usually pronounced "deck". An interface representing a Queue that is typically only added to and read from either end (not the middle). ArrayDeque and LinkedList both implement it.
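
To make the FIFO and LIFO descriptions above concrete, here is a small sketch that uses an ArrayDeque both ways. The class and method names (DequeDemo, drainAsQueue, drainAsStack) are invented for the illustration; only the ArrayDeque calls come from the JDK.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class DequeDemo {
    // FIFO: elements are added at the tail and removed from the head.
    static List<Integer> drainAsQueue(List<Integer> items) {
        Deque<Integer> queue = new ArrayDeque<>();
        for (Integer item : items) {
            queue.addLast(item);
        }
        List<Integer> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            out.add(queue.removeFirst());
        }
        return out;
    }

    // LIFO: push() and pop() both operate on the head.
    static List<Integer> drainAsStack(List<Integer> items) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (Integer item : items) {
            stack.push(item);
        }
        List<Integer> out = new ArrayList<>();
        while (!stack.isEmpty()) {
            out.add(stack.pop());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(1, 2, 3);
        System.out.println(drainAsQueue(items)); // [1, 2, 3]
        System.out.println(drainAsStack(items)); // [3, 2, 1]
    }
}
```

The same Deque type covers both use cases; which behavior you get depends only on which end you remove from.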

Basic collection diagrams:

[Image: basic collection diagrams]

Comparing the insertion of an element with an ArrayList and LinkedList:

[Image: inserting an element into an ArrayList vs. a LinkedList]
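
In code, the two insertions look identical; the difference is only in the work done underneath. A minimal sketch, with InsertDemo and insertAt being names made up for this example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class InsertDemo {
    // Inserts value so that it ends up at position index.
    // ArrayList: reaching the index is cheap, but add() shifts every
    //            later element one slot to the right.
    // LinkedList: reaching the index means walking the nodes, but add()
    //             only relinks the two neighbors.
    static List<String> insertAt(List<String> list, int index, String value) {
        ListIterator<String> it = list.listIterator(index);
        it.add(value);
        return list;
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("a", "b", "c", "d");

        List<String> array = new ArrayList<>(letters);
        List<String> linked = new LinkedList<>(letters);

        System.out.println(insertAt(array, 2, "X"));  // [a, b, X, c, d]
        System.out.println(insertAt(linked, 2, "X")); // [a, b, X, c, d]
    }
}
```

Both lists end up with the same contents, so the choice between them is about cost, not about results.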


An even simpler picture is here. Intentionally simplified!

  1. Collection is anything holding data called "elements" (of the same type). Nothing more specific is assumed.

  2. List is an indexed collection of data where each element has an index. Something like an array, but more flexible.

    Data in the list keep the order of insertion.

    Typical operation: get the n-th element.

  3. Set is a bag of elements, each element just once (the elements are distinguished using their equals() method).

    Data in the set are stored mostly just to know what data are there.

    Typical operation: tell if an element is present in the set.

  4. Map is something like the List, but instead of accessing the elements by their integer index, you access them by their key, which is any object. Like the array in PHP :)

    Data in Map are searchable by their key.

    Typical operation: get an element by its ID (where ID is of any type, not only int as in case of List).
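
The three "typical operations" above, written out as a small sketch. TypicalOps and its helper methods are hypothetical names used only to label each operation; the calls inside them are the standard List, Set and Map methods.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TypicalOps {
    // List: get the n-th element (index starts at zero).
    static String nth(List<String> list, int n) {
        return list.get(n);
    }

    // Set: tell whether an element is present.
    static boolean isPresent(Set<String> set, String element) {
        return set.contains(element);
    }

    // Map: get an element by its key; returns null if the key is absent.
    static String byKey(Map<Integer, String> map, Integer id) {
        return map.get(id);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("ann", "bob", "cid");
        Set<String> seen = new HashSet<>(names);
        Map<Integer, String> byId = new HashMap<>();
        byId.put(42, "ann");

        System.out.println(nth(names, 1));          // bob
        System.out.println(isPresent(seen, "bob")); // true
        System.out.println(byKey(byId, 42));        // ann
    }
}
```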

The differences

  • Set vs. Map: in Set you search data by themselves, whilst in Map by their key.

    N.B. The standard library Sets are indeed implemented exactly like this: a map where the keys are the Set elements themselves, and with a dummy value.

  • List vs. Map: in List you access elements by their int index (position in List), whilst in Map by their key, which is of any type (typically: ID).

  • List vs. Set: in List the elements are bound by their position and can be duplicated, whilst in Set the elements are just "present" (or not present) and are unique (in the meaning of equals(), or compareTo() for SortedSet).
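
To illustrate the N.B. about Sets being maps with a dummy value, here is a deliberately simplified sketch. MapBackedSet is a made-up class for this answer, not the JDK source, and it implements only a handful of operations.

```java
import java.util.HashMap;
import java.util.Map;

class MapBackedSet<E> {
    // Shared dummy value: only the keys carry information.
    private static final Object PRESENT = new Object();

    private final Map<E, Object> map = new HashMap<>();

    // Returns true if the element was not already in the set.
    boolean add(E element) {
        return map.put(element, PRESENT) == null;
    }

    boolean contains(Object element) {
        return map.containsKey(element);
    }

    // Returns true if the element was in the set and has been removed.
    boolean remove(Object element) {
        return map.remove(element) != null;
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        MapBackedSet<String> set = new MapBackedSet<>();
        System.out.println(set.add("a"));      // true
        System.out.println(set.add("a"));      // false, already present
        System.out.println(set.contains("a")); // true
        System.out.println(set.size());        // 1
    }
}
```

If you actually need a Set view over some Map implementation, the JDK already provides Collections.newSetFromMap(), so there is no need to write this by hand.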