Should a HashSet be allowed to be added to itself in Java? Should a HashSet be allowed to be added to itself in Java? java java

Should a HashSet be allowed to be added to itself in Java?


Others have already pointed out why it is questionable from a mathematical point of view, by referring to Russell's paradox.

This does not answer your question on a technical level, though.

So let's dissect this:

First, once more the relevant part from the JavaDoc of the Set interface:

Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.

Interestingly, the JavaDoc of the List interface makes a similar, although somewhat weaker, and at the same time more technical statement:

While it is permissible for lists to contain themselves as elements, extreme caution is advised: the equals and hashCode methods are no longer well defined on such a list.

And finally, the crux is in the JavaDoc of the Collection interface, which is the common ancestor of both the Set and the List interface:

Some collection operations which perform recursive traversal of the collection may fail with an exception for self-referential instances where the collection directly or indirectly contains itself. This includes the clone(), equals(), hashCode() and toString() methods. Implementations may optionally handle the self-referential scenario, however most current implementations do not do so.

(Emphasis by me)

The bold part is a hint at why the approach that you proposed in your question would not be sufficient:

it seems like there should be an equality check before adding an element to avoid violating that contract, no?

This would not help you here. The key point is that you'll always run into problems when the collection will directly or indirectly contain itself. Imagine this scenario:

Set<Object> setA = new HashSet<Object>();Set<Object> setB = new HashSet<Object>();setA.add(setB);setB.add(setA);

Obviously, neither of the sets contains itself directly. But each of them contains the other - and therefore, itself indirectly. This could not be avoided by a simple referential equality check (using == in the add method).


Avoiding such an "inconsistent state" is basically impossible in practice. Of course it is possible in theory, using referential Reachability computations. In fact, the Garbage Collector basically has to do exactly that!

But it becomes impossible in practice when custom classes are involved. Imagine a class like this:

class Container {    Set<Object> set;    @Override     int hashCode() {        return set.hashCode();     }}

And messing around with this and its set:

Set<Object> set = new HashSet<Object>();Container container = new Container();container.set = set;set.add(container);

The add method of the Set basically has no way of detecting whether the object that is added there has some (indirect) reference to the set itself.

Long story short:

You cannot prevent the programmer from messing things up.


Adding the collection into itself once causes the test to pass. Adding it twice causes the StackOverflowError which you were seeking.

From a personal developer standpoint, it doesn't make any sense to enforce a check in the underlying code to prevent this. The fact that you get a StackOverflowError in your code if you attempt to do this too many times, or calculate the hashCode - which would cause an instant overflow - should be enough to ensure that no sane developer would keep this kind of code in their code base.


You need to read the full doc and quote it fully:

The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.

The actual restriction is in the first sentence. The behavior is unspecified if an element of a set is mutated.

Since adding a set to itself mutates it, and adding it again mutates it again, the result is unspecified.

Note that the restriction is that the behavior is unspecified, and that a special case of that restriction is adding the set to itself.

So the doc says, in other words, that adding a set to itself results in unspecified behavior, which is what you are seeing. It's up to the concrete implementation to deal with (or not).