Finding duplicates in O(n) time and O(1) space Finding duplicates in O(n) time and O(1) space c c

Finding duplicates in O(n) time and O(1) space


This is what I came up with, which doesn't require the additional sign bit:

for i := 0 to n - 1    while A[A[i]] != A[i]         swap(A[i], A[A[i]])    end whileend forfor i := 0 to n - 1    if A[i] != i then         print A[i]    end ifend for

The first loop permutes the array so that if element x is present at least once, then one of those entries will be at position A[x].

Note that it may not look O(n) at first blush, but it is - although it has a nested loop, it still runs in O(N) time. A swap only occurs if there is an i such that A[i] != i, and each swap sets at least one element such that A[i] == i, where that wasn't true before. This means that the total number of swaps (and thus the total number of executions of the while loop body) is at most N-1.

The second loop prints the values of x for which A[x] doesn't equal x - since the first loop guarantees that if x exists at least once in the array, one of those instances will be at A[x], this means that it prints those values of x which are not present in the array.

(Ideone link so you can play with it)


caf's brilliant answer prints each number that appears k times in the array k-1 times. That's useful behaviour, but the question arguably calls for each duplicate to be printed once only, and he alludes to the possibility of doing this without blowing the linear time/constant space bounds. This can be done by replacing his second loop with the following pseudocode:

for (i = 0; i < N; ++i) {    if (A[i] != i && A[A[i]] == A[i]) {        print A[i];        A[A[i]] = i;    }}

This exploits the property that after the first loop runs, if any value m appears more than once, then one of those appearances is guaranteed to be in the correct position, namely A[m]. If we are careful we can use that "home" location to store information about whether any duplicates have been printed yet or not.

In caf's version, as we went through the array, A[i] != i implied that A[i] is a duplicate. In my version, I rely on a slightly different invariant: that A[i] != i && A[A[i]] == A[i] implies that A[i] is a duplicate that we haven't seen before. (If you drop the "that we haven't seen before" part, the rest can be seen to be implied by the truth of caf's invariant, and the guarantee that all duplicates have some copy in a home location.) This property holds at the outset (after caf's 1st loop finishes) and I show below that it's maintained after each step.

As we go through the array, success on the A[i] != i part of the test implies that A[i] could be a duplicate that hasn't been seen before. If we haven't seen it before, then we expect A[i]'s home location to point to itself -- that's what's tested for by the second half of the if condition. If that's the case, we print it and alter the home location to point back to this first found duplicate, creating a 2-step "cycle".

To see that this operation doesn't alter our invariant, suppose m = A[i] for a particular position i satisfying A[i] != i && A[A[i]] == A[i]. It's obvious that the change we make (A[A[i]] = i) will work to prevent other non-home occurrences of m from being output as duplicates by causing the 2nd half of their if conditions to fail, but will it work when i arrives at the home location, m? Yes it will, because now, even though at this new i we find that the 1st half of the if condition, A[i] != i, is true, the 2nd half tests whether the location it points to is a home location and finds that it isn't. In this situation we no longer know whether m or A[m] was the duplicate value, but we know that either way, it has already been reported, because these 2-cycles are guaranteed not to appear in the result of caf's 1st loop. (Note that if m != A[m] then exactly one of m and A[m] occurs more than once, and the other does not occur at all.)


Here is the pseudocode

for i <- 0 to n-1:   if (A[abs(A[i])]) >= 0 :       (A[abs(A[i])]) = -(A[abs(A[i])])   else      print iend for

Sample code in C++