The best way to remove duplicate values from NSMutableArray in Objective-C? The best way to remove duplicate values from NSMutableArray in Objective-C? ios ios

The best way to remove duplicate values from NSMutableArray in Objective-C?


Your NSSet approach is the best if you're not worried about the order of the objects, but then again, if you're not worried about the order, then why aren't you storing them in an NSSet to begin with?

I wrote the answer below in 2009; in 2011, Apple added NSOrderedSet to iOS 5 and Mac OS X 10.7. What had been an algorithm is now two lines of code:

NSOrderedSet *orderedSet = [NSOrderedSet orderedSetWithArray:yourArray];NSArray *arrayWithoutDuplicates = [orderedSet array];

If you are worried about the order and you're running on iOS 4 or earlier, loop over a copy of the array:

NSArray *copy = [mutableArray copy];NSInteger index = [copy count] - 1;for (id object in [copy reverseObjectEnumerator]) {    if ([mutableArray indexOfObject:object inRange:NSMakeRange(0, index)] != NSNotFound) {        [mutableArray removeObjectAtIndex:index];    }    index--;}[copy release];


I know this is an old question, but there is a more elegant way to remove duplicates in a NSArray if you don't care about the order.

If we use Object Operators from Key Value Coding we can do this:

uniquearray = [yourarray valueForKeyPath:@"@distinctUnionOfObjects.self"];

As AnthoPak also noted it is possible to remove duplicates based on a property. An example would be: @distinctUnionOfObjects.name


Yes, using NSSet is a sensible approach.

To add to Jim Puls' answer, here's an alternative approach to stripping duplicates while retaining order:

// Initialise a new, empty mutable array NSMutableArray *unique = [NSMutableArray array];for (id obj in originalArray) {    if (![unique containsObject:obj]) {        [unique addObject:obj];    }}

It's essentially the same approach as Jim's but copies unique items to a fresh mutable array rather than deleting duplicates from the original. This makes it slightly more memory efficient in the case of a large array with lots of duplicates (no need to make a copy of the entire array), and is in my opinion a little more readable.

Note that in either case, checking to see if an item is already included in the target array (using containsObject: in my example, or indexOfObject:inRange: in Jim's) doesn't scale well for large arrays. Those checks run in O(N) time, meaning that if you double the size of the original array then each check will take twice as long to run. Since you're doing the check for each object in the array, you'll also be running more of those more expensive checks. The overall algorithm (both mine and Jim's) runs in O(N2) time, which gets expensive quickly as the original array grows.

To get that down to O(N) time you could use a NSMutableSet to store a record of items already added to the new array, since NSSet lookups are O(1) rather than O(N). In other words, checking to see whether an element is a member of an NSSet takes the same time regardless of how many elements are in the set.

Code using this approach would look something like this:

NSMutableArray *unique = [NSMutableArray array];NSMutableSet *seen = [NSMutableSet set];for (id obj in originalArray) {    if (![seen containsObject:obj]) {        [unique addObject:obj];        [seen addObject:obj];    }}

This still seems a little wasteful though; we're still generating a new array when the question made clear that the original array is mutable, so we should be able to de-dupe it in place and save some memory. Something like this:

NSMutableSet *seen = [NSMutableSet set];NSUInteger i = 0;while (i < [originalArray count]) {    id obj = [originalArray objectAtIndex:i];    if ([seen containsObject:obj]) {        [originalArray removeObjectAtIndex:i];        // NB: we *don't* increment i here; since        // we've removed the object previously at        // index i, [originalArray objectAtIndex:i]        // now points to the next object in the array.    } else {        [seen addObject:obj];        i++;    }}

UPDATE: Yuri Niyazov pointed out that my last answer actually runs in O(N2) because removeObjectAtIndex: probably runs in O(N) time.

(He says "probably" because we don't know for sure how it's implemented; but one possible implementation is that after deleting the object at index X the method then loops through every element from index X+1 to the last object in the array, moving them to the previous index. If that's the case then that is indeed O(N) performance.)

So, what to do? It depends on the situation. If you've got a large array and you're only expecting a small number of duplicates then the in-place de-duplication will work just fine and save you having to build up a duplicate array. If you've got an array where you're expecting lots of duplicates then building up a separate, de-duped array is probably the best approach. The take-away here is that big-O notation only describes the characteristics of an algorithm, it won't tell you definitively which is best for any given circumstance.