Determining duplicate values in an array



As of numpy version 1.9.0, np.unique has an argument return_counts which greatly simplifies your task:

u, c = np.unique(a, return_counts=True)
dup = u[c > 1]
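For a self-contained sketch, here is the same idea run end to end (the sample array is borrowed from the Counter answer below):

```python
import numpy as np

a = np.array([1, 2, 1, 3, 3, 3, 0])
u, c = np.unique(a, return_counts=True)  # sorted unique values and their counts
dup = u[c > 1]                           # values that occur more than once
print(dup.tolist())                      # [1, 3]
```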

This is similar to using Counter, except you get a pair of arrays instead of a mapping. I'd be curious to see how they perform relative to each other.

It's probably worth mentioning that even though np.unique is quite fast in practice due to its numpyness, it has worse algorithmic complexity than the Counter solution. np.unique is sort-based, so runs asymptotically in O(n log n) time. Counter is hash-based, so has O(n) complexity. This will not matter much for anything but the largest datasets.
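Out of that curiosity, a rough timeit sketch comparing the two approaches might look like this (the array size and value range are arbitrary illustration choices, and the numbers will vary by machine):

```python
import timeit
from collections import Counter

import numpy as np

# Arbitrary test data: 100k integers drawn from a range small enough
# to guarantee plenty of duplicates.
rng = np.random.default_rng(0)
a = rng.integers(0, 1000, size=100_000)

def dup_unique(arr):
    # sort-based: O(n log n)
    u, c = np.unique(arr, return_counts=True)
    return u[c > 1]

def dup_counter(arr):
    # hash-based: O(n)
    return [item for item, count in Counter(arr).items() if count > 1]

t_unique = timeit.timeit(lambda: dup_unique(a), number=10)
t_counter = timeit.timeit(lambda: dup_counter(a), number=10)
print(f"np.unique: {t_unique:.4f}s  Counter: {t_counter:.4f}s")
```

Both functions find the same set of duplicates; only the constant factors and asymptotics differ.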


I think this is clearest when done outside of numpy. You'll have to time it against your numpy solutions if you are concerned with speed.

>>> import numpy as np
>>> from collections import Counter
>>> a = np.array([1, 2, 1, 3, 3, 3, 0])
>>> [item for item, count in Counter(a).items() if count > 1]
[1, 3]

Note: this is similar to Burhan Khalid's answer, but unpacking the pairs from items() directly, rather than subscripting inside the condition, should be faster.


People have already suggested Counter variants, but here's one which doesn't use a listcomp:

>>> from collections import Counter
>>> a = [1, 2, 1, 3, 3, 3, 0]
>>> list((Counter(a) - Counter(set(a))).keys())
[1, 3]

(On Python 3, keys() returns a view, hence the list() call; on Python 2 it returned a list directly.)

[Posted not because it's efficient -- it's not -- but because I think it's cute that you can subtract Counter instances.]
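For readers puzzled by why the subtraction works: Counter subtraction keeps only elements whose resulting count is positive, so subtracting one occurrence of every distinct value leaves exactly the duplicates. A quick illustration:

```python
from collections import Counter

a = [1, 2, 1, 3, 3, 3, 0]
full = Counter(a)        # counts every occurrence: 1 -> 2, 2 -> 1, 3 -> 3, 0 -> 1
ones = Counter(set(a))   # every distinct value exactly once
diff = full - ones       # subtraction drops zero and negative counts
print(sorted(diff))      # [1, 3]
```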