# numpy: most efficient frequency counts for unique values in an array

As of NumPy 1.9, the easiest and fastest method is to use `numpy.unique`, which has a `return_counts` keyword argument:

```python
import numpy as np

x = np.array([1,1,1,2,2,2,5,25,1,1])
unique, counts = np.unique(x, return_counts=True)

# Stack values and counts into a two-column array: [value, count]
print(np.asarray((unique, counts)).T)
```

Which gives:

```
[[ 1  5]
 [ 2  3]
 [ 5  1]
 [25  1]]
```
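If a mapping is more convenient than a two-column array, the two returned arrays can be zipped into a dictionary. This is a minimal sketch; the names `freq` and `most_common` are illustrative only:

```python
import numpy as np

x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])
unique, counts = np.unique(x, return_counts=True)

# Convert NumPy scalars to plain Python ints so the dict prints cleanly.
freq = {int(u): int(c) for u, c in zip(unique, counts)}

# Most frequent value: the entry of `unique` at the index of the largest count.
most_common = int(unique[np.argmax(counts)])

print(freq)
print(most_common)
```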

A quick comparison with `scipy.stats.itemfreq`:

```
In [4]: x = np.random.random_integers(0,100,1e6)

In [5]: %timeit unique, counts = np.unique(x, return_counts=True)
10 loops, best of 3: 31.5 ms per loop

In [6]: %timeit scipy.stats.itemfreq(x)
10 loops, best of 3: 170 ms per loop
```
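The session above uses `np.random.random_integers`, which later NumPy releases deprecated. Below is a rough sketch of a comparable setup outside IPython, assuming a NumPy version that provides `np.random.default_rng` (1.17 or later). The seed and repeat count are arbitrary choices, and timings will differ from machine to machine:

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability

# integers() excludes the upper bound by default, so 101 keeps 100 in range.
x = rng.integers(0, 101, size=10**6)

repeats = 10
elapsed = timeit.timeit(lambda: np.unique(x, return_counts=True), number=repeats)
print(f"np.unique: {elapsed / repeats * 1e3:.1f} ms per call")
```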

Take a look at `np.bincount`:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html

```python
import numpy as np

x = np.array([1,1,1,2,2,2,5,25,1,1])
y = np.bincount(x)
ii = np.nonzero(y)[0]
```

And then:

```python
# In Python 3, zip returns an iterator, so wrap it in list() to see the pairs
list(zip(ii,y[ii]))
# [(1, 5), (2, 3), (5, 1), (25, 1)]
```

or:

```python
np.vstack((ii,y[ii])).T
# array([[ 1,  5],
#        [ 2,  3],
#        [ 5,  1],
#        [25,  1]])
```

or however you want to combine the counts and the unique values.
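Keep in mind that `np.bincount` only accepts non-negative integers, and its result has length `x.max() + 1`, so it fits best when the values are small and dense. For arrays with negative integers, one possible workaround is to shift by the minimum first. The offset trick below is an illustration added here, not part of the answer above:

```python
import numpy as np

x = np.array([-2, -2, 0, 3, 3, 3])

# Shift so the smallest value maps to 0, which bincount can handle.
offset = x.min()
y = np.bincount(x - offset)

# Keep only the bins that were actually hit, then undo the shift.
ii = np.nonzero(y)[0]
values = ii + offset
counts = y[ii]

print(np.vstack((values, counts)).T)
```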

Update: The method mentioned in the original answer, `scipy.stats.itemfreq`, is deprecated. Use `np.unique` with `return_counts=True` instead:

```python
>>> import numpy as np
>>> x = [1,1,1,2,2,2,5,25,1,1]
>>> np.array(np.unique(x, return_counts=True)).T
array([[ 1,  5],
       [ 2,  3],
       [ 5,  1],
       [25,  1]])
```

Original answer, using the now-deprecated `scipy.stats.itemfreq`:

```python
>>> from scipy.stats import itemfreq
>>> x = [1,1,1,2,2,2,5,25,1,1]
>>> itemfreq(x)
/usr/local/bin/python:1: DeprecationWarning: `itemfreq` is deprecated! `itemfreq` is deprecated and will be removed in a future version. Use instead `np.unique(..., return_counts=True)`
array([[  1.,   5.],
       [  2.,   3.],
       [  5.,   1.],
       [ 25.,   1.]])
```
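One difference visible in the two outputs above: the `itemfreq` result is printed as floats, while the `np.unique` result stays integer. Here is a small check, as a sketch (the name `table` is illustrative, and the exact integer width depends on the platform):

```python
import numpy as np

x = [1, 1, 1, 2, 2, 2, 5, 25, 1, 1]

# Stack the unique values and their counts, then transpose to [value, count] rows.
table = np.array(np.unique(x, return_counts=True)).T

# Integer input gives an integer table; no conversion to float takes place.
print(table.dtype)
print(np.issubdtype(table.dtype, np.integer))
```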