numpy: most efficient frequency counts for unique values in an array


As of NumPy 1.9, the easiest and fastest method is simply to use numpy.unique, which now has a return_counts keyword argument:

import numpy as np

x = np.array([1,1,1,2,2,2,5,25,1,1])
unique, counts = np.unique(x, return_counts=True)

print(np.asarray((unique, counts)).T)

Which gives:

[[ 1  5]
 [ 2  3]
 [ 5  1]
 [25  1]]
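If you would rather look up a count by value than read it off a two-column array, the two arrays returned by np.unique can be zipped into a dict. A minimal sketch; the name `freq` is just illustrative:

```python
import numpy as np

x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])
unique, counts = np.unique(x, return_counts=True)

# .tolist() turns NumPy scalars into plain Python ints,
# so the keys and values are ordinary ints
freq = dict(zip(unique.tolist(), counts.tolist()))
print(freq)  # {1: 5, 2: 3, 5: 1, 25: 1}
```

Because np.unique returns the values sorted, the dict's insertion order is also sorted by value.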

A quick comparison with scipy.stats.itemfreq:

In [4]: x = np.random.randint(0, 101, 10**6)

In [5]: %timeit unique, counts = np.unique(x, return_counts=True)
10 loops, best of 3: 31.5 ms per loop

In [6]: %timeit scipy.stats.itemfreq(x)
10 loops, best of 3: 170 ms per loop
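scipy.stats.itemfreq is deprecated and may not be available in recent SciPy releases, so a comparison you can rerun today might instead time np.unique against the np.bincount approach. A rough sketch with the standard timeit module, assuming the same data shape as above (one million integers in [0, 100]); the helper names `with_unique` and `with_bincount` are hypothetical, and no particular timings are claimed since they depend on hardware and NumPy version:

```python
import timeit

import numpy as np

# One million integers in [0, 100]; the Generator API is an assumption,
# any non-negative integer array works for both methods
rng = np.random.default_rng(0)
x = rng.integers(0, 101, size=10**6)

def with_unique(a):
    # Sorted unique values and how often each occurs
    return np.unique(a, return_counts=True)

def with_bincount(a):
    # One bin per value 0..a.max(); keep only the bins that are non-empty
    y = np.bincount(a)
    ii = np.nonzero(y)[0]
    return ii, y[ii]

# Sanity check: both approaches produce the same values and counts
u1, c1 = with_unique(x)
u2, c2 = with_bincount(x)
assert np.array_equal(u1, u2) and np.array_equal(c1, c2)

for fn in (with_unique, with_bincount):
    # Best of 3 repeats of 10 calls each, reported per call
    best = min(timeit.repeat(lambda: fn(x), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.1f} ms per call")
```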


Take a look at np.bincount, which counts how often each value occurs in an array of non-negative integers:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html

import numpy as np

x = np.array([1,1,1,2,2,2,5,25,1,1])
y = np.bincount(x)     # y[v] is the number of times v occurs in x
ii = np.nonzero(y)[0]  # the values that occur at least once

And then:

list(zip(ii, y[ii]))  # [(1, 5), (2, 3), (5, 1), (25, 1)]

or:

np.vstack((ii, y[ii])).T
# array([[ 1,  5],
#        [ 2,  3],
#        [ 5,  1],
#        [25,  1]])

or however you want to combine the counts and the unique values.
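Since np.bincount rejects negative values, one workaround for integer data that may be negative is to shift everything by the minimum before counting and shift the indices back afterwards. A minimal sketch, assuming a non-empty integer array; the function name `bincount_freq` is hypothetical:

```python
import numpy as np

def bincount_freq(x):
    """Return (values, counts) for a non-empty integer array; negatives allowed."""
    x = np.asarray(x)
    offset = x.min()
    # Shift so the smallest value lands in bin 0
    y = np.bincount(x - offset)
    ii = np.nonzero(y)[0]
    # Undo the shift to recover the original values
    return ii + offset, y[ii]

values, counts = bincount_freq([-3, -3, 0, 2, 2, 2])
print(np.vstack((values, counts)).T)
```

Here values comes out as [-3, 0, 2] and counts as [2, 1, 3]. Note that the bin array has length max - min + 1, so this only pays off when the values span a fairly narrow range.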


Update: the method in the original answer is deprecated; use the new way instead:

>>> import numpy as np
>>> x = [1,1,1,2,2,2,5,25,1,1]
>>> np.array(np.unique(x, return_counts=True)).T
array([[ 1,  5],
       [ 2,  3],
       [ 5,  1],
       [25,  1]])
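Unlike np.bincount, np.unique is not limited to non-negative integers; the same call also works on floats or strings. A small sketch with made-up string data, iterating over the two returned arrays side by side:

```python
import numpy as np

words = np.array(["b", "a", "c", "a", "b", "a"])
unique, counts = np.unique(words, return_counts=True)

# unique is sorted ('a', 'b', 'c') and counts lines up with it
for value, count in zip(unique, counts):
    print(value, count)
```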

Original answer:

You can use scipy.stats.itemfreq:

>>> from scipy.stats import itemfreq
>>> x = [1,1,1,2,2,2,5,25,1,1]
>>> itemfreq(x)
/usr/local/bin/python:1: DeprecationWarning: `itemfreq` is deprecated!
`itemfreq` is deprecated and will be removed in a future version. Use instead `np.unique(..., return_counts=True)`
array([[  1.,   5.],
       [  2.,   3.],
       [  5.,   1.],
       [ 25.,   1.]])
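Following the warning's advice, existing code that relied on itemfreq's two-column output can be ported with a small helper built on np.unique. A minimal sketch; the name `itemfreq_compat` is hypothetical, and the cast to float is only there to match the float output shown above (drop `.astype(float)` to keep integer counts):

```python
import numpy as np

def itemfreq_compat(a):
    """Two-column array of (value, count) rows, like the old itemfreq output."""
    values, counts = np.unique(a, return_counts=True)
    # column_stack puts values and counts side by side, one row per value
    return np.column_stack((values, counts)).astype(float)

x = [1, 1, 1, 2, 2, 2, 5, 25, 1, 1]
result = itemfreq_compat(x)
print(result)
```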