numpy: most efficient frequency counts for unique values in an array
As of NumPy 1.9, the easiest and fastest method is to simply use numpy.unique, which now has a return_counts keyword argument:
    import numpy as np

    x = np.array([1,1,1,2,2,2,5,25,1,1])
    unique, counts = np.unique(x, return_counts=True)

    # Stack values and counts as two columns: one row per unique value
    print(np.asarray((unique, counts)).T)
Which gives:
    [[ 1  5]
     [ 2  3]
     [ 5  1]
     [25  1]]
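If a lookup table is more convenient than a two-column array, the same two return values can be zipped into a plain dict. This is only a minimal sketch of one way to consume the result; the names `freq` and `most_common` are illustrative and not part of the answer above.

```python
import numpy as np

x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])
values, counts = np.unique(x, return_counts=True)

# .tolist() converts NumPy scalars to plain Python ints for clean dict keys
freq = dict(zip(values.tolist(), counts.tolist()))
print(freq)  # {1: 5, 2: 3, 5: 1, 25: 1}

# The most frequent value is the one at the position of the largest count
most_common = values[np.argmax(counts)]
print(most_common)  # 1
```

Because np.unique returns the values sorted, the dict keys come out in ascending order.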
A quick comparison with scipy.stats.itemfreq:
    In [4]: x = np.random.random_integers(0,100,1e6)

    In [5]: %timeit unique, counts = np.unique(x, return_counts=True)
    10 loops, best of 3: 31.5 ms per loop

    In [6]: %timeit scipy.stats.itemfreq(x)
    10 loops, best of 3: 170 ms per loop
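To repeat this comparison outside IPython, something like the following should work. It is a rough sketch under a few assumptions: np.random.default_rng stands in for random_integers (which has since been deprecated), the itemfreq import is guarded because it may not exist in your SciPy install, and the helper name `best_ms` is made up for this example. Absolute timings will differ from the numbers quoted above.

```python
import timeit
import numpy as np

# itemfreq is deprecated and may be missing entirely, so guard the import
try:
    from scipy.stats import itemfreq
except ImportError:
    itemfreq = None

rng = np.random.default_rng(0)
# The upper bound of integers() is exclusive, so 101 gives values 0..100
x = rng.integers(0, 101, size=1_000_000)

def best_ms(fn, number=3, repeat=3):
    """Best average time per call, in milliseconds."""
    return min(timeit.repeat(fn, number=number, repeat=repeat)) / number * 1e3

t_unique = best_ms(lambda: np.unique(x, return_counts=True))
print(f"np.unique: {t_unique:.1f} ms per call")

if itemfreq is not None:
    t_itemfreq = best_ms(lambda: itemfreq(x))
    print(f"itemfreq:  {t_itemfreq:.1f} ms per call")
else:
    print("itemfreq is not available in this environment")
```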
Take a look at np.bincount:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html
    import numpy as np

    x = np.array([1,1,1,2,2,2,5,25,1,1])
    y = np.bincount(x)     # y[k] is the number of times k occurs in x
    ii = np.nonzero(y)[0]  # the values that actually occur
And then:
    list(zip(ii, y[ii]))
    # pairs: (1, 5), (2, 3), (5, 1), (25, 1)
or:
    np.vstack((ii,y[ii])).T
    # array([[ 1,  5],
    #        [ 2,  3],
    #        [ 5,  1],
    #        [25,  1]])
or however you want to combine the counts and the unique values.
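One caveat with np.bincount: it accepts only non-negative integers, and the array it returns has max(x) + 1 entries, so memory grows with the value range rather than with the number of distinct values. The sketch below shows one possible workaround for negative integers, shifting by the minimum before counting. The helper `bincount_counts` is hypothetical, not a NumPy function.

```python
import numpy as np

def bincount_counts(a):
    """Return (values, counts) for an integer array using np.bincount.

    Hypothetical helper: shifts the data by its minimum so that negative
    integers are accepted. Memory use is proportional to max(a) - min(a).
    """
    a = np.asarray(a)
    if a.size == 0 or not np.issubdtype(a.dtype, np.integer):
        raise TypeError("expected a non-empty integer array")
    offset = a.min()
    counts = np.bincount(a - offset)  # index 0 corresponds to the minimum
    idx = np.nonzero(counts)[0]       # positions of values that occur
    return idx + offset, counts[idx]

values, counts = bincount_counts([-3, -3, 0, 2, 2, 2])
print(values.tolist(), counts.tolist())  # [-3, 0, 2] [2, 1, 3]
```

For floats, strings, or widely spread integers, np.unique(x, return_counts=True) is the safer choice.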
Update: The method mentioned in the original answer is deprecated; use np.unique with return_counts=True instead:
    >>> import numpy as np
    >>> x = [1,1,1,2,2,2,5,25,1,1]
    >>> np.array(np.unique(x, return_counts=True)).T
    array([[ 1,  5],
           [ 2,  3],
           [ 5,  1],
           [25,  1]])
Original answer:
You can use scipy.stats.itemfreq:
    >>> from scipy.stats import itemfreq
    >>> x = [1,1,1,2,2,2,5,25,1,1]
    >>> itemfreq(x)
    /usr/local/bin/python:1: DeprecationWarning: `itemfreq` is deprecated! `itemfreq` is deprecated and will be removed in a future version. Use instead `np.unique(..., return_counts=True)`
    array([[  1.,   5.],
           [  2.,   3.],
           [  5.,   1.],
           [ 25.,   1.]])