assigning points to bins
`numpy.histogram()` does exactly what you want.
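The function signature (in the NumPy version this answer was written against) is:

`numpy.histogram(a, bins=10, range=None, normed=False, weights=None, new=None)`

In current NumPy, `normed` and `new` have been removed, and the signature is `numpy.histogram(a, bins=10, range=None, density=None, weights=None)`.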
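As a quick sketch of the two keyword arguments not discussed further here, the snippet below passes `weights` together with `density`, the keyword that replaces `normed` in newer NumPy. The sample values, edges and weights are made up for illustration.

```python
import numpy as np

values = np.array([0.5, 1.5, 1.5, 3.0])
edges = [0, 1, 2, 4]      # three bins: [0, 1), [1, 2), [2, 4]
weights = [1, 2, 2, 1]    # each value adds its weight to its bin instead of 1

# weighted counts: bin 0 gets 1, bin 1 gets 2 + 2, bin 2 gets 1
counts, _ = np.histogram(values, bins=edges, weights=weights)

# density=True divides by (total weight * bin width), so the result integrates to 1
dens, _ = np.histogram(values, bins=edges, weights=weights, density=True)
print(counts)  # [1 4 1]
print(dens)
```

Note that `density` accounts for unequal bin widths: the last bin is twice as wide, so its density is halved.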
We're mostly interested in `a` and `bins`. `a` is the input data that needs to be binned. `bins` can be a number of bins (your `num_bins`), or it can be a sequence of scalars that denote the bin edges (half-open intervals).
```python
import numpy

values = numpy.arange(10, dtype=int)
bins = numpy.arange(-1, 11)
freq, bins = numpy.histogram(values, bins)
# freq is now [0 1 1 1 1 1 1 1 1 1 1]
# bins is unchanged
```
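For the other form of `bins`, passing an integer, a minimal sketch: NumPy then builds that many equal-width bins spanning the minimum to the maximum of the data, and returns the edges it chose.

```python
import numpy as np

values = np.arange(10)
# an integer asks for that many equal-width bins over [values.min(), values.max()]
counts, edges = np.histogram(values, bins=5)
print(counts)  # [2 2 2 2 2]
print(edges)   # six edges from 0.0 to 9.0, spaced 1.8 apart
```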
To quote the documentation:

"All but the last (righthand-most) bin is half-open. In other words, if `bins` is `[1, 2, 3, 4]`, then the first bin is `[1, 2)` (including 1, but excluding 2) and the second `[2, 3)`. The last bin, however, is `[3, 4]`, which includes 4."
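The rule quoted above can be checked directly with the same edges: the values 3 and 4 both fall into the closed last bin.

```python
import numpy as np

# bins: [1, 2), [2, 3), [3, 4]; the last one also includes its right edge
counts, edges = np.histogram([1, 2, 3, 4], bins=[1, 2, 3, 4])
print(counts)  # [1 1 2]
```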
Edit: You want to know the index in your bins of each element. For this, you can use `numpy.digitize()`. If your values are non-negative integers and each integer is its own bin, `numpy.bincount()` gives you the count per bin directly as well.
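A short sketch of both functions, using the same ten values as the session below (the edges `[0, 5, 10, 15, 20]` are an arbitrary choice for illustration). `bincount` counts integer occurrences; for arbitrary edges you can `digitize` first and then `bincount` the resulting indices.

```python
import numpy as np

values = np.array([17, 14, 9, 7, 6, 9, 19, 4, 2, 19])

# bincount: counts[k] is the number of times the integer k occurs in values
counts = np.bincount(values)
print(counts[9], counts[19])  # 2 2

# arbitrary edges: map each value to a bin index, then count the indices
edges = np.array([0, 5, 10, 15, 20])
idx = np.digitize(values, edges)  # idx[i] == j means edges[j-1] <= values[i] < edges[j]
# index 0 collects values below edges[0]; index len(edges) collects values >= edges[-1]
per_bin = np.bincount(idx, minlength=len(edges) + 1)
print(per_bin)  # [0 2 4 1 3 0]
```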
```python
>>> values = numpy.random.randint(0, 20, 10)
>>> values
array([17, 14,  9,  7,  6,  9, 19,  4,  2, 19])
>>> bins = numpy.linspace(-1, 21, 23)
>>> bins
array([-1.,  0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.,
       12., 13., 14., 15., 16., 17., 18., 19., 20., 21.])
>>> pos = numpy.digitize(values, bins)
>>> pos
array([19, 16, 11,  9,  8, 11, 21,  6,  4, 21])
```
Since each bin is closed on the lower edge and open on the upper edge, `bins[pos-1]` is the lower edge of each value's bin, so the indices are correct:
```python
>>> (bins[pos-1] == values).all()
True
>>> for n in range(len(values)):
...     print("%g <= %g < %g" % (bins[pos[n]-1], values[n], bins[pos[n]]))
...
17 <= 17 < 18
14 <= 14 < 15
9 <= 9 < 10
7 <= 7 < 8
6 <= 6 < 7
9 <= 9 < 10
19 <= 19 < 20
4 <= 4 < 5
2 <= 2 < 3
19 <= 19 < 20
```
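If you need the opposite convention (open on the lower edge, closed on the upper edge), `numpy.digitize` takes a `right` flag. A small sketch with values that sit exactly on an edge:

```python
import numpy as np

edges = np.array([0.0, 1.0, 2.0, 3.0])
x = np.array([1.0, 2.0, 2.5])

# right=False (default): edges[i-1] <= x < edges[i]
left = np.digitize(x, edges)
# right=True: edges[i-1] < x <= edges[i]
right = np.digitize(x, edges, right=True)
print(left)   # [2 3 3]
print(right)  # [1 2 3]
```

Only values lying exactly on an edge change bins; 2.5 gets index 3 either way.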
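This is fairly straightforward in NumPy using broadcasting, if you treat each bin as a point (for example, its center) and assign every data point to the nearest bin. My example below takes four lines of code, not counting the first two lines, which create the bins and data points and would ordinarily be supplied.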
```python
import numpy as NP

# just creating 5 bins at random, each bin expressed as (x, y, z), although
# this code is not limited by bin number or bin dimension
bins = NP.random.randint(10, 100, 15).reshape(5, 3)

# creating 30 random data points
data = NP.random.randint(10, 100, 90).reshape(30, 3)

# for each data point I want the nearest bin, but before I can generate a
# distance matrix, I need to 'conform' the array dimensions;
# 'broadcasting' is an excellent and concise way to do this
bins = bins[:, NP.newaxis, :]     # shape (5, 1, 3)
data2 = data[NP.newaxis, :, :]    # shape (1, 30, 3)

# now I can calculate the distance matrix, shape (5, 30)
dist_matrix = NP.sqrt(NP.sum((data2 - bins)**2, axis=-1))
bin_assignments = NP.argmin(dist_matrix, axis=0)
```
`bin_assignments` is a 1-D array of integer indices from 0 to 4, one per bin: it holds the bin assignment for each of the 30 original points in the `data` matrix above.
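The same broadcasting steps can be wrapped in a small reusable function; `nearest_bin` below is a hypothetical helper name, and the fixed centers and points are chosen so the result is easy to check by hand. Counting how many points landed in each bin is then a single `numpy.bincount` call on the assignments.

```python
import numpy as np

def nearest_bin(data, centers):
    """Hypothetical helper: index of the closest row of centers for each row of data."""
    # (k, 1, d) - (1, n, d) broadcasts to (k, n, d); summing over d gives (k, n)
    diff = centers[:, np.newaxis, :] - data[np.newaxis, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    # argmin down each column picks the nearest center for every data point
    return np.argmin(dist, axis=0)

centers = np.array([[0, 0], [10, 10]])
points = np.array([[1, 1], [9, 8], [0, 2], [12, 11]])
assign = nearest_bin(points, centers)
print(assign)                                       # [0 1 0 1]
print(np.bincount(assign, minlength=len(centers)))  # [2 2]
```

Passing `minlength` keeps a zero entry for any bin that receives no points.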