assigning points to bins

numpy.histogram() does exactly what you want.

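The function signature (in the NumPy version this answer was written against; later releases removed normed and new in favor of density) is: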

numpy.histogram(a, bins=10, range=None, normed=False, weights=None, new=None)

We're mostly interested in a and bins. a is the input data to be binned. bins can be either a number of bins (your num_bins) or a sequence of scalars giving the bin edges (each bin is half-open, except the last).

    import numpy
    values = numpy.arange(10, dtype=int)
    bins = numpy.arange(-1, 11)
    freq, bins = numpy.histogram(values, bins)
    # freq is now [0 1 1 1 1 1 1 1 1 1 1]
    # bins is unchanged
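For comparison, here is a minimal sketch of the other form of bins, an integer count. The choice of 5 bins and range=(0, 10) is an assumption made purely for illustration; NumPy computes the equal-width edges itself and returns them alongside the counts.

```python
import numpy

values = numpy.arange(10)  # 0, 1, ..., 9

# Ask for 5 equal-width bins spanning [0, 10]; NumPy computes the edges.
freq, edges = numpy.histogram(values, bins=5, range=(0, 10))

print(freq)   # number of values that fell into each of the 5 bins
print(edges)  # the 6 edges that delimit those bins
```

Each bin here is two units wide, so two of the ten values land in every bin.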

To quote the documentation:

All but the last (righthand-most) bin is half-open. In other words, if bins is:
[1, 2, 3, 4]
then the first bin is [1, 2) (including 1, but excluding 2) and the second [2, 3). The last bin, however, is [3, 4], which includes 4.
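The quoted rule can be checked directly. The sketch below uses the same edges as the documentation excerpt, with one sample value sitting on each edge (the sample values are an illustrative assumption). It also shows that numpy.digitize() does not apply the closed-last-bin rule: a value equal to the last edge is reported as lying past the final bin.

```python
import numpy

edges = [1, 2, 3, 4]
values = [1, 2, 3, 4]  # one value on every edge

# histogram: 1 -> [1, 2), 2 -> [2, 3), and both 3 and 4 -> [3, 4]
counts, _ = numpy.histogram(values, bins=edges)
print(counts)

# digitize (default right=False) returns i such that edges[i-1] <= x < edges[i];
# a value at or beyond the last edge gets index len(edges).
idx = numpy.digitize(values, edges)
print(idx)
```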

Edit: You want to know the index in your bins of each element. For this, you can use numpy.digitize(). If your bins are going to be integral, you can use numpy.bincount() as well.

    >>> values = numpy.random.randint(0, 20, 10)
    >>> values
    array([17, 14,  9,  7,  6,  9, 19,  4,  2, 19])
    >>> bins = numpy.linspace(-1, 21, 23)
    >>> bins
    array([ -1.,   0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,
            10.,  11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.,
            21.])
    >>> pos = numpy.digitize(values, bins)
    >>> pos
    array([19, 16, 11,  9,  8, 11, 21,  6,  4, 21])

Since each interval is open at its upper limit, bins[pos-1] <= value < bins[pos] holds for every point, so the indices are correct:

    >>> (bins[pos-1] == values).all()
    True
    >>> import sys
    >>> for n in range(len(values)):
    ...     sys.stdout.write("%g <= %g < %g\n"
    ...             %(bins[pos[n]-1], values[n], bins[pos[n]]))
    17 <= 17 < 18
    14 <= 14 < 15
    9 <= 9 < 10
    7 <= 7 < 8
    6 <= 6 < 7
    9 <= 9 < 10
    19 <= 19 < 20
    4 <= 4 < 5
    2 <= 2 < 3
    19 <= 19 < 20
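The numpy.bincount() route mentioned above is not demonstrated, so here is a rough sketch of it. It assumes non-negative integer data and unit-width bins starting at 0, in which case each value already is its own bin index and bincount only tallies the occupancy. The sample values are copied from the session above; minlength=20 is an assumption matching the 0..19 range of that data.

```python
import numpy

values = numpy.array([17, 14, 9, 7, 6, 9, 19, 4, 2, 19])

# With unit-width integer bins starting at 0, the value itself is the bin index.
# bincount counts how many points landed in each of those bins.
counts = numpy.bincount(values, minlength=20)

print(counts[9], counts[19])  # bins 9 and 19 each hold two points
print(counts.sum())           # every point is counted exactly once
```

Note that bincount returns per-bin counts rather than per-point indices, so it complements digitize rather than replacing it.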


This is fairly straightforward in NumPy using broadcasting. My example below is four lines of code (not counting the first two lines, which create the bins and data points; these would ordinarily be supplied).

    import numpy as NP

    # Create 5 bins at random, each expressed as (x, y, z); this code is not
    # limited by the number of bins or by the bin dimension.
    # randint's upper bound is exclusive, so 100 yields values in 10..99.
    bins = NP.random.randint(10, 100, 15).reshape(5, 3)

    # Create 30 random data points.
    data = NP.random.randint(10, 100, 90).reshape(30, 3)

    # For each data point we want the nearest bin. Before generating a distance
    # matrix, the array dimensions must be made to conform; broadcasting is a
    # concise way to do this.
    bins = bins[:, NP.newaxis, :]     # shape (5, 1, 3)
    data2 = data[NP.newaxis, :, :]    # shape (1, 30, 3)

    # Distance matrix of shape (5, 30): one row per bin, one column per point.
    dist_matrix = NP.sqrt(NP.sum((data2 - bins)**2, axis=-1))

    # Index of the nearest bin for each data point.
    bin_assignments = NP.argmin(dist_matrix, axis=0)

bin_assignments is a 1-D array of 30 integer indices, each from 0 to 4, giving the nearest of the five bins for the corresponding point in the data array above.
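Because the example above uses random data, its result cannot be checked by eye. The following sketch repeats the same broadcasting approach on a tiny hand-picked input; the helper name nearest_bin and the sample coordinates are hypothetical and chosen only so the assignments are obvious.

```python
import numpy as NP

def nearest_bin(data, bins):
    """Hypothetical helper: index of the closest bin for each row of data."""
    # (n_bins, 1, d) - (1, n_points, d) broadcasts to (n_bins, n_points, d)
    diff = bins[:, NP.newaxis, :] - data[NP.newaxis, :, :]
    # Euclidean distance from every bin to every point: shape (n_bins, n_points)
    dist = NP.sqrt(NP.sum(diff**2, axis=-1))
    # For each point (column), pick the bin (row) with the smallest distance
    return NP.argmin(dist, axis=0)

bins = NP.array([[0, 0, 0], [10, 10, 10]])          # two bins
data = NP.array([[1, 1, 1], [9, 9, 9], [4, 4, 4]])  # three points
assignments = nearest_bin(data, bins)
print(assignments)
```

The first and third points are closer to the bin at the origin and the second is closer to the bin at (10, 10, 10). The intermediate array has n_bins * n_points * d elements, so this approach is best suited to inputs where that product fits comfortably in memory.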
