Numpy rebinning a 2D array

numpy binning

You can use a higher dimensional view of your array and take the average along the extra dimensions:

    In [12]: a = np.arange(36).reshape(6, 6)

    In [13]: a
    Out[13]:
    array([[ 0,  1,  2,  3,  4,  5],
           [ 6,  7,  8,  9, 10, 11],
           [12, 13, 14, 15, 16, 17],
           [18, 19, 20, 21, 22, 23],
           [24, 25, 26, 27, 28, 29],
           [30, 31, 32, 33, 34, 35]])

    In [14]: a_view = a.reshape(3, 2, 3, 2)

    In [15]: a_view.mean(axis=3).mean(axis=1)
    Out[15]:
    array([[  3.5,   5.5,   7.5],
           [ 15.5,  17.5,  19.5],
           [ 27.5,  29.5,  31.5]])

In general, if you want bins of shape (a, b) for an array of shape (rows, cols), reshape it with .reshape(rows // a, a, cols // b, b). Note that the order of the .mean calls matters: a_view.mean(axis=1).mean(axis=3) raises an error, because a_view.mean(axis=1) has only three dimensions. a_view.mean(axis=1).mean(axis=2) works, but it makes it harder to see what is going on.
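The general rule above can be wrapped in a small helper. This is only a sketch: the name rebin_mean and its bin_shape parameter are made up for illustration, and it averages over both in-bin axes in one call by passing a tuple to axis, which sidesteps the ordering issue.

```python
import numpy as np

def rebin_mean(arr, bin_shape):
    """Average a 2D array over non-overlapping bins of shape (a, b).

    Hypothetical helper: the name and signature are illustrative only.
    """
    rows, cols = arr.shape
    a, b = bin_shape
    if rows % a or cols % b:
        raise ValueError("bin shape must evenly divide the array shape")
    # In the 4D view, axes 1 and 3 index the position inside each bin.
    view = arr.reshape(rows // a, a, cols // b, b)
    # Averaging over both in-bin axes at once avoids the axis-order pitfall.
    return view.mean(axis=(1, 3))

data = np.arange(36).reshape(6, 6)
print(rebin_mean(data, (2, 2)))
```

For the 6x6 example this gives the same 3x3 result as a_view.mean(axis=3).mean(axis=1).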

As is, the above code only works if you can fit an integer number of bins inside your array, i.e. if a divides rows and b divides cols. There are ways to deal with other cases, but you will have to define the behavior you want then.
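Two possible behaviors for the non-divisible case are sketched below; neither is the "right" one, and the function names are invented for this example. The first drops trailing rows and columns that do not fill a bin. The second pads with NaN and uses np.nanmean, so partial edge bins are averaged over the cells they actually contain.

```python
import numpy as np

def rebin_mean_trim(arr, bin_shape):
    """Bin-average a 2D array, dropping trailing rows/cols that do not fill a bin."""
    a, b = bin_shape
    n_r, n_c = arr.shape[0] // a, arr.shape[1] // b
    trimmed = arr[:n_r * a, :n_c * b]
    return trimmed.reshape(n_r, a, n_c, b).mean(axis=(1, 3))

def rebin_mean_pad(arr, bin_shape):
    """Bin-average a 2D array, averaging partial edge bins over their real cells."""
    a, b = bin_shape
    # Number of rows/cols needed to reach the next multiple of the bin size.
    pad_r = -arr.shape[0] % a
    pad_c = -arr.shape[1] % b
    padded = np.pad(arr.astype(float), ((0, pad_r), (0, pad_c)),
                    mode="constant", constant_values=np.nan)
    rows, cols = padded.shape
    # nanmean ignores the NaN padding when averaging each bin.
    return np.nanmean(padded.reshape(rows // a, a, cols // b, b), axis=(1, 3))

data = np.arange(35).reshape(5, 7)
print(rebin_mean_trim(data, (2, 2)).shape)  # (2, 3)
print(rebin_mean_pad(data, (2, 2)).shape)   # (3, 4)
```

Trimming keeps every output cell an average over a full bin; padding keeps all the data but gives edge bins fewer samples.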


See the SciPy Cookbook on rebinning, which provides this snippet:

    def rebin(a, *args):
        '''rebin ndarray data into a smaller ndarray of the same rank whose dimensions
        are factors of the original dimensions. eg. An array with 6 columns and 4 rows
        can be reduced to have 6,3,2 or 1 columns and 4,2 or 1 rows.
        example usages:
        >>> a=rand(6,4); b=rebin(a,3,2)
        >>> a=rand(6); b=rebin(a,2)
        '''
        shape = a.shape
        lenShape = len(shape)
        factor = asarray(shape)/asarray(args)
        evList = ['a.reshape('] + \
                 ['args[%d],factor[%d],'%(i,i) for i in range(lenShape)] + \
                 [')'] + ['.sum(%d)'%(i+1) for i in range(lenShape)] + \
                 ['/factor[%d]'%i for i in range(lenShape)]
        print ''.join(evList)
        return eval(''.join(evList))
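The snippet above uses a Python 2 print statement and builds its expression as a string for eval; under Python 3 the / division would also produce non-integer factors, which reshape rejects. The following is a rough Python 3 equivalent without eval. It is an adaptation rather than the Cookbook's own code, and it assumes the same contract: one target size per dimension, each dividing the original size.

```python
import numpy as np

def rebin(a, *new_shape):
    """Average `a` down to `new_shape`; each new size must divide the old one."""
    if len(new_shape) != a.ndim:
        raise ValueError("need one target size per dimension")
    factors = []
    for old, new in zip(a.shape, new_shape):
        if old % new:
            raise ValueError("each new dimension must divide the original one")
        factors.append(old // new)
    # Interleave (new_size, factor) pairs: shape (n0, f0, n1, f1, ...).
    split_shape = [n for pair in zip(new_shape, factors) for n in pair]
    # The factor axes sit at the odd positions 1, 3, 5, ...
    factor_axes = tuple(range(1, 2 * a.ndim, 2))
    return a.reshape(split_shape).mean(axis=factor_axes)

a = np.random.rand(6, 4)
b = rebin(a, 3, 2)
print(b.shape)  # (3, 2)
```

Taking the mean over the factor axes is the same as the original's chain of sums divided by each factor.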


I assume you are asking more generally how to write a fast function that operates on arrays, as numpy.reshape does in your example. If performance really matters and you are already using NumPy, you can write your own C code, as NumPy itself does. For example, arange is implemented entirely in C, as is almost everything in NumPy that matters for performance.

Before doing so, however, implement the code in Python and check whether its performance is good enough. Make the Python code as efficient as possible first; if it still does not meet your needs, go the C route.
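One way to check whether pure Python is good enough is to time a straightforward loop against a vectorized version with timeit. The sketch below does this for the binning problem; both function names are invented for the example, and the timings depend on the machine, so no figures are quoted here.

```python
import timeit
import numpy as np

def rebin_loop(arr, a, b):
    """Reference implementation with explicit Python loops over the bins."""
    rows, cols = arr.shape
    out = np.empty((rows // a, cols // b))
    for i in range(rows // a):
        for j in range(cols // b):
            out[i, j] = arr[i * a:(i + 1) * a, j * b:(j + 1) * b].mean()
    return out

def rebin_reshape(arr, a, b):
    """Vectorized version using a 4D view of the array."""
    rows, cols = arr.shape
    return arr.reshape(rows // a, a, cols // b, b).mean(axis=(1, 3))

data = np.random.default_rng(0).random((120, 120))

# Confirm both versions agree before comparing their speed.
assert np.allclose(rebin_loop(data, 2, 2), rebin_reshape(data, 2, 2))

for fn in (rebin_loop, rebin_reshape):
    seconds = timeit.timeit(lambda: fn(data, 2, 2), number=5)
    print(f"{fn.__name__}: {seconds / 5:.6f} s per call")
```

If the vectorized version is already fast enough for your data sizes, there is little reason to drop down to C.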

You may read about that in the docs.
