How to find cluster sizes in 2D numpy array?


This looks like a percolation problem. The following link has your answer, provided you have SciPy installed.

http://dragly.org/2013/03/25/working-with-percolation-clusters-in-python/

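    from numpy import array, arange
    # Newer SciPy releases deprecate this submodule; the same functions are
    # also available directly as scipy.ndimage.label and scipy.ndimage.sum.
    from scipy.ndimage import measurements

    z2 = array([[0,0,0,0,0,0,0,0,0,0],
                [0,0,1,0,0,0,0,0,0,0],
                [0,0,1,0,1,0,0,0,1,0],
                [0,0,0,0,0,0,1,0,1,0],
                [0,0,0,0,0,0,1,0,0,0],
                [0,0,0,0,1,0,1,0,0,0],
                [0,0,0,0,0,1,1,0,0,0],
                [0,0,0,1,0,1,0,0,0,0],
                [0,0,0,0,1,0,0,0,0,0],
                [0,0,0,0,0,0,0,0,0,0]])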

This will identify the clusters:

    lw, num = measurements.label(z2)
    print(lw)

    [[0 0 0 0 0 0 0 0 0 0]
     [0 0 1 0 0 0 0 0 0 0]
     [0 0 1 0 2 0 0 0 3 0]
     [0 0 0 0 0 0 4 0 3 0]
     [0 0 0 0 0 0 4 0 0 0]
     [0 0 0 0 5 0 4 0 0 0]
     [0 0 0 0 0 4 4 0 0 0]
     [0 0 0 6 0 4 0 0 0 0]
     [0 0 0 0 7 0 0 0 0 0]
     [0 0 0 0 0 0 0 0 0 0]]

The following will calculate their area.

    area = measurements.sum(z2, lw, index=arange(lw.max() + 1))
    print(area)

    [ 0.  2.  1.  2.  6.  1.  1.  1.]

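This gives what you expect. Note that label uses 4-connectivity by default, so cells that touch only diagonally end up in separate clusters. By eye, counting diagonal neighbours as connected, clusters 4, 5, 6 and 7 would merge into a single cluster of 9 cells.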
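If diagonal contact should count as a connection, label accepts a structure argument that defines the neighbourhood. The following is a minimal sketch, assuming a recent SciPy where label is available directly from scipy.ndimage. The names eight, lw8, num8 and sizes8 are only illustrative, and np.bincount is used here in place of measurements.sum to count the cells per label.

```python
import numpy as np
from scipy import ndimage

z2 = np.array([[0,0,0,0,0,0,0,0,0,0],
               [0,0,1,0,0,0,0,0,0,0],
               [0,0,1,0,1,0,0,0,1,0],
               [0,0,0,0,0,0,1,0,1,0],
               [0,0,0,0,0,0,1,0,0,0],
               [0,0,0,0,1,0,1,0,0,0],
               [0,0,0,0,0,1,1,0,0,0],
               [0,0,0,1,0,1,0,0,0,0],
               [0,0,0,0,1,0,0,0,0,0],
               [0,0,0,0,0,0,0,0,0,0]])

# A 3x3 block of ones connects each cell to all 8 surrounding cells
# (the default structure only connects the 4 edge-sharing neighbours).
eight = np.ones((3, 3), dtype=int)
lw8, num8 = ndimage.label(z2, structure=eight)

# bincount counts how often each label occurs; index 0 is the background.
sizes8 = np.bincount(lw8.ravel())[1:]

print(num8)          # 4 clusters instead of 7
print(sizes8.max())  # the largest cluster now has 9 cells
```

Counting labels with np.bincount also works when the input array holds values other than 0 and 1, because it counts cells rather than summing their values.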


I think your problem of finding "clusters" is essentially the same as finding connected components in a binary image (with values of either 0 or 1) based on 4-connectivity. Several algorithms for identifying the connected components (or "clusters", as you call them) are described on this Wikipedia page:

http://en.wikipedia.org/wiki/Connected-component_labeling

Once the connected components or "clusters" are labelled, it is easy to extract whatever you need from them, such as the area or the relative position of each cluster.
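To make the idea concrete, here is a minimal sketch of one such approach: a breadth-first flood fill with 4-connectivity that returns only the cluster sizes. It is not taken from the Wikipedia page or from any library; the function name cluster_sizes and the small test grid are made up for illustration.

```python
from collections import deque

def cluster_sizes(grid):
    """Return the sizes of all 4-connected clusters of nonzero cells."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            # Unvisited occupied cell: start a new cluster and flood-fill it.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                # Visit the four edge-sharing neighbours.
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sizes

grid = [[0, 1, 1, 0],
        [0, 0, 1, 0],
        [1, 0, 0, 1]]
print(cluster_sizes(grid))  # [3, 1, 1]
```

An explicit queue avoids deep recursion, so the sketch should also cope with large clusters. Adding the four diagonal offsets to the neighbour tuple would switch it to 8-connectivity.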


I believe that your approach is almost correct, except that you re-initialize the variable count every time clust_size calls itself recursively. I would add count to the input parameters of clust_size and reset it with count = 0 only before each top-level call in your nested for loops.

That way, every call has the form count = clust_size(array, i, j, count). I haven't tested it, but it seems to me that it should work.
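Since the original clust_size is not shown here, the following is only a hypothetical reconstruction of what the suggestion could look like. It assumes 4-connectivity and marks visited cells by setting them to 0 in a working copy; the names data, work and sizes are illustrative.

```python
import numpy as np

def clust_size(array, i, j, count):
    """Count the cells of the 4-connected cluster containing (i, j).

    Visited cells are set to 0 in `array`, so pass a copy if the
    original data is still needed.
    """
    rows, cols = array.shape
    # Stop at the border, on empty cells and on already visited cells.
    if i < 0 or i >= rows or j < 0 or j >= cols or array[i, j] == 0:
        return count
    array[i, j] = 0  # mark as visited
    count += 1
    # Carry the running count through every recursive call.
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        count = clust_size(array, i + di, j + dj, count)
    return count

data = np.array([[0, 1, 1, 0],
                 [0, 0, 1, 0],
                 [1, 0, 0, 1]])
work = data.copy()
sizes = []
for i in range(work.shape[0]):
    for j in range(work.shape[1]):
        if work[i, j]:
            count = 0  # reset only before each top-level call
            count = clust_size(work, i, j, count)
            sizes.append(count)
print(sizes)  # [3, 1, 1]
```

Keep in mind that a recursive version like this can hit Python's recursion limit on very large clusters.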

Hope it helps.