How to find cluster sizes in 2D numpy array?
This looks like a percolation problem. The following link has your answer, provided you have SciPy installed:
http://dragly.org/2013/03/25/working-with-percolation-clusters-in-python/
    from pylab import *
    from scipy.ndimage import measurements

    z2 = array([[0,0,0,0,0,0,0,0,0,0],
                [0,0,1,0,0,0,0,0,0,0],
                [0,0,1,0,1,0,0,0,1,0],
                [0,0,0,0,0,0,1,0,1,0],
                [0,0,0,0,0,0,1,0,0,0],
                [0,0,0,0,1,0,1,0,0,0],
                [0,0,0,0,0,1,1,0,0,0],
                [0,0,0,1,0,1,0,0,0,0],
                [0,0,0,0,1,0,0,0,0,0],
                [0,0,0,0,0,0,0,0,0,0]])
This will identify the clusters:
    lw, num = measurements.label(z2)
    print(lw)

    array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
           [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
           [0, 0, 1, 0, 2, 0, 0, 0, 3, 0],
           [0, 0, 0, 0, 0, 0, 4, 0, 3, 0],
           [0, 0, 0, 0, 0, 0, 4, 0, 0, 0],
           [0, 0, 0, 0, 5, 0, 4, 0, 0, 0],
           [0, 0, 0, 0, 0, 4, 4, 0, 0, 0],
           [0, 0, 0, 6, 0, 4, 0, 0, 0, 0],
           [0, 0, 0, 0, 7, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
The following will calculate their area.
    area = measurements.sum(z2, lw, index=arange(lw.max() + 1))
    print(area)

    [ 0.  2.  1.  2.  6.  1.  1.  1.]
This gives what you expect. Note that label only joins orthogonal neighbours by default, so cells that touch diagonally end up in separate clusters; judging by eye, I would have expected a cluster with 8 members.
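If diagonal contacts should count as connected, label accepts a structure argument that defines the neighbourhood. The following is a minimal sketch of that variant, not part of the original answer. It uses Python 3 syntax, reaches the same label function through scipy.ndimage directly (newer SciPy releases deprecate the measurements sub-namespace), and counts cluster sizes with np.bincount instead of measurements.sum. The names lw4, lw8, sizes4 and sizes8 are made up for this example.

```python
import numpy as np
from scipy import ndimage

z2 = np.array([[0,0,0,0,0,0,0,0,0,0],
               [0,0,1,0,0,0,0,0,0,0],
               [0,0,1,0,1,0,0,0,1,0],
               [0,0,0,0,0,0,1,0,1,0],
               [0,0,0,0,0,0,1,0,0,0],
               [0,0,0,0,1,0,1,0,0,0],
               [0,0,0,0,0,1,1,0,0,0],
               [0,0,0,1,0,1,0,0,0,0],
               [0,0,0,0,1,0,0,0,0,0],
               [0,0,0,0,0,0,0,0,0,0]])

# Default structuring element: only orthogonal neighbours are joined.
lw4, num4 = ndimage.label(z2)
sizes4 = np.bincount(lw4.ravel())[1:]   # drop label 0 (the background)

# 3x3 block of ones: diagonal neighbours are joined as well.
lw8, num8 = ndimage.label(z2, structure=np.ones((3, 3), dtype=int))
sizes8 = np.bincount(lw8.ravel())[1:]

print(num4, sizes4)
print(num8, sizes8)
```

Comparing sizes4 with sizes8 shows directly which of the small clusters merge once diagonal neighbours are allowed.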
I think your problem of finding "clusters" is essentially the same as finding connected components in a binary image (with values of either 0 or 1) based on 4-connectivity. Several algorithms for identifying the connected components (or "clusters", as you call them) are described on this Wikipedia page:
http://en.wikipedia.org/wiki/Connected-component_labeling
Once the connected components or "clusters" are labelled, you can easily derive whatever information you need from the labels, such as the area or relative position of each cluster.
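As an illustration of the idea, here is a minimal sketch of one simple labelling approach: a breadth-first flood fill with 4-connectivity. It is only one of several possible algorithms, and the function name label_components is made up for this example. It indexes the grid as grid[r][c], so it should accept either a nested list or a 2D numpy array.

```python
from collections import deque

def label_components(grid):
    """Label 4-connected clusters of non-zero cells.

    Returns (labels, sizes): labels is a nested list with 0 for background
    and 1..n for the clusters; sizes[k - 1] is the cell count of cluster k.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            # Skip background cells and cells that already carry a label.
            if not grid[r][c] or labels[r][c]:
                continue
            current = len(sizes) + 1          # next unused label
            labels[r][c] = current
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                # Visit the four orthogonal neighbours.
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = current   # label on enqueue: no duplicates
                        queue.append((ny, nx))
            sizes.append(size)
    return labels, sizes

example = [[1, 1, 0],
           [0, 0, 0],
           [0, 1, 1]]
example_labels, example_sizes = label_components(example)
print(example_sizes)
```

Because the queue replaces recursion, this version does not run into Python's recursion limit on large clusters.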
I believe that your approach is almost correct, except that you reinitialize the variable count every time your function clust_size calls itself recursively. I would add count to the input parameters of clust_size and reset it with count = 0 only before the first call for each cell in your nested for loops. That way, you would always call clust_size as count = clust_size(array, i, j, count).
I haven't tested it but it seems to me that it should work.
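Since the original clust_size is not shown here, the following is only a hypothetical reconstruction of what the suggestion could look like. It assumes 4-connectivity and marks visited cells by zeroing them in a working copy; the wrapper cluster_sizes is a name made up for this sketch.

```python
def clust_size(arr, i, j, count):
    """Hypothetical reconstruction: size of the 4-connected cluster at (i, j).

    `arr` is a nested list that is modified in place: counted cells are set
    to 0 so they are not counted twice. The running total is passed in and
    returned instead of being reset inside the function.
    """
    rows, cols = len(arr), len(arr[0])
    # Stop outside the grid, on empty cells, and on cells already counted.
    if i < 0 or i >= rows or j < 0 or j >= cols or not arr[i][j]:
        return count
    arr[i][j] = 0      # mark as visited
    count += 1
    # Thread the running total through every recursive call.
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        count = clust_size(arr, i + di, j + dj, count)
    return count

def cluster_sizes(grid):
    """Return the size of every cluster, in row-by-row scan order."""
    work = [list(row) for row in grid]   # working copy; also accepts a 2D numpy array
    sizes = []
    for i in range(len(work)):
        for j in range(len(work[0])):
            if work[i][j]:
                # count starts at 0 only here, once per cluster
                sizes.append(clust_size(work, i, j, 0))
    return sizes

print(cluster_sizes([[0, 1, 1],
                     [0, 0, 1],
                     [1, 0, 0]]))
```

One caveat: the recursion depth grows with the cluster size, so very large clusters can hit Python's recursion limit.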
Hope it helps.