Speeding-up "for-loop" in image analysis when iterations are up to 40,000 Speeding-up "for-loop" in image analysis when iterations are up to 40,000 numpy numpy

Speeding-up "for-loop" in image analysis when iterations are up to 40,000


By my timing, this is about 10 times faster than your original method. I tested with these arrays:

import numpy as np

sX = 200
sY = 200
FIDO = np.random.randint(0, sX*sY, (sX, sY))
WBColor = np.random.randint(0, sX*sY, (sX, sY))
RGColor = np.random.randint(0, sX*sY, (sX, sY))
BYColor = np.random.randint(0, sX*sY, (sX, sY))

This is the part I timed:

import collections

colors = {'wb': WBColor, 'rg': RGColor, 'by': BYColor}
planes = colors.keys()
S = {plane: np.zeros((sX, sY)) for plane in planes}

for plane in planes:
    counts = collections.defaultdict(int)
    sums = collections.defaultdict(int)
    for (i, j), f in np.ndenumerate(FIDO):
        counts[f] += 1
        sums[f] += colors[plane][i, j]
    for (i, j), f in np.ndenumerate(FIDO):
        S[plane][i, j] = sums[f]/counts[f]

This is probably because, even though looping in Python is slow, this version traverses the data fewer times.

Note that the original version gets faster when there is a small number of unique values in FIDO, whereas this one takes roughly the same time in most cases.
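
For reference, the same per-label counting and summing can be pushed down into NumPy with np.bincount; a minimal sketch, reusing FIDO, colors, planes, and S from above:

flat = FIDO.ravel()
counts = np.bincount(flat)               # pixels per FIDO value, one C-level pass
counts[counts == 0] = 1                  # avoid 0/0 for label values that never occur

for plane in planes:
    # per-label sums of this color plane, again in a single pass
    sums = np.bincount(flat, weights=colors[plane].ravel())
    S[plane] = (sums / counts)[FIDO]     # map each pixel to its label's mean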


As @lejlot suggested before, the code is quite hard to vectorize. It cannot be run in parallel unless you know which pixels belong to each FIDO in advance. I don't know whether by FIDO you mean superpixels, but I usually work with this kind of problem, and the best solution I've found so far is as follows:

  • Flatten the data:

    data = data.reshape(-1, 3)
    labels = FIDO.copy()

    Here data is your (Width, Height, 3) image, rather than the three separate planes you have. It gets flattened to (Width * Height, 3).

  • Relabel FIDO to the 0..N-1 range, where N = number of unique FIDO values:

    from skimage.segmentation import relabel_sequential

    labels = relabel_sequential(labels)[0]
    labels -= labels.min()

    The above, from scikit-image, transforms your FIDO array into the [0, N-1] range, which is much easier to work with later (a scikit-image-free alternative is sketched just after this list).

  • Last, code in Cython a simple function to calculate the mean for each of the FIDOs (as they are ordered from 0 to N-1, you can do it in a 1D array with length N):

    import numpy as np

    def fmeans(double[:, ::1] data, long[::1] labels, long nsp):
        cdef long n, N = labels.shape[0]
        cdef int z, K = data.shape[1]
        cdef double[:, ::1] F = np.zeros((nsp, K), np.float64)
        cdef int[::1] sizes = np.zeros(nsp, np.int32)
        cdef long l
        cdef double t

        # single pass over the image: accumulate per-FIDO sums and counts
        for n in range(N):
            l = labels[n]
            sizes[l] += 1
            for z in range(K):
                t = data[n, z]
                F[l, z] += t

        # single pass over the labels: turn the sums into means
        for n in range(nsp):
            for z in range(K):
                F[n, z] /= sizes[n]

        return np.asarray(F)
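
As an aside on the relabelling step: if you'd rather not depend on scikit-image, np.unique can produce the same 0..N-1 relabeling (a sketch; it should match relabel_sequential here, since both map the sorted unique values to consecutive integers):

import numpy as np

# return_inverse gives each pixel the index of its FIDO value in the
# sorted unique array, i.e. labels already in the [0, N-1] range
uniq, inv = np.unique(FIDO, return_inverse=True)
labels = inv.reshape(FIDO.shape)
N = uniq.shape[0]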

You can call fmeans (once compiled with Cython) as simply as:

mean_colors = fmeans(data, labels.flatten(), labels.max() + 1)  # labels.max() + 1 == N
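
As for the compilation itself: if you have not built Cython code before, pyximport is a low-ceremony option; a minimal sketch, assuming the function above is saved as fmeans.pyx (that file name is just for illustration):

import numpy as np
import pyximport

# build .pyx modules automatically on import; the NumPy headers are
# needed because fmeans uses typed memoryviews over NumPy arrays
pyximport.install(setup_args={'include_dirs': np.get_include()})

from fmeans import fmeans  # triggers the Cython build on first import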

The image of mean colors can then be recovered as:

mean_img = mean_colors[labels]

If you do not want to code in Cython, scikit-image also provides bindings for this using a graph structure and networkx; however, it is much slower:

http://scikit-image.org/docs/dev/auto_examples/plot_rag_mean_color.html

The above example contains the function calls you need to get an image with the average color of each superpixel, where labels1 plays the role of your FIDO.
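
Incidentally, if all you need is the mean-color image itself rather than the graph machinery, scikit-image's label2rgb can build it directly; a one-line sketch, assuming img is your (Width, Height, 3) image:

from skimage.color import label2rgb

# kind='avg' paints every labeled region with its average color
mean_img = label2rgb(FIDO, img, kind='avg')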

NOTE: The Cython approach is much faster because, instead of iterating over the N unique FIDO values and scanning the image (of size M = Width x Height) for each of them, it iterates over the image only ONCE. Thus, the computational cost is on the order of O(M + N) rather than the O(M * N) of your original approach.


Example test:

import numpy as np
from skimage.segmentation import relabel_sequential

sX = 200
sY = 200
FIDO = np.random.randint(0, sX*sY, (sX, sY))
data = np.random.rand(sX, sY, 3)  # Your image

Flatten and relabel:

data = data.reshape(-1, 3)
labels = relabel_sequential(FIDO)[0]
labels -= labels.min()

Obtain mean:

>>> %timeit color_means = fmeans(data, labels.flatten(), labels.max()+1)
1000 loops, best of 3: 520 µs per loop

It takes 0.5 ms (half a millisecond) for a 200x200 image with:

print labels.max()+1     # --> 25287 unique FIDO
print color_means.shape  # --> (25287, 3), the mean color of each FIDO

You can restore the image of mean colors using smart indexing:

mean_image = color_means[labels]
print mean_image.shape   # --> (200, 200, 3)

I doubt you can get that speed with plain Python approaches (or at least, I didn't find a way).


Your code is not optimal because you scan the whole image for each region in FIDO. A better approach is to group the pixels of each region and compute the means first. pandas gives nice tools for such computations (on just one channel here). Then you spread the means back over the regions:

import numpy as np
import pandas as pd

sX = 200
sY = 200
Nreg = sX*sY
WBColor = np.random.randint(0, 256, (sX, sY))
FIDO = np.random.randint(0, Nreg, (sX, sY))

def oldloop():
    # original approach: one full-image scan per unique FIDO value
    S_wb = np.zeros((sX, sY))
    uniqueFIDOs, unique_counts = np.unique(FIDO, return_counts=True)
    numFIDOs = uniqueFIDOs.shape
    for i in np.arange(0, numFIDOs[0]):
        Lookup = FIDO == uniqueFIDOs[i]
        S_wb[Lookup] = np.sum(WBColor[Lookup])/unique_counts[i]
    return S_wb

def newloop():
    # pandas approach: group the pixels by region, take the means once,
    # then spread them back over the image with a lookup table
    index = pd.Index(FIDO.flatten(), name='region')
    means = pd.DataFrame(WBColor.flatten(), index).groupby(level='region').mean()
    lookup = np.zeros(Nreg)
    lookup[means.index] = means.values.ravel()  # .ravel(): means.values is (n, 1)
    return lookup[FIDO]

In this case, this is about 200 times faster:

In [32]: np.allclose(oldloop(), newloop())
Out[32]: True

In [33]: %timeit -n1 oldloop()
1 loops, best of 3: 3.92 s per loop

In [34]: %timeit -n100 newloop()
100 loops, best of 3: 20.5 ms per loop
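
For completeness, the same groupby extends to all three planes at once; a sketch, where RGColor and BYColor are assumed to be arrays shaped like WBColor:

RGColor = np.random.randint(0, 256, (sX, sY))
BYColor = np.random.randint(0, 256, (sX, sY))

df = pd.DataFrame({'wb': WBColor.ravel(),
                   'rg': RGColor.ravel(),
                   'by': BYColor.ravel()},
                  index=pd.Index(FIDO.ravel(), name='region'))
means = df.groupby(level='region').mean()  # one mean per region and channel

lookup = np.zeros((Nreg, 3))
lookup[means.index] = means.values         # scatter the means into a lookup table
S_all = lookup[FIDO]                       # (sX, sY, 3) array of per-region means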

EDIT

Another cool modern approach is to use Numba. You write (very) basic Python code that runs at nearly C speed:

from numba import jit

@jit
def numbaloops():
    # first pass: per-region counts and sums
    counts = np.zeros(Nreg)
    sums = np.zeros(Nreg)
    S = np.empty((sX, sY))
    for x in range(sX):
        for y in range(sY):
            region = FIDO[x, y]
            value = WBColor[x, y]
            counts[region] += 1
            sums[region] += value
    # second pass: write each pixel's region mean
    for x in range(sX):
        for y in range(sY):
            region = FIDO[x, y]
            S[x, y] = sums[region]/counts[region]
    return S

You are now about 4000 times faster:

In [45]: np.allclose(oldloop(), numbaloops())
Out[45]: True

In [46]: %timeit -n1000 numbaloops()
1000 loops, best of 3: 1.06 ms per loop
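
One practical note: Numba usually compiles best when the arrays are passed as arguments instead of being read from globals, and nopython mode makes compilation failures explicit; an untimed sketch of the same loops in that style:

import numpy as np
from numba import jit

@jit(nopython=True)
def numbaloops_args(FIDO, WBColor, Nreg):
    sX, sY = FIDO.shape
    counts = np.zeros(Nreg)
    sums = np.zeros(Nreg)
    S = np.empty((sX, sY))
    # pass 1: accumulate per-region counts and sums
    for x in range(sX):
        for y in range(sY):
            region = FIDO[x, y]
            counts[region] += 1
            sums[region] += WBColor[x, y]
    # pass 2: write each pixel's region mean
    for x in range(sX):
        for y in range(sY):
            S[x, y] = sums[FIDO[x, y]] / counts[FIDO[x, y]]
    return S

S = numbaloops_args(FIDO, WBColor, Nreg)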