How to crop a numpy 2d array to non-zero values?


After some more fiddling with this, I actually found a solution myself:

    coords = np.argwhere(a)
    x_min, y_min = coords.min(axis=0)
    x_max, y_max = coords.max(axis=0)
    b = cropped = a[x_min:x_max+1, y_min:y_max+1]

The above works for boolean arrays out of the box. In case you have other conditions like a threshold t and want to crop to values larger than t, simply modify the first line:

coords = np.argwhere(a > t)
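One thing to watch for: if nothing matches the condition, `coords` is empty and `coords.min(axis=0)` raises a `ValueError`. Below is a minimal sketch that wraps the snippet above in a function and guards against that case. The names `crop_to_mask` and `mask`, and the choice to return an empty `(0, 0)` slice, are my own assumptions for illustration and not part of the original answer.

```python
import numpy as np

def crop_to_mask(a, mask=None):
    """Crop the 2D array `a` to the bounding box of the True entries in `mask`.

    Hypothetical helper: if `mask` is None, the non-zero entries of `a` are used.
    """
    if mask is None:
        mask = a != 0
    coords = np.argwhere(mask)      # (k, 2) array of row/column indices
    if coords.size == 0:            # nothing selected: min() would raise
        return a[:0, :0]            # assumed convention: empty (0, 0) result
    x_min, y_min = coords.min(axis=0)
    x_max, y_max = coords.max(axis=0)
    return a[x_min:x_max + 1, y_min:y_max + 1]

a = np.array([[0, 0, 0, 0],
              [0, 3, 0, 7],
              [0, 5, 9, 0],
              [0, 0, 0, 0]])

cropped = crop_to_mask(a)           # bounding box of the non-zero values
above_6 = crop_to_mask(a, a > 6)    # bounding box of the values larger than 6
```

Here `cropped` keeps rows 1 to 2 and columns 1 to 3, while `above_6` shrinks to the box around the 7 and the 9.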


Here's one with slicing and argmax to get the bounds -

    def smallestbox(a):
        r = a.any(1)
        if r.any():
            m,n = a.shape
            c = a.any(0)
            out = a[r.argmax():m-r[::-1].argmax(), c.argmax():n-c[::-1].argmax()]
        else:
            out = np.empty((0,0),dtype=bool)
        return out

Sample runs -

    In [142]: a
    Out[142]:
    array([[False, False, False, False, False, False],
           [False,  True, False,  True, False, False],
           [False,  True,  True, False, False, False],
           [False, False, False, False, False, False]])

    In [143]: smallestbox(a)
    Out[143]:
    array([[ True, False,  True],
           [ True,  True, False]])

    In [144]: a[:] = 0

    In [145]: smallestbox(a)
    Out[145]: array([], shape=(0, 0), dtype=bool)

    In [146]: a[2,2] = 1

    In [147]: smallestbox(a)
    Out[147]: array([[ True]])
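To make the bounds easier to follow, here is a rough step-by-step sketch of the same idea on the sample array, with the slice limits pulled out into named variables. The variable names (`row_start`, `row_stop`, and so on) are mine, added only for illustration. It relies on `argmax` returning the index of the first `True` in a boolean array, so applying it to the reversed array gives the distance of the last `True` from the end.

```python
import numpy as np

# Same array as in the sample run above.
a = np.zeros((4, 6), dtype=bool)
a[1, 1] = a[1, 3] = a[2, 1] = a[2, 2] = True

r = a.any(1)                        # per-row flag: does the row hold any True?
c = a.any(0)                        # per-column flag
m, n = a.shape

row_start = r.argmax()              # first row that holds a True
row_stop = m - r[::-1].argmax()     # one past the last row that holds a True
col_start = c.argmax()              # first column that holds a True
col_stop = n - c[::-1].argmax()     # one past the last column that holds a True

box = a[row_start:row_stop, col_start:col_stop]
```

Note that `argmax` on an all-`False` array returns 0, which is why `smallestbox` checks `r.any()` first and handles the empty case separately.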

Benchmarking

Other approach(es) -

    def argwhere_app(a): # @Jörn Hees's soln
        coords = np.argwhere(a)
        x_min, y_min = coords.min(axis=0)
        x_max, y_max = coords.max(axis=0)
        return a[x_min:x_max+1, y_min:y_max+1]

Timings for varying degrees of sparsity (approx. 10%, 50% & 90%) -

    In [370]: np.random.seed(0)
         ...: a = np.random.rand(5000,5000)>0.1

    In [371]: %timeit argwhere_app(a)
         ...: %timeit smallestbox(a)
    1 loop, best of 3: 310 ms per loop
    100 loops, best of 3: 3.19 ms per loop

    In [372]: np.random.seed(0)
         ...: a = np.random.rand(5000,5000)>0.5

    In [373]: %timeit argwhere_app(a)
         ...: %timeit smallestbox(a)
    1 loop, best of 3: 324 ms per loop
    100 loops, best of 3: 3.21 ms per loop

    In [374]: np.random.seed(0)
         ...: a = np.random.rand(5000,5000)>0.9

    In [375]: %timeit argwhere_app(a)
         ...: %timeit smallestbox(a)
    10 loops, best of 3: 106 ms per loop
    100 loops, best of 3: 3.19 ms per loop
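If you want to repeat the comparison outside IPython, something along these lines should work. It is only a sketch: the array size, the zero border, the `rng` generator and the repeat count are arbitrary choices of mine, and the numbers you get will depend on your machine and NumPy version. It also checks that both functions return the same crop before timing them.

```python
import timeit
import numpy as np

def argwhere_app(a):
    coords = np.argwhere(a)
    x_min, y_min = coords.min(axis=0)
    x_max, y_max = coords.max(axis=0)
    return a[x_min:x_max + 1, y_min:y_max + 1]

def smallestbox(a):
    r = a.any(1)
    if r.any():
        m, n = a.shape
        c = a.any(0)
        return a[r.argmax():m - r[::-1].argmax(), c.argmax():n - c[::-1].argmax()]
    return np.empty((0, 0), dtype=bool)

# Assumed test data: random booleans surrounded by an all-False border.
rng = np.random.default_rng(0)
a = np.zeros((1000, 1000), dtype=bool)
a[100:900, 200:700] = rng.random((800, 500)) > 0.5

# Both approaches compute the same bounding box, so the crops should match.
same = np.array_equal(argwhere_app(a), smallestbox(a))

runs = 5
t_argwhere = timeit.timeit(lambda: argwhere_app(a), number=runs)
t_box = timeit.timeit(lambda: smallestbox(a), number=runs)

print(f"same result: {same}")
print(f"argwhere_app: {t_argwhere / runs * 1e3:.2f} ms per call")
print(f"smallestbox:  {t_box / runs * 1e3:.2f} ms per call")
```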


    a = np.transpose(a[np.sum(a,1) != 0])
    a = np.transpose(a[np.sum(a,1) != 0])

It's not the quickest but it's alright.
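Be aware that this behaves differently from a bounding-box crop: it drops every all-zero row and column, including ones in the middle of the data. A small sketch to show the effect (the example array and the name `b` are made up for illustration):

```python
import numpy as np

a = np.array([[0, 0, 0, 0, 0],
              [0, 1, 0, 2, 0],
              [0, 0, 0, 0, 0],   # interior all-zero row
              [0, 3, 0, 4, 0],
              [0, 0, 0, 0, 0]])

b = np.transpose(a[np.sum(a, 1) != 0])   # drop zero-sum rows, then transpose
b = np.transpose(b[np.sum(b, 1) != 0])   # drop zero-sum columns, transpose back

print(b)   # interior zero row and column are gone as well
```

A bounding-box crop of the same array would keep the 3x3 block including the zero row and column in the middle. Also, since the test is on the row sum, a row whose values cancel out (for example `1` and `-1`) would be dropped too.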