How can I ignore zeros when I take the median on columns of an array?
A masked array is always handy, but slow (here x is the example array and y = np.ma.masked_where(x == 0, x)):
    In [14]: %timeit np.ma.median(y, axis=0).filled(0)
    1000 loops, best of 3: 1.73 ms per loop

    In [15]: %%timeit
       ...: ans = np.apply_along_axis(lambda v: np.median(v[v != 0]), 0, x)
       ...: ans[np.isnan(ans)] = 0.
    1000 loops, best of 3: 402 µs per loop

    In [16]: ans = np.apply_along_axis(lambda v: np.median(v[v != 0]), 0, x)
       ...: ans[np.isnan(ans)] = 0.; ans
    Out[16]: array([ 9.,  9.,  9.,  0.])
np.nonzero is even faster:
    In [25]: %%timeit
       ...: ans = np.apply_along_axis(lambda v: np.median(v[np.nonzero(v)]), 0, x)
       ...: ans[np.isnan(ans)] = 0.
    1000 loops, best of 3: 384 µs per loop
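The timed snippets above rely on np.median returning nan for an all-zero column and then patch the nan afterwards. A minimal sketch of the same apply_along_axis idea that checks for the empty case first is shown below; the helper name nonzero_median and its empty_value parameter are made up for illustration and are not part of NumPy.

```python
import numpy as np

def nonzero_median(v, empty_value=0.0):
    """Median of the nonzero entries of a 1-D array (hypothetical helper)."""
    nz = v[np.nonzero(v)]
    # Test for an empty selection up front so np.median never receives an
    # empty array, which would return nan and emit a RuntimeWarning.
    # empty_value is a float so the result array keeps a float dtype.
    return np.median(nz) if nz.size else empty_value

x = np.array([[10, 0, 10, 0],
              [1, 1, 0, 0],
              [9, 9, 9, 0],
              [0, 10, 1, 0]])

# Apply the helper to each column (axis 0).
col_medians = np.apply_along_axis(nonzero_median, 0, x)
print(col_medians)
```

For the example array this gives 9 for the first three columns and 0 for the all-zero column, matching Out[16] above. It has not been timed against the lambda versions.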
Use a masked array and np.ma.median(y, axis=0).filled(0) to get the medians of the columns.
    In [1]: x = np.array([[10, 0, 10, 0], [1, 1, 0, 0], [9, 9, 9, 0], [0, 10, 1, 0]])

    In [2]: y = np.ma.masked_where(x == 0, x)

    In [3]: x
    Out[3]:
    array([[10,  0, 10,  0],
           [ 1,  1,  0,  0],
           [ 9,  9,  9,  0],
           [ 0, 10,  1,  0]])

    In [4]: y
    Out[4]:
    masked_array(data =
     [[10 -- 10 --]
     [1 1 -- --]
     [9 9 9 --]
     [-- 10 1 --]],
                 mask =
     [[False  True False  True]
     [False False  True  True]
     [False False False  True]
     [ True False False  True]],
           fill_value = 999999)

    In [6]: np.median(x, axis=0)
    Out[6]: array([ 5.,  5.,  5.,  0.])

    In [7]: np.ma.median(y, axis=0).filled(0)
    Out[7]: array([ 9.,  9.,  9.,  0.])
You can use masked arrays.
    a = np.array([[10, 0, 10, 0], [1, 1, 0, 0], [9, 9, 9, 0], [0, 10, 1, 0]])
    m = np.ma.masked_equal(a, 0)

    In [44]: np.median(a)
    Out[44]: 1.0

    In [45]: np.ma.median(m)
    Out[45]: 9.0

    In [46]: m
    Out[46]:
    masked_array(data =
     [[10 -- 10 --]
     [1 1 -- --]
     [9 9 9 --]
     [-- 10 1 --]],
                 mask =
     [[False  True False  True]
     [False False  True  True]
     [False False False  True]
     [ True False False  True]],
           fill_value = 0)
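Note that np.ma.median(m) without an axis gives the median over all nonzero entries of the whole array, while the question asks about columns. A small sketch, assuming the same array a as above, of how the masked_equal version can be pointed at columns with axis=0:

```python
import numpy as np

a = np.array([[10, 0, 10, 0],
              [1, 1, 0, 0],
              [9, 9, 9, 0],
              [0, 10, 1, 0]])

# Mask every entry equal to 0 so it is ignored by np.ma functions.
m = np.ma.masked_equal(a, 0)

# Median over all unmasked (nonzero) entries of the whole array.
overall = np.ma.median(m)

# Median of each column; a column that is entirely zero comes back masked.
per_column = np.ma.median(m, axis=0)

# Replace the masked results with 0 to get a plain ndarray.
per_column_filled = per_column.filled(0)

print(overall)
print(per_column_filled)
```

Passing 0 to filled explicitly avoids depending on whatever fill_value the result array happens to carry.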