How can I run a numpy function percentile() on a masked array?
If you fill your masked values with np.nan, you can then use np.nanpercentile:
    import numpy as np

    data = np.arange(-5.5, 10.5)  # Note that you need a non-integer array to store NaN
    mdata = np.ma.masked_where(data < 0, data)
    mdata = np.ma.filled(mdata, np.nan)
    np.nanpercentile(mdata, 50)  # 50th percentile
Looking at the np.percentile code, it is clear that it does nothing special with masked arrays.
    def percentile(a, q, axis=None, out=None,
                   overwrite_input=False, interpolation='linear', keepdims=False):
        q = array(q, dtype=np.float64, copy=True)
        r, k = _ureduce(a, func=_percentile, q=q, axis=axis, out=out,
                        overwrite_input=overwrite_input,
                        interpolation=interpolation)
        if keepdims:
            if q.ndim == 0:
                return r.reshape(k)
            else:
                return r.reshape([len(q)] + k)
        else:
            return r
Here _ureduce and _percentile are internal functions defined in numpy/lib/function_base.py, so the real action is more complex.
Masked arrays have two strategies for using numpy functions. One is to fill: replace the masked values with innocuous ones, for example 0 when computing a sum or 1 when computing a product. The other is to compress the data, that is, remove all masked values.
For example:
    In [997]: data = np.arange(-5, 10)
    In [998]: mdata = np.ma.masked_where(data < 0, data)
    In [1001]: np.ma.filled(mdata, 0)
    Out[1001]: array([0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    In [1002]: np.ma.filled(mdata, 1)
    Out[1002]: array([1, 1, 1, 1, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    In [1008]: mdata.compressed()
    Out[1008]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
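To see why a fill value counts as "innocuous" only for a particular operation, here is a small sketch. The array `m` and its mask are made up for illustration and are not from the session above. Filling with 0 reproduces the masked sum, and filling with 1 reproduces the masked product.

```python
import numpy as np

# Hypothetical 4-element array with the second value masked out
m = np.ma.masked_array([1, 2, 3, 4], mask=[False, True, False, False])

# Filling with 0 leaves a sum unchanged: 1 + 0 + 3 + 4
sum_via_fill = m.filled(0).sum()
sum_masked = m.sum()          # the masked sum skips the masked 2

# Filling with 1 leaves a product unchanged: 1 * 1 * 3 * 4
prod_via_fill = m.filled(1).prod()
prod_masked = m.prod()        # the masked product skips the masked 2

print(sum_via_fill, sum_masked)    # 8 8
print(prod_via_fill, prod_masked)  # 12 12
```

No single fill value plays that neutral role for a percentile, which is why the choice of strategy matters here.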
Which is going to give you the desired percentile? Filling or compressing? Or neither? You need to understand the concept of a percentile well enough to know how it should apply to your masked values.
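As a rough comparison, the sketch below computes the 50th percentile of the same example array under each strategy. The variable names are mine. It assumes that "ignore the masked values entirely" is the behaviour you want, and you should confirm that for your own data.

```python
import numpy as np

data = np.arange(-5, 10)
mdata = np.ma.masked_where(data < 0, data)

# Strategy 1: fill masked entries with 0; the zeros still count as samples
filled_median = np.percentile(mdata.filled(0), 50)

# Strategy 2: compress, i.e. drop the masked entries before computing
compressed_median = np.percentile(mdata.compressed(), 50)

# Strategy 3: fill with NaN (requires a float array) and use nanpercentile
nan_median = np.nanpercentile(mdata.astype(float).filled(np.nan), 50)

print(filled_median)      # 2.0
print(compressed_median)  # 4.5
print(nan_median)         # 4.5
```

Filling with 0 pulls the median down because the five filled zeros are still counted as samples. Compressing and the NaN approach agree with each other because both discard the masked entries.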