How can I run a numpy function percentile() on a masked array?


If you fill your masked values with np.nan, you can then use np.nanpercentile, which ignores NaN entries:

    import numpy as np

    data = np.arange(-5.5, 10.5)  # Note that you need a non-integer array to store NaN
    mdata = np.ma.masked_where(data < 0, data)
    mdata = np.ma.filled(mdata, np.nan)
    np.nanpercentile(mdata, 50)  # 50th percentile
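If you need this in more than one place, or along an axis, the same fill-with-NaN idea can be wrapped in a small function. This is only a sketch: `masked_percentile` is a hypothetical name of my own, not something numpy provides, and it assumes your data can safely be converted to float.

```python
import numpy as np

def masked_percentile(marr, q, axis=None):
    """Hypothetical helper: percentile over the unmasked values only."""
    # Cast to float first so NaN can be stored, then replace masked entries with NaN.
    filled = np.ma.filled(np.ma.asarray(marr).astype(float), np.nan)
    # nanpercentile skips the NaN entries, i.e. the formerly masked values.
    return np.nanpercentile(filled, q, axis=axis)

# Mask the negative entries of a small 2-D integer array.
grid = np.ma.masked_less(np.array([[-1, 2, 4], [3, -2, 9]]), 0)
row_median = masked_percentile(grid, 50, axis=1)  # one value per row
```

Here each row has one masked entry, so each row's median is taken over its two remaining values.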


Looking at the np.percentile code, it is clear that it does nothing special with masked arrays.

    def percentile(a, q, axis=None, out=None,
                   overwrite_input=False, interpolation='linear', keepdims=False):
        q = array(q, dtype=np.float64, copy=True)
        r, k = _ureduce(a, func=_percentile, q=q, axis=axis, out=out,
                        overwrite_input=overwrite_input,
                        interpolation=interpolation)
        if keepdims:
            if q.ndim == 0:
                return r.reshape(k)
            else:
                return r.reshape([len(q)] + k)
        else:
            return r

Here _ureduce and _percentile are internal functions defined in numpy/lib/function_base.py, so the real action is more complex than this wrapper shows.

Masked arrays have two strategies for working with numpy functions. One is to fill: replace the masked values with innocuous ones, for example 0 when computing a sum or 1 when computing a product. The other is to compress the data, that is, to remove all masked values.

For example:

    In [997]: data=np.arange(-5,10)
    In [998]: mdata=np.ma.masked_where(data<0,data)
    In [1001]: np.ma.filled(mdata,0)
    Out[1001]: array([0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    In [1002]: np.ma.filled(mdata,1)
    Out[1002]: array([1, 1, 1, 1, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    In [1008]: mdata.compressed()
    Out[1008]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Which is going to give you the desired percentile? Filling, compressing, or neither? You need to understand the concept of a percentile well enough to know how it should apply to your masked values.
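To see how much the choice matters, here is a quick comparison on the same data. It is just an illustration of the two strategies, not a recommendation: the variable names are mine, and which result is "right" depends on what the masked values mean in your data.

```python
import numpy as np

data = np.arange(-5, 10)
mdata = np.ma.masked_where(data < 0, data)

# Compress: drop the masked values, then take the percentile of what is left.
p_compressed = np.percentile(mdata.compressed(), 50)

# Fill with 0: the masked entries become zeros and are counted as real data.
p_filled_zero = np.percentile(np.ma.filled(mdata, 0), 50)

# Fill with NaN (needs a float array) and let nanpercentile skip those entries.
p_nan = np.nanpercentile(np.ma.filled(mdata.astype(float), np.nan), 50)

print(p_compressed, p_filled_zero, p_nan)  # 4.5 2.0 4.5
```

Compressing and the NaN approach agree because both leave the masked values out entirely, while filling with 0 pulls the median down by adding five extra zeros.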