Weighted percentile using numpy


Completely vectorized numpy solution

Here is the code I use. It is not optimal (I was unable to write an optimal version with numpy), but it is still much faster and more reliable than the accepted solution.

import numpy as np

def weighted_quantile(values, quantiles, sample_weight=None,
                      values_sorted=False, old_style=False):
    """ Very close to numpy.percentile, but supports weights.
    NOTE: quantiles should be in [0, 1]!
    :param values: numpy.array with data
    :param quantiles: array-like with many quantiles needed
    :param sample_weight: array-like of the same length as `values`
    :param values_sorted: bool, if True, then will avoid sorting of
        initial array
    :param old_style: if True, will correct output to be consistent
        with numpy.percentile.
    :return: numpy.array with computed quantiles.
    """
    values = np.array(values)
    quantiles = np.array(quantiles)
    if sample_weight is None:
        sample_weight = np.ones(len(values))
    sample_weight = np.array(sample_weight)
    assert np.all(quantiles >= 0) and np.all(quantiles <= 1), \
        'quantiles should be in [0, 1]'

    if not values_sorted:
        sorter = np.argsort(values)
        values = values[sorter]
        sample_weight = sample_weight[sorter]

    # Place each value at the midpoint of its share of the cumulative weight
    weighted_quantiles = np.cumsum(sample_weight) - 0.5 * sample_weight
    if old_style:
        # To be consistent with numpy.percentile
        weighted_quantiles -= weighted_quantiles[0]
        weighted_quantiles /= weighted_quantiles[-1]
    else:
        weighted_quantiles /= np.sum(sample_weight)
    return np.interp(quantiles, weighted_quantiles, values)

Examples:

weighted_quantile([1, 2, 9, 3.2, 4], [0.0, 0.5, 1.])

array([ 1. , 3.2, 9. ])

weighted_quantile([1, 2, 9, 3.2, 4], [0.0, 0.5, 1.], sample_weight=[2, 1, 2, 4, 1])

array([ 1. , 3.2, 9. ])
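
To see why the weighted call also returns 3.2 for the 0.5 quantile, the steps can be traced by hand. This is only a sketch of what the function above does internally; the variable names (order, positions, old_positions, qs) are made up for illustration. The second half checks, for unit weights only, that the old_style rescaling produces the evenly spaced positions that numpy.percentile uses with its default linear interpolation.

```python
import numpy as np

values = np.array([1, 2, 9, 3.2, 4])
weights = np.array([2, 1, 2, 4, 1])

# Sort values and carry the weights along
order = np.argsort(values)
sorted_values = values[order]      # 1, 2, 3.2, 4, 9
sorted_weights = weights[order]    # 2, 1, 4, 1, 2

# Default path: midpoint of each value's cumulative weight, as a fraction of the total
positions = (np.cumsum(sorted_weights) - 0.5 * sorted_weights) / np.sum(sorted_weights)
# positions is approximately 0.1, 0.25, 0.5, 0.75, 0.9, so 3.2 sits exactly at 0.5
median = np.interp(0.5, positions, sorted_values)

# old_style path with unit weights: shift and rescale so positions span [0, 1]
unit = np.ones(len(values))
old_positions = np.cumsum(unit) - 0.5 * unit
old_positions -= old_positions[0]
old_positions /= old_positions[-1]

qs = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
old_style_result = np.interp(qs, old_positions, sorted_values)
reference = np.percentile(values, qs * 100)
print(median, np.allclose(old_style_result, reference))
```

Quantiles below the first position or above the last one are clamped to the smallest and largest value by np.interp, which is why 0.0 and 1.0 give 1 and 9 in both examples.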


A quick solution, by first sorting and then interpolating:

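import numpy as np

def weighted_percentile(data, percents, weights=None):
    ''' percents in units of 1%
        weights specifies the frequency (count) of data.
    '''
    if weights is None:
        return np.percentile(data, percents)
    # Convert to arrays so that plain lists can be indexed with `ind`
    data = np.asarray(data)
    weights = np.asarray(weights)
    ind = np.argsort(data)
    d = data[ind]
    w = weights[ind]
    # Cumulative share of the total weight, in percent, at each sorted value
    p = 1. * w.cumsum() / w.sum() * 100
    y = np.interp(percents, p, d)
    return y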
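
Note that this version places each value at the upper edge of its cumulative weight instead of at the midpoint, so it can give different answers from weighted_quantile above. A small trace on the same data and weights as the earlier example shows the difference; this is just an illustrative sketch and the variable names are my own.

```python
import numpy as np

data = np.array([1, 2, 9, 3.2, 4])
weights = np.array([2, 1, 2, 4, 1])

ind = np.argsort(data)
d = data[ind]       # 1, 2, 3.2, 4, 9
w = weights[ind]    # 2, 1, 4, 1, 2

# Upper-edge cumulative weight in percent: roughly 20, 30, 70, 80, 100
p = 1. * w.cumsum() / w.sum() * 100

# The 50th percentile falls halfway between 2 (at 30%) and 3.2 (at 70%)
quick_median = np.interp(50, p, d)
print(quick_median)  # about 2.6, versus 3.2 from the midpoint-based function
```

Neither result is wrong as such; they follow different conventions for where a weighted value sits on the cumulative scale, so pick one and use it consistently.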


I don't know what weighted percentile means, but from @Joan Smith's answer it seems that you just need to repeat every element in ar. You can use numpy.repeat() for that:

import numpy as np
np.repeat([1,2,3], [4,5,6])

the result is:

array([1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3])
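
Assuming the weights are non-negative integer counts, the repeated array can then be passed straight to numpy.percentile. Here is a minimal sketch using the data from the earlier examples in this thread (the names counts and expanded are just for illustration):

```python
import numpy as np

values = np.array([1, 2, 9, 3.2, 4])
counts = np.array([2, 1, 2, 4, 1])   # must be non-negative integers for np.repeat

# Each value appears as many times as its count
expanded = np.repeat(values, counts)

# Ordinary (unweighted) percentile of the expanded sample
median = np.percentile(expanded, 50)
print(expanded.size, median)
```

This does not work for fractional weights, and the expanded array has as many elements as the sum of the counts, so it can get large when the counts are big.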