Weighted percentile using numpy
Completely vectorized numpy solution
Here is the code I use. It's not optimal (I wasn't able to write an optimal version with numpy), but it is still much faster and more reliable than the accepted solution.
import numpy as np

def weighted_quantile(values, quantiles, sample_weight=None,
                      values_sorted=False, old_style=False):
    """ Very close to numpy.percentile, but supports weights.
    NOTE: quantiles should be in [0, 1]!
    :param values: numpy.array with data
    :param quantiles: array-like with many quantiles needed
    :param sample_weight: array-like of the same length as `values`
    :param values_sorted: bool, if True, then will avoid sorting of
        initial array
    :param old_style: if True, will correct output to be consistent
        with numpy.percentile.
    :return: numpy.array with computed quantiles.
    """
    values = np.array(values)
    quantiles = np.array(quantiles)
    if sample_weight is None:
        sample_weight = np.ones(len(values))
    sample_weight = np.array(sample_weight)
    assert np.all(quantiles >= 0) and np.all(quantiles <= 1), \
        'quantiles should be in [0, 1]'

    if not values_sorted:
        sorter = np.argsort(values)
        values = values[sorter]
        sample_weight = sample_weight[sorter]

    # Place each sample at the midpoint of its own weight interval.
    weighted_quantiles = np.cumsum(sample_weight) - 0.5 * sample_weight
    if old_style:
        # To be consistent with numpy.percentile
        weighted_quantiles -= weighted_quantiles[0]
        weighted_quantiles /= weighted_quantiles[-1]
    else:
        weighted_quantiles /= np.sum(sample_weight)
    return np.interp(quantiles, weighted_quantiles, values)
Examples:
weighted_quantile([1, 2, 9, 3.2, 4], [0.0, 0.5, 1.])
array([ 1. , 3.2, 9. ])
weighted_quantile([1, 2, 9, 3.2, 4], [0.0, 0.5, 1.], sample_weight=[2, 1, 2, 4, 1])
array([ 1. , 3.2, 9. ])
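To see what the old_style flag is for, here is a minimal sketch that checks the rescaled positions against numpy.percentile when all weights are equal. The helper name quantile_old_style is hypothetical and only restates the old_style=True branch, so that the snippet runs on its own; it is not a replacement for the function above.

```python
import numpy as np

def quantile_old_style(values, quantiles, sample_weight):
    # Hypothetical helper: only the old_style=True branch of the answer's function.
    values = np.asarray(values, dtype=float)
    sample_weight = np.asarray(sample_weight, dtype=float)
    sorter = np.argsort(values)
    values, sample_weight = values[sorter], sample_weight[sorter]
    # Midpoint positions, then rescaled so the first is 0 and the last is 1.
    positions = np.cumsum(sample_weight) - 0.5 * sample_weight
    positions -= positions[0]
    positions /= positions[-1]
    return np.interp(quantiles, positions, values)

data = [1, 2, 9, 3.2, 4]
q = np.array([0.0, 0.25, 0.5, 1.0])
ours = quantile_old_style(data, q, np.ones(len(data)))
reference = np.percentile(data, q * 100)  # numpy.percentile takes percents, not fractions
print(np.allclose(ours, reference))
```

With unit weights the rescaled positions become 0, 1/(n-1), ..., 1, which is the same grid numpy.percentile interpolates over by default, so the two results should agree.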
A quick solution, by first sorting and then interpolating:
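def weighted_percentile(data, percents, weights=None):
    ''' percents in units of 1%
        weights specifies the frequency (count) of data.
    '''
    if weights is None:
        return np.percentile(data, percents)
    # Coerce to arrays so that plain lists can be indexed with `ind`.
    data = np.asarray(data)
    weights = np.asarray(weights)
    ind = np.argsort(data)
    d = data[ind]
    w = weights[ind]
    # Cumulative weight share of each sorted value, in percent.
    p = 1. * w.cumsum() / w.sum() * 100
    y = np.interp(percents, p, d)
    return y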
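One thing worth knowing about this quick version: with equal weights it does not reproduce numpy.percentile, because each value is placed at the end of its cumulative weight rather than at a rank-based position. A small sketch of the difference follows; the function is restated compactly (with list inputs coerced to arrays, which is my addition) so that the snippet runs on its own.

```python
import numpy as np

def weighted_percentile(data, percents, weights=None):
    # Same logic as the quick solution; array coercion is an added assumption.
    if weights is None:
        return np.percentile(data, percents)
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    ind = np.argsort(data)
    d, w = data[ind], weights[ind]
    p = w.cumsum() / w.sum() * 100  # cumulative weight share in percent
    return np.interp(percents, p, d)

data = [1, 2, 3, 4]
unweighted = np.percentile(data, 50)
weighted = weighted_percentile(data, 50, weights=[1, 1, 1, 1])
print(unweighted, weighted)  # 2.5 2.0
```

Whether that offset matters depends on which definition of a weighted percentile you need, so treat this as an illustration rather than a verdict on either approach.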
I don't know exactly what "weighted percentile" means, but judging from @Joan Smith's answer, it seems that you just need to repeat every element in ar. You can use numpy.repeat() for that:
import numpy as np
np.repeat([1,2,3], [4,5,6])
the result is:
array([1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3])
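To turn this into an actual percentile, one option is to pass the expanded array to numpy.percentile. This is a minimal sketch that assumes the weights are non-negative integers that can be read as repeat counts; the variable names are mine.

```python
import numpy as np

values = np.array([1, 2, 3])
counts = np.array([4, 5, 6])  # assumed: integer weights, used as repeat counts

# Each value appears counts[i] times, so an ordinary percentile
# on the expanded array is weighted by those counts.
expanded = np.repeat(values, counts)
median = np.percentile(expanded, 50)
print(median)  # 2.0
```

Note that the expanded array has sum(counts) elements, so this gets expensive for large weights and does not work for fractional ones.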