Pass percentiles to pandas agg function


Perhaps not super efficient, but one way would be to create a function yourself:

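    import numpy as np

    def percentile(n):
        # Build an aggregation function that computes the n-th percentile
        def percentile_(x):
            return np.percentile(x, n)
        # Give each function a distinct name so the result columns are labeled
        percentile_.__name__ = 'percentile_%s' % n
        return percentile_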

Then include this in your agg:

    In [11]: column.agg([np.sum, np.mean, np.std, np.median,
                         np.var, np.min, np.max, percentile(50), percentile(95)])
    Out[11]:
               sum       mean        std  median          var  amin  amax  percentile_50  percentile_95
    AGGREGATE
    A          106  35.333333  42.158431      12  1777.333333    10    84             12           76.8
    B           36  12.000000   8.888194       9    79.000000     5    22             12           76.8

Not sure this is how it should be done though...
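For anyone who wants to try the approach above end to end, here is a minimal sketch. The sample data and the column name `COL` are made up for illustration and are not from the question; the `percentile` factory is the one shown above.

```python
import numpy as np
import pandas as pd

def percentile(n):
    """Return an aggregation function that computes the n-th percentile."""
    def percentile_(x):
        return np.percentile(x, n)
    # pandas labels the output column with the function's __name__
    percentile_.__name__ = 'percentile_%s' % n
    return percentile_

# Hypothetical sample data (not the data from the question)
df = pd.DataFrame({
    'AGGREGATE': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
    'COL': [1, 2, 3, 4, 5, 10, 20, 30, 40, 50],
})

# Mix a built-in aggregation name with the generated percentile functions
result = df.groupby('AGGREGATE')['COL'].agg(
    ['mean', percentile(50), percentile(95)]
)
print(result)
```

Because each generated function gets its own `__name__`, the percentile columns come out as `percentile_50` and `percentile_95` rather than sharing one label.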


You can have agg() apply custom functions to a specified column:

    # 50th percentile
    def q50(x):
        return x.quantile(0.5)

    # 90th percentile
    def q90(x):
        return x.quantile(0.9)

    my_DataFrame.groupby(['AGGREGATE']).agg({'MY_COLUMN': [q50, q90, 'max']})
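A self-contained version of the same idea, as a rough sketch: the data below is invented for illustration, while `q50`, `q90`, `my_DataFrame`, `AGGREGATE` and `MY_COLUMN` follow the names used in this answer.

```python
import pandas as pd

# 50th percentile
def q50(x):
    return x.quantile(0.5)

# 90th percentile
def q90(x):
    return x.quantile(0.9)

# Hypothetical sample data (not the data from the question)
my_DataFrame = pd.DataFrame({
    'AGGREGATE': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
    'MY_COLUMN': [1, 2, 3, 4, 5, 10, 20, 30, 40, 50],
})

# The dict maps a column name to the list of aggregations to run on it
result = my_DataFrame.groupby(['AGGREGATE']).agg({'MY_COLUMN': [q50, q90, 'max']})
print(result)

# The result has two-level column labels: (column name, function name)
print(result[('MY_COLUMN', 'q90')])
```

Note that passing a dict of lists produces two-level column labels, so individual results are addressed with tuples such as `('MY_COLUMN', 'q90')`.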


More specifically, if you just want to aggregate your pandas groupby results with a percentile function, a Python lambda offers a pretty neat solution. Using the question's notation, aggregating by the 95th percentile would be:

    dataframe.groupby('AGGREGATE')['COL'].agg(lambda x: np.percentile(x, q=95))

You can also assign this function to a variable and use it in conjunction with other aggregation functions.
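As a rough sketch of that last point, the lambda can be stored in a variable and combined with other aggregations. The sample data and the name `p95` are hypothetical, and the keyword form of `agg` used here (named aggregation) assumes pandas 0.25 or later.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the question's notation
dataframe = pd.DataFrame({
    'AGGREGATE': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
    'COL': [1, 2, 3, 4, 5, 10, 20, 30, 40, 50],
})

# Keep the lambda in a variable so it can be reused (p95 is a made-up name)
p95 = lambda x: np.percentile(x, q=95)

# Named aggregation: each keyword becomes the label of an output column,
# which avoids ending up with a column called "<lambda>"
result = dataframe.groupby('AGGREGATE')['COL'].agg(p95=p95, mean='mean', max='max')
print(result)
```

Naming the outputs explicitly is a design choice here: a bare lambda in a list would otherwise show up as `<lambda>` in the result.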