Eliminating all data over a given percentile

Use the Series.quantile() method:

In [48]: cols = list('abc')
In [49]: df = DataFrame(randn(10, len(cols)), columns=cols)
In [50]: df.a.quantile(0.95)
Out[50]: 1.5776961953820687

To filter out rows of df where df.a is greater than or equal to the 95th percentile do:

In [72]: df[df.a < df.a.quantile(.95)]
Out[72]:
       a      b      c
0 -1.044 -0.247 -1.149
2  0.395  0.591  0.764
3 -0.564 -2.059  0.232
4 -0.707 -0.736 -1.345
5  0.978 -0.099  0.521
6 -0.974  0.272 -0.649
7  1.228  0.619 -0.849
8 -0.170  0.458 -0.515
9  1.465  1.019  0.966
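
For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same filter. The imports, the seeded random generator, and the 100-row size are my own assumptions for illustration; they are not part of the session above.

```python
import numpy as np
import pandas as pd

# Assumption: a seeded generator so the example is reproducible.
rng = np.random.default_rng(0)
cols = list("abc")
df = pd.DataFrame(rng.standard_normal((100, len(cols))), columns=cols)

# 95th percentile of column "a" (pandas interpolates linearly by default).
threshold = df["a"].quantile(0.95)

# Keep only the rows whose "a" value is strictly below the threshold.
filtered = df[df["a"] < threshold]

print(len(df), len(filtered))
```

The comparison is strict (`<`), so any row exactly equal to the threshold is dropped along with the rows above it.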

python pandas filtering percentile

NumPy is much faster than pandas for this kind of thing:

numpy.percentile(df.a, 95)  # note: the percentile is given in percent (5 means 5%)

is equivalent to, but 3 times faster than:

df.a.quantile(.95)  # as you already noticed, here it is ".95", not "95"

So for your code, that gives:

df[df.a < np.percentile(df.a,95)]
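
One caveat on "equivalent": as far as I know, the two calls agree only when the column has no missing values. The sketch below uses a small made-up Series (an assumption, not data from the question) to compare them, and shows np.nanpercentile as the NaN-aware NumPy variant.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Without NaN, both use linear interpolation by default and agree.
q_pandas = s.quantile(0.95)       # takes a fraction in [0, 1]
q_numpy = np.percentile(s, 95)    # takes a percentage in [0, 100]

# With a missing value the results differ.
s_nan = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0])
q_pandas_nan = s_nan.quantile(0.95)                   # pandas skips NaN
q_numpy_nan = np.percentile(s_nan.to_numpy(), 95)     # NumPy propagates NaN
q_nanaware = np.nanpercentile(s_nan.to_numpy(), 95)   # NumPy, ignoring NaN

print(q_pandas, q_numpy, q_pandas_nan, q_numpy_nan, q_nanaware)
```

If the column can contain NaN, a threshold of NaN makes `df.a < threshold` false for every row, so the filter would return an empty frame.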


You can use query for a more concise option (here ms is the name of the column being filtered):

df.query('ms < ms.quantile(.95)')
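
A small sketch comparing the query form with the boolean-mask form. The `ms` column here is filled with made-up values (an assumption for illustration), and `engine="python"` is something I added so the example does not depend on numexpr being installed.

```python
import numpy as np
import pandas as pd

# Assumption: synthetic "ms" timings, seeded for reproducibility.
rng = np.random.default_rng(0)
df = pd.DataFrame({"ms": rng.exponential(scale=100.0, size=200)})

# Boolean-mask form.
by_mask = df[df["ms"] < df["ms"].quantile(0.95)]

# query form: column names are resolved inside the expression string.
by_query = df.query("ms < ms.quantile(0.95)", engine="python")

print(len(df), len(by_mask), len(by_query))
```

Both forms should select the same rows; query mainly saves repeating the DataFrame name.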
