Detect and exclude outliers in Pandas data frame

python pandas filtering dataframe outliers

If you have multiple columns in your dataframe and would like to remove all rows that have outliers in at least one column, the following expression would do that in one shot.

import numpy as np
import pandas as pd
from scipy import stats
df = pd.DataFrame(np.random.randn(100, 3))
df[(np.abs(stats.zscore(df)) < 3).all(axis=1)]

Description:

For each column, it first computes the Z-score of each value in the column, relative to the column mean and standard deviation.
It then takes the absolute Z-score because the direction does not matter, only whether it is below the threshold.
all(axis=1) ensures that, for each row, all columns satisfy the constraint.
Finally, the result of this condition is used to index the dataframe.
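The four steps above can be unpacked one at a time. This is a small sketch on deterministic toy data (the values are made up for illustration). It also computes the z-scores by hand to show what "relative to the column mean and standard deviation" means. Note that scipy's zscore uses ddof=0 by default, while pandas' .std() defaults to ddof=1, so the hand-written version passes ddof=0 explicitly.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Deterministic toy data (illustrative values): 20 rows, the last one has an
# extreme value in column "b".
df = pd.DataFrame({
    "a": np.tile([1.0, 2.0, 3.0, 2.0], 5),
    "b": [0.4, 0.6] * 9 + [0.5, 100.0],
})

# Step 1: per-column z-scores. scipy's zscore uses ddof=0 by default,
# so the hand-written equivalent must pass ddof=0 too.
z = stats.zscore(df)
z_manual = (df - df.mean()) / df.std(ddof=0)

# Step 2: only the distance from the mean matters, not its sign.
within = np.abs(z) < 3

# Step 3: a row is kept only if every column passes the test.
keep = within.all(axis=1)

# Step 4: use the boolean mask to select rows.
filtered = df[keep]
print(len(df), "->", len(filtered))  # 20 -> 19
```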

Filter other columns based on a single column

Specify a single column for the z-score, df[0] for example, and remove .all(axis=1):

df[(np.abs(stats.zscore(df[0])) < 3)]
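To make the single-column variant reusable, it can be wrapped in a small function. The name drop_outliers_by_column and its threshold parameter are hypothetical, chosen for illustration. The toy data puts an outlier in column 0 and a different outlier in column 1, which shows that only the tested column decides which rows are dropped while all other columns are carried along.

```python
import numpy as np
import pandas as pd
from scipy import stats


def drop_outliers_by_column(df: pd.DataFrame, col, threshold: float = 3.0) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose |z-score| in `col` is below `threshold`.

    Only `col` is tested; all other columns are kept as they are.
    """
    z = np.abs(stats.zscore(df[col]))
    return df[z < threshold]


# Illustrative data: column 0 has an extreme value in row 19,
# column 1 has one in row 18.
base = [0.4, 0.6] * 9
df = pd.DataFrame({0: base + [0.5, 100.0], 1: base + [100.0, 0.5]})

result = drop_outliers_by_column(df, 0)
print(sorted(set(df.index) - set(result.index)))  # [19]
```

Row 18 survives even though column 1 is extreme there, because only column 0 is checked.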


Use boolean indexing as you would with a numpy.array:

import numpy as np
import pandas as pd
# example dataset of normally distributed data
df = pd.DataFrame({'Data': np.random.normal(size=200)})
# keep only the ones that are within +3 to -3 standard deviations in the column 'Data'
df[np.abs(df.Data - df.Data.mean()) <= (3 * df.Data.std())]
# or if you prefer the other way around
df[~(np.abs(df.Data - df.Data.mean()) > (3 * df.Data.std()))]
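The same mean/standard-deviation test extends to every column at once. df.mean() and df.std() return one value per column, and DataFrame-vs-Series arithmetic aligns on the column labels. This is a sketch with a hypothetical helper name (drop_std_outliers) and illustrative data. Because pandas' .std() uses ddof=1 while scipy's zscore uses ddof=0, values very close to the threshold may be classified differently than with the scipy one-liner.

```python
import numpy as np
import pandas as pd


def drop_std_outliers(df: pd.DataFrame, k: float = 3.0) -> pd.DataFrame:
    """Hypothetical helper: drop rows where any column lies more than k std from its mean."""
    dev = (df - df.mean()).abs()     # df.mean() is a per-column Series; subtraction aligns on columns
    beyond = dev > k * df.std()      # pandas .std() uses ddof=1 by default
    return df[~beyond.any(axis=1)]   # keep rows where no column is beyond the limit


# Illustrative data: 20 rows, the last one is extreme in column "b".
df = pd.DataFrame({
    "a": np.tile([1.0, 2.0, 3.0, 2.0], 5),
    "b": [0.4, 0.6] * 9 + [0.5, 100.0],
})
res = drop_std_outliers(df)
print(len(res))  # 19
```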

For a Series it is similar:

S = pd.Series(np.random.normal(size=200))
S[~((S - S.mean()).abs() > 3 * S.std())]
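The "<= limit" form and the "~(> limit)" form give the same result for ordinary numbers, but they differ on missing values. Any comparison with NaN evaluates to False, so the first form drops NaN rows and the negated form keeps them. A minimal sketch on illustrative data:

```python
import numpy as np
import pandas as pd

# Illustrative data: 20 ordinary values, one extreme value and one missing value.
S = pd.Series([0.4, 0.6] * 10 + [100.0, np.nan])

dev = (S - S.mean()).abs()   # mean() and std() skip NaN; dev is NaN for the missing entry
limit = 3 * S.std()

kept_le = S[dev <= limit]          # NaN <= limit is False, so the NaN is dropped
kept_not_gt = S[~(dev > limit)]    # NaN > limit is False, negated to True, so the NaN is kept

print(len(kept_le), len(kept_not_gt))  # 20 21
```

Both forms drop the extreme value 100.0. Pick the form that matches how missing data should be treated.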


For each column of your dataframe, you can get a quantile with:

q = df["col"].quantile(0.99)

and then filter with:

df[df["col"] < q]

If you need to remove both lower and upper outliers, combine the two conditions with & (element-wise AND):

q_low = df["col"].quantile(0.01)
q_hi = df["col"].quantile(0.99)
df_filtered = df[(df["col"] < q_hi) & (df["col"] > q_low)]
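To apply the quantile cut to every column without looping, note that df.quantile(q) returns one cutoff per column, and comparing the DataFrame with that Series aligns on column labels. This is a sketch with a hypothetical helper (trim_quantiles) and illustrative data. It keeps the strict inequalities used above, so values exactly at a cutoff are dropped.

```python
import numpy as np
import pandas as pd


def trim_quantiles(df: pd.DataFrame, low: float = 0.01, high: float = 0.99) -> pd.DataFrame:
    """Hypothetical helper: keep rows strictly between the low and high quantiles in every column."""
    q_low = df.quantile(low)    # Series: one lower cutoff per column
    q_hi = df.quantile(high)    # Series: one upper cutoff per column
    inside = (df > q_low) & (df < q_hi)   # each column is compared with its own cutoffs
    return df[inside.all(axis=1)]


# Illustrative data: both columns hold 0..99, but in a different row order.
a = np.arange(100.0)
df = pd.DataFrame({"a": a, "b": np.roll(a, 1)})

res = trim_quantiles(df)
print(len(res))  # 97: rows 0 and 99 fail on "a", rows 0 and 1 fail on "b"
```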
