How to remove outliers from a dataset


Nobody has posted the simplest answer:

x[!x %in% boxplot.stats(x)$out]
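As a rough illustration of what that one-liner does, here is a minimal sketch on a small made-up vector (the data and the names `out` and `kept` are assumptions for the example, not from the question). `boxplot.stats()` flags values lying more than 1.5 times the hinge spread beyond the hinges, and the `%in%` filter drops them into a new vector:

```r
# Hypothetical toy data: nine ordinary values and one extreme value.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# Values that boxplot.stats() flags as outliers (default coef = 1.5).
out <- boxplot.stats(x)$out

# Keep everything that is not flagged; x itself is left untouched.
kept <- x[!x %in% out]

print(out)   # the flagged value(s)
print(kept)  # the remaining values
```

Note that `%in%` matches by value, so every copy of a flagged value is dropped.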

Also see this: http://www.r-statistics.com/2011/01/how-to-label-all-the-outliers-in-a-boxplot/


OK, you could apply something like this to your dataset. Do not overwrite and save the original, or you'll destroy your data! And, by the way, you should (almost) never remove outliers from your data:

remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}

To see it in action:

set.seed(1)
x <- rnorm(100)
x <- c(-10, x, 10)
y <- remove_outliers(x)
## png()
par(mfrow = c(1, 2))
boxplot(x)
boxplot(y)
## dev.off()
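To check what was flagged without touching the original vector, something like the following sketch may help. The function is repeated from above so the snippet runs on its own; the toy data and the name `flagged` are assumptions made for the example:

```r
# Same function as in the answer, repeated so this snippet is self-contained.
remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}

# Hypothetical toy data with one low and one high extreme value.
x <- c(-10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

# Work on a copy: y has the same length as x, with outliers set to NA.
y <- remove_outliers(x)

# Original values at the positions that were replaced by NA.
flagged <- x[is.na(y)]

print(flagged)
print(sum(is.na(y)))  # how many values were flagged
```

Because the outliers become `NA` rather than being dropped, `x` and `y` stay aligned position by position, which is convenient when `x` is a column in a data frame.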

And once again, you should never do this on your own; outliers are just meant to be! =)

EDIT: I added na.rm = TRUE as default.

EDIT2: Removed quantile function, added subscripting, hence made the function faster! =)



Use outline = FALSE as an option when you draw the boxplot (read the help!). This only stops the outliers from being drawn; the data themselves are unchanged.

> m <- c(rnorm(10), 5, 10)
> bp <- boxplot(m, outline = FALSE)
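A small sketch of the difference between hiding and removing, using made-up data in place of the random sample above (the values are an assumption for the example). The plot is sent to a throwaway device so the snippet also runs in a non-interactive session; drop the `pdf(NULL)` and `dev.off()` lines to see the plot:

```r
# Hypothetical toy data: nine ordinary values and one extreme value.
m <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

pdf(NULL)                          # throwaway graphics device, no file written
bp <- boxplot(m, outline = FALSE)  # draws the box and whiskers, omits outlier points
dev.off()

# The vector still holds all ten values, including the extreme one.
print(length(m))

# The extreme value can still be listed separately if needed.
print(boxplot.stats(m)$out)
```

So `outline = FALSE` is a display choice only. If the values should actually be excluded from later calculations, they have to be filtered out of `m` explicitly.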
