
ggplot2 Color Scale Over Affected by Outliers


Here's one slightly tricky option:

    # Create a new variable indicating the unusual values
    x$Length1 <- "> 1500"
    x$Length1[x$Length <= 1500] <- NA

    # Main plot
    # Using fill - tricky!
    g <- ggplot() +
      geom_point(data = subset(x, Length <= 1500),
                 aes(x = date, y = factor(stateabbr), color = Length), size = 4) +
      geom_point(data = subset(x, Length > 1500),
                 aes(x = date, y = factor(stateabbr), fill = Length1), size = 4) +
      ggtitle("Date and State") + xlab("Date") + ylab("State")

    # Problem
    g + scale_color_gradient2("Length", midpoint = median(x$Length))


So the tricky part here is using fill on points, in order to convince ggplot to make another legend. You can obviously customize this with different labels and colors for the fill scale.

One more thing, after reading Brandon's answer: you could in principle combine both approaches by taking the outlying values, using cut to create a separate categorical variable for them, and then applying my trick with the fill scale. That way you could indicate multiple outlying groups of points.
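A rough sketch of that combination is below. The data frame is a made-up stand-in for x, and the bin edges (1500, 2000, 3000), the bin labels, the fill colors, and shape = 21 (a filled circle, so the fill is actually drawn) are all my own assumptions rather than anything from the question.

```r
library(ggplot2)

# Hypothetical stand-in for the question's data frame `x`
set.seed(1)
x <- data.frame(
  date      = rep(seq(as.Date("2011-01-01"), by = "month", length.out = 10), times = 3),
  stateabbr = rep(c("CA", "NY", "TX"), each = 10),
  Length    = c(round(runif(27, 100, 1400)), 1800, 2600, 4000)
)

# Bin only the outlying values; anything <= 1500 falls outside the breaks and becomes NA
x$Length1 <- cut(x$Length, breaks = c(1500, 2000, 3000, Inf),
                 labels = c("1500-2000", "2000-3000", "> 3000"))

g <- ggplot() +
  # Ordinary points keep the continuous color scale
  geom_point(data = subset(x, Length <= 1500),
             aes(x = date, y = factor(stateabbr), color = Length), size = 4) +
  # Outliers get a discrete fill scale, which produces a second legend
  geom_point(data = subset(x, Length > 1500),
             aes(x = date, y = factor(stateabbr), fill = Length1),
             shape = 21, size = 4) +
  scale_color_gradient2("Length", midpoint = median(x$Length)) +
  scale_fill_manual("Outliers", values = c("orange", "red", "darkred")) +
  ggtitle("Date and State") + xlab("Date") + ylab("State")
```

Printing g should show the gradient legend for the ordinary points and a separate three-level legend for the outlying groups.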


From my comment, see ?cut

    x$colors <- cut(x$Length, breaks = c(0, 500, 1000, 1300, max(x$Length)))

    g <- ggplot(data = x, aes(x = date, y = factor(stateabbr), color = colors)) +
      geom_point() +
      ggtitle("Date and State") +
      xlab("Date") +
      ylab("State")
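Two details of cut are worth checking before using it this way. The small vector below is invented for illustration. By default the intervals are open on the left, so a value equal to the lowest break (0 here) ends up as NA unless include.lowest = TRUE is set. Raising dig.lab is an optional tweak that keeps the legend labels out of scientific notation.

```r
# Hypothetical lengths, including one sitting exactly on the lowest break
len  <- c(0, 250, 500, 1200, 2000)
brks <- c(0, 500, 1000, 1300, max(len))

# Default behaviour: intervals are (a, b], so the value 0 is not in any bin
a <- cut(len, breaks = brks)

# include.lowest = TRUE closes the first interval; dig.lab = 4 gives plain labels
b <- cut(len, breaks = brks, include.lowest = TRUE, dig.lab = 4)

is.na(a)   # the first element is NA
levels(b)  # bin labels used in the legend
```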


Get rid of the outliers. Quick and dirty, I know, but I think it was worth saying. You can always describe them in your text. Why let them ruin your analyses and graphs?

There's a paper referenced in this blog post which deals with ethically removing outliers:

http://psuc2f.wordpress.com/2011/10/14/is-it-dishonest-or-unethical-to-remove-outliers/

Another simple way of dealing with them would be to cap them:

    df$Value[df$Value > 1300] <- 1300

Again, you can mention in the text that you did this, or even just edit the scale labels to say 1300+ instead of 1300.
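As a minimal sketch of the capping idea with a relabelled legend: the data frame, the id column, and the break positions below are hypothetical, and the capped values go into a new Capped column so the raw values stay available.

```r
library(ggplot2)

# Hypothetical data standing in for `df`
df <- data.frame(id    = 1:8,
                 Value = c(120, 480, 760, 990, 1250, 1300, 2400, 5200))

cap <- 1300

# Same effect as df$Value[df$Value > 1300] <- 1300, but on a copy
df$Capped <- pmin(df$Value, cap)

# Legend breaks, with the top one labelled to show that it is a cap
brks       <- c(0, 500, 1000, cap)
brk_labels <- c("0", "500", "1000", "1300+")

p <- ggplot(df, aes(x = id, y = 1, color = Capped)) +
  geom_point(size = 4) +
  scale_color_gradient("Value", limits = c(0, cap),
                       breaks = brks, labels = brk_labels)
```

An alternative that leaves the data untouched is to map color to Value directly and pass oob = scales::squish along with the same limits, which should squeeze out-of-range values onto the ends of the scale.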