Understanding dates and plotting a histogram with ggplot2 in R


UPDATE

Version 2: Using Date class

I have updated the example to show how to align the labels with the bars and how to set limits on the plot. It also shows that as.Date does indeed work when it is used consistently (it is probably a better fit for your data than my earlier example).

The Target Plot v2

[Image: date-based histogram]

The Code v2

And here is (somewhat excessively) commented code:

library("ggplot2")
library("scales")

dates <- read.csv("http://pastebin.com/raw.php?i=sDzXKFxJ", sep = ",", header = TRUE)
dates$Date <- as.Date(dates$Date)

# Note that Dates are stored internally as the number of days since 1970-01-01,
# hence adding or subtracting a number of days works directly on a Date
bin <- 60 # used for aggregating the data and aligning the labels

# Map the Date itself: current ggplot2 versions reject numeric input to scale_x_date()
p <- ggplot(dates, aes(Date, after_stat(count)))
p <- p + geom_histogram(binwidth = bin, colour = "white") # binwidth is in days

# Breaks are set to an interval equal to the binwidth,
# and their starting point is shifted so the labels align with the bars
p <- p + scale_x_date(breaks = seq(min(dates$Date) - 20, # change -20 term to taste
                                   max(dates$Date),
                                   by = bin),
                      labels = date_format("%Y-%b"),
                      limits = c(as.Date("2009-01-01"),
                                 as.Date("2011-12-01")))

# from here, format at ease
# (theme()/element_text() replace the removed opts()/theme_text())
p <- p + theme_bw() + xlab(NULL) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1))
p
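The point that Dates are stored as days can be checked in base R. This minimal sketch uses a few made-up dates rather than the question's CSV; it shows the number behind a Date and builds a break sequence with the same 60-day spacing as `bin`, then counts the dates per bin with `cut()` (used here only as a stand-in for what the histogram does). `origin` is passed explicitly so the round trip also works on R versions before 4.3.

```r
# A Date is a count of days since 1970-01-01
d <- as.Date("2009-01-01")
days <- as.numeric(d)

# Converting back; origin is given explicitly for older R versions
d_back <- as.Date(days, origin = "1970-01-01")

# Arithmetic on a Date is in days, so breaks spaced like a 60-day binwidth
# can be built directly from Dates (these sample dates are made up)
bin <- 60
sample_dates <- as.Date(c("2009-02-14", "2009-07-01", "2010-03-30", "2011-11-15"))
brks <- seq(min(sample_dates) - 20, max(sample_dates) + bin, by = bin)

# Count the sample dates falling into each [start, start + 60) bin
counts <- table(cut(sample_dates, brks, right = FALSE))
```

Because the last break is always later than the latest date here, every sample date lands in some bin.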

Version 1: Using POSIXct

Here is a solution that does everything in ggplot2: it draws the histogram without any prior aggregation and limits the x-axis to the range from the beginning of 2009 to the start of December 2011.

The Target Plot v1

[Image: plot with limits set in ggplot2]

The Code v1

library("ggplot2")
library("scales")

dates <- read.csv("http://pastebin.com/raw.php?i=sDzXKFxJ", sep = ",", header = TRUE)
dates$Date <- as.POSIXct(dates$Date)

# after_stat(count) replaces the deprecated ..count.. notation
p <- ggplot(dates, aes(Date, after_stat(count))) +
  geom_histogram() +
  theme_bw() + xlab(NULL) +
  scale_x_datetime(breaks = date_breaks("3 months"),
                   labels = date_format("%Y-%b"),
                   limits = c(as.POSIXct("2009-01-01"),
                              as.POSIXct("2011-12-01")))
p

The axis label options could of course be tuned further, but the aim here is to do all of the plotting in one clean, short routine inside the plotting package.
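One practical difference between the Date and POSIXct versions is the unit of the underlying number: a Date counts days, while a POSIXct counts seconds since 1970-01-01 00:00:00 UTC. A `binwidth` passed to `geom_histogram()` on a POSIXct axis is therefore measured in seconds. The base-R sketch below shows the conversion; UTC is assumed to avoid daylight-saving shifts, and the 30-day bin is an illustrative choice, not something taken from the answer.

```r
# Two timestamps exactly one day apart, in UTC to avoid DST shifts
t1 <- as.POSIXct("2009-01-01", tz = "UTC")
t2 <- as.POSIXct("2009-01-02", tz = "UTC")

# POSIXct stores seconds, so one day is 86400 units
seconds_per_day <- as.numeric(t2) - as.numeric(t1)

# The same two calendar days as Dates are only 1 unit apart
days_apart <- as.numeric(as.Date("2009-01-02")) - as.numeric(as.Date("2009-01-01"))

# A roughly monthly bin on a POSIXct axis needs its width in seconds,
# e.g. geom_histogram(binwidth = binwidth_posixct) (illustrative usage)
binwidth_days <- 30
binwidth_posixct <- binwidth_days * seconds_per_day
```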


I think the key thing is that you need to do the frequency calculation outside of ggplot. Use aggregate() to count the observations per month, then plot the counts with geom_bar(stat="identity"); this gives a histogram without the reordered factors. Here is some example code:

library(ggplot2)
# scales goes with ggplot2 and adds the needed scale_* helpers
library(scales)
# needed for the month() function in the extra plot
library(lubridate)

# original data
# df <- read.csv("http://pastebin.com/download.php?i=sDzXKFxJ", header = TRUE)

# simulated data
years <- sample(seq(2008, 2012), 681, replace = TRUE,
                prob = c(0.0176211453744493, 0.302496328928047, 0.323054331864905,
                         0.237885462555066, 0.118942731277533))
months <- sample(seq(1, 12), 681, replace = TRUE)
my.dates <- as.Date(paste(years, months, 1, sep = "-"))
df <- data.frame(YM = strftime(my.dates, format = "%Y-%b"),
                 Date = my.dates, Year = years, Month = months)
# end simulated data creation

# sort the rows just to make them tidy; it makes no difference to the final result
df <- df[order(df$Date), ]

# add a dummy column for clarity in processing
df$Count <- 1

# compute the frequencies ourselves
freqs <- aggregate(Count ~ Year + Month, data = df, FUN = length)

# rebuild the Date column so that ggplot works
freqs$Date <- as.Date(paste(freqs$Year, freqs$Month, "01", sep = "-"))

# breaks every 2 months to reduce clutter
# (theme()/element_text() replace the removed opts()/theme_text())
g <- ggplot(data = freqs, aes(x = Date, y = Count)) +
  geom_bar(stat = "identity") +
  scale_x_date(labels = date_format("%Y-%b"), date_breaks = "2 months") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90))
print(g)

# don't overwrite the previous graph
dev.new()

# just for grins, here is a faceted view by year
# month(label = TRUE) returns an ordered factor, so the months stay in calendar order
freqs$Month.name <- month(freqs$Date, label = TRUE, abbr = TRUE)
g2 <- ggplot(data = freqs, aes(x = Month.name, y = Count)) +
  geom_bar(stat = "identity") +
  facet_grid(Year ~ .) +
  theme_bw()
print(g2)
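Why the plot needs a real Date (or a key that sorts chronologically) instead of the "%Y-%b" labels: text such as "2009-Apr" and "2009-Jan" sorts alphabetically, not in time order, which is the kind of reordering mentioned above. This base-R sketch uses simulated dates (the seed and sample size are arbitrary) and a single "%Y-%m" key instead of the answer's Year + Month pair; the key sorts in calendar order, so aggregate() returns the months in sequence, and a first-of-month Date is rebuilt for plotting.

```r
set.seed(42)
# Simulated data: 200 random dates from 2009-01-01 to 2011-12-31
my.dates <- as.Date("2009-01-01") + sample(0:1094, 200, replace = TRUE)

df <- data.frame(Key = format(my.dates, "%Y-%m"), Count = 1)

# Count per month outside ggplot2; "%Y-%m" keys sort in calendar order
freqs <- aggregate(Count ~ Key, data = df, FUN = length)

# Rebuild a first-of-month Date so a date scale can be used on the x axis
freqs$Date <- as.Date(paste0(freqs$Key, "-01"))

# Month-name labels, by contrast, sort alphabetically rather than in time order
label_order <- sort(c("2009-Jan", "2009-Feb", "2009-Apr"))
```

The resulting `freqs` can then go to `ggplot(freqs, aes(Date, Count)) + geom_bar(stat = "identity")` as in the answer's code.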


I know this is an old question, but for anyone coming to it in 2021 or later, this can be done much more easily with the breaks= argument of geom_histogram() and a small helper function that generates the required sequence of dates.

library(ggplot2)

dates <- read.csv("http://pastebin.com/raw.php?i=sDzXKFxJ", sep = ",", header = TRUE)
dates$Date <- lubridate::ymd(dates$Date)

# shortcut: a sequence of dates every n months spanning the data
by_month <- function(x, n = 1) {
  seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), by = paste0(n, " months"))
}

ggplot(dates, aes(Date)) +
  geom_histogram(breaks = by_month(dates$Date)) +
  scale_x_date(labels = scales::date_format("%Y-%b"),
               breaks = by_month(dates$Date, 2)) +
  theme(axis.text.x = element_text(angle = 90))

[Image: histogram]
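One caveat with `by_month()`: `seq()` stops at the last date that does not exceed `max(x)`, so the latest observations can fall after the final break and may be dropped from the histogram, and the bins start on the day of the earliest observation rather than on the first of a month. The sketch below keeps the original helper and adds a hypothetical variant, `by_month_covering()` (not part of the answer), that starts at the first of the earliest month and appends one extra break. Base R's `cut()` is used only to check which dates land inside a bin.

```r
# Original helper from the answer: breaks from the earliest to the latest date
by_month <- function(x, n = 1) {
  seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), by = paste0(n, " months"))
}

# Hypothetical variant: start on the first of the earliest month and add one
# extra step past the latest date so that every observation is covered
by_month_covering <- function(x, n = 1) {
  start <- as.Date(format(min(x, na.rm = TRUE), "%Y-%m-01"))
  step <- paste0(n, " months")
  brks <- seq(start, max(x, na.rm = TRUE), by = step)
  # The second element of this sequence is the last break plus one step
  c(brks, seq(brks[length(brks)], by = step, length.out = 2)[2])
}

x <- as.Date(c("2009-01-15", "2009-02-10", "2009-03-20"))

# With the original breaks (Jan 15, Feb 15, Mar 15) the Mar 20 date is outside
bins_original <- cut(x, by_month(x), right = FALSE)

# With the covering breaks (Jan 1, Feb 1, Mar 1, Apr 1) every date is inside
bins_covering <- cut(x, by_month_covering(x), right = FALSE)
```

To try it on the question's data, pass `by_month_covering(dates$Date)` to `geom_histogram(breaks = ...)` in place of `by_month(dates$Date)`.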