Getting frequency values from histogram in R


The hist function has a return value (an object of class histogram):

    R> res <- hist(rnorm(100))
    R> res
    $breaks
    [1] -4 -3 -2 -1  0  1  2  3  4

    $counts
    [1]  1  2 17 27 34 16  2  1

    $intensities
    [1] 0.01 0.02 0.17 0.27 0.34 0.16 0.02 0.01

    $density
    [1] 0.01 0.02 0.17 0.27 0.34 0.16 0.02 0.01

    $mids
    [1] -3.5 -2.5 -1.5 -0.5  0.5  1.5  2.5  3.5

    $xname
    [1] "rnorm(100)"

    $equidist
    [1] TRUE

    attr(,"class")
    [1] "histogram"


From ?hist:

Value

an object of class "histogram" which is a list with components:

  • breaks: the n+1 cell boundaries (= breaks if that was a vector). These are the nominal breaks, not with the boundary fuzz.
  • counts: n integers; for each cell, the number of x[] inside.
  • density: values f^(x[i]), as estimated density values. If all(diff(breaks) == 1), they are the relative frequencies counts/n and in general satisfy sum[i; f^(x[i]) (b[i+1]-b[i])] = 1, where b[i] = breaks[i].
  • intensities: same as density. Deprecated, but retained for compatibility.
  • mids: the n cell midpoints.
  • xname: a character string with the actual x argument name.
  • equidist: logical, indicating if the distances between breaks are all the same.
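
The relationship between counts and density described in the help page can be checked directly. This is a minimal sketch on simulated data; the seed and the names x, h, rel_freq, and widths are illustrative assumptions, not part of the original answer. Passing plot = FALSE returns the histogram object without drawing anything.

```r
set.seed(42)                          # arbitrary seed, only for reproducibility
x <- rnorm(100)                       # illustrative sample data

h <- hist(x, plot = FALSE)            # compute the bins without drawing the plot

h$counts                              # absolute frequency in each bin
rel_freq <- h$counts / sum(h$counts)  # relative frequency in each bin

# density is count / (n * bin width), so density times width gives rel_freq
widths <- diff(h$breaks)
all.equal(h$density * widths, rel_freq)
sum(h$density * widths)               # the bar areas add up to 1
```

When every bin has width 1, density and relative frequency coincide, which is why the two vectors in the output above look identical.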

breaks and density provide just about all you need:

    histrv <- hist(x)
    histrv$breaks
    histrv$density
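
If the goal is the frequency per bin rather than the density, counts can be paired with the bin edges. This is a small sketch, assuming x is a numeric vector; the sample values, the explicit breaks, and the column names lower, upper, and count are made up for illustration.

```r
x <- c(1.2, 2.5, 2.7, 3.1, 3.3, 3.8, 4.4, 5.0)   # illustrative data
histrv <- hist(x, breaks = 0:5, plot = FALSE)    # explicit breaks keep the bins predictable

n_bins <- length(histrv$counts)
freq_table <- data.frame(
  lower = histrv$breaks[-(n_bins + 1)],   # left edge of each bin
  upper = histrv$breaks[-1],              # right edge of each bin
  count = histrv$counts                   # number of observations in the bin
)
freq_table
```

By default hist uses right-closed bins, so a value that falls exactly on a break is counted in the bin to its left.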


Just in case someone hits this question with ggplot's geom_histogram in mind, note that there is a way to extract the data from a ggplot object.

The following convenience function outputs a dataframe with the lower limit of each bin (xmin), the upper limit of each bin (xmax), the mid-point of each bin (x), as well as the frequency value (y).

    library(ggplot2)

    ## Convenience function
    get_hist <- function(p) {
        d <- ggplot_build(p)$data[[1]]
        data.frame(x = d$x, xmin = d$xmin, xmax = d$xmax, y = d$y)
    }

    # make a dataframe for ggplot
    set.seed(1)
    x = runif(100, 0, 10)
    y = cumsum(x)
    df <- data.frame(x = sort(x), y = y)

    # make geom_histogram
    p <- ggplot(data = df, aes(x = x)) +
        geom_histogram(aes(y = cumsum(..count..)), binwidth = 1, boundary = 0,
                       color = "black", fill = "white")

Illustration:

    hist = get_hist(p)
    head(hist$x)
    ## [1] 0.5 1.5 2.5 3.5 4.5 5.5
    head(hist$y)
    ## [1]  7 13 24 38 52 57
    head(hist$xmax)
    ## [1] 1 2 3 4 5 6
    head(hist$xmin)
    ## [1] 0 1 2 3 4 5
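
For comparison, a similar table of cumulative frequencies can be sketched in base R without building a plot. This assumes the same simulated data and that binwidth = 1 with boundary = 0 corresponds to breaks at 0, 1, ..., 10; the names h and cum_freq are illustrative. With aligned bins the y column would be expected to agree with the ggplot values, though that is not verified here.

```r
set.seed(1)
x <- runif(100, 0, 10)                # same simulated data as in the ggplot example

# breaks at 0, 1, ..., 10 mirror binwidth = 1 and boundary = 0
h <- hist(x, breaks = 0:10, plot = FALSE)

cum_freq <- data.frame(
  xmin = head(h$breaks, -1),          # lower limit of each bin
  xmax = tail(h$breaks, -1),          # upper limit of each bin
  x    = h$mids,                      # mid-point of each bin
  y    = cumsum(h$counts)             # cumulative frequency
)
head(cum_freq)
```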

See also a related question I answered: Cumulative histogram with ggplot2.