Relative frequencies / proportions with dplyr
Try this:
    library(dplyr)

    mtcars %>%
      group_by(am, gear) %>%
      summarise(n = n()) %>%
      mutate(freq = n / sum(n))

    #   am gear  n      freq
    # 1  0    3 15 0.7894737
    # 2  0    4  4 0.2105263
    # 3  1    4  8 0.6153846
    # 4  1    5  5 0.3846154
From the dplyr vignette:
When you group by multiple variables, each summary peels off one level of the grouping. That makes it easy to progressively roll-up a dataset.
Thus, after the summarise, the last grouping variable specified in group_by, 'gear', is peeled off. In the mutate step, the data is grouped by the remaining grouping variable(s), here 'am'. You may check the grouping in each step with groups().
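To see the peeling in action, here is a small sketch that inspects the grouping variables before and after summarise. It assumes a dplyr version in which group_vars() is available (it returns the grouping variable names as a character vector); the intermediate names by_both, counts and result are only illustrative.

```r
library(dplyr)

# Group by two variables, then inspect the grouping at each step.
by_both <- mtcars %>% group_by(am, gear)
group_vars(by_both)   # both 'am' and 'gear' are active here

# summarise() drops the last grouping variable ('gear') by default.
counts <- by_both %>% summarise(n = n())
group_vars(counts)    # only 'am' remains

# mutate() therefore computes sum(n) within each level of 'am'.
result <- counts %>% mutate(freq = n / sum(n))
```

Recent dplyr versions also print a message after summarise stating how the output is grouped, which serves the same purpose.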
The outcome of the peeling is of course dependent on the order of the grouping variables in the group_by call. You may wish to add a subsequent group_by(am) to make your code more explicit.
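As a hedged illustration of the order dependence, the sketch below swaps the grouping variables and also shows the more explicit variant with a second group_by. The object names within_am, within_gear and explicit are hypothetical and exist only for this example.

```r
library(dplyr)

# Order (am, gear): 'gear' is peeled off, so freq is the share within each 'am'.
within_am <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))

# Order (gear, am): 'am' is peeled off, so freq is the share within each 'gear'.
within_gear <- mtcars %>%
  group_by(gear, am) %>%
  summarise(n = n()) %>%
  mutate(freq = n / sum(n))

# Explicit variant: regroup by 'am' so the intent does not rely on the peeling.
explicit <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  group_by(am) %>%
  mutate(freq = n / sum(n))
```

The explicit variant should give the same frequencies as within_am, while within_gear answers a different question (the split of transmissions within each number of gears).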
For rounding and prettification, please refer to the nice answer by @Tyler Rinker.
You can use the count() function, which however behaves differently depending on the version of dplyr:
dplyr 0.7.1: returns an ungrouped table, so you need to group again by am
dplyr < 0.7.1: returns a grouped table, so there is no need to group again, although you might want to ungroup() for later manipulations
dplyr 0.7.1

    mtcars %>%
      count(am, gear) %>%
      group_by(am) %>%
      mutate(freq = n / sum(n))

dplyr < 0.7.1

    mtcars %>%
      count(am, gear) %>%
      mutate(freq = n / sum(n))
This results in a grouped table. If you want to use it for further analysis, it may be useful to remove the grouping with ungroup().
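A minimal sketch of the full pipeline including the ungroup step, assuming a dplyr version where count() on ungrouped input returns an ungrouped table (the 0.7.1 behaviour described above). The name freq_tbl is illustrative.

```r
library(dplyr)

freq_tbl <- mtcars %>%
  count(am, gear) %>%            # one row per (am, gear) with a count column 'n'
  group_by(am) %>%               # group explicitly so sum(n) is taken within 'am'
  mutate(freq = n / sum(n)) %>%
  ungroup()                      # drop the grouping for later manipulations

is_grouped_df(freq_tbl)          # FALSE once ungroup() has been applied
```

Grouping explicitly before the mutate keeps the result independent of whether count() happens to return a grouped table.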
@Henrik's answer is better for usability, since the version below turns the column into character rather than numeric, but it matches what you asked for:
    mtcars %>%
      group_by(am, gear) %>%
      summarise(n = n()) %>%
      mutate(rel.freq = paste0(round(100 * n / sum(n), 0), "%"))

    ##   am gear  n rel.freq
    ## 1  0    3 15      79%
    ## 2  0    4  4      21%
    ## 3  1    4  8      62%
    ## 4  1    5  5      38%
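If losing the numeric column is a concern, one possible workaround (a sketch, not part of the original answer) is to keep rel.freq numeric and put the formatted text in a separate column. The column name label is hypothetical, and sprintf() from base R is swapped in for paste0(round(...)) here.

```r
library(dplyr)

pretty_tbl <- mtcars %>%
  group_by(am, gear) %>%
  summarise(n = n()) %>%
  mutate(
    rel.freq = n / sum(n),                        # numeric, usable in later calculations
    label    = sprintf("%.0f%%", 100 * rel.freq)  # character, for display only
  )
```

This way later steps can still filter or sort on rel.freq, while label carries the percent string for printing.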
EDIT Because Spacedman asked for it :-)
    # Tag a data frame with the name of its relative-frequency column
    as.rel_freq <- function(x, rel_freq_col = "rel.freq", ...) {
        class(x) <- c("rel_freq", class(x))
        attributes(x)[["rel_freq_col"]] <- rel_freq_col
        x
    }

    # Print method: show the column as a percentage without changing the stored values
    print.rel_freq <- function(x, ...) {
        freq_col <- attributes(x)[["rel_freq_col"]]
        x[[freq_col]] <- paste0(round(100 * x[[freq_col]], 0), "%")
        class(x) <- class(x)[!class(x) %in% "rel_freq"]
        print(x)
    }

    mtcars %>%
      group_by(am, gear) %>%
      summarise(n = n()) %>%
      mutate(rel.freq = n / sum(n)) %>%
      as.rel_freq()

    ## Source: local data frame [4 x 4]
    ## Groups: am
    ##
    ##   am gear  n rel.freq
    ## 1  0    3 15      79%
    ## 2  0    4  4      21%
    ## 3  1    4  8      62%
    ## 4  1    5  5      38%