Remove duplicated rows using dplyr
Note: dplyr now contains the distinct() function for this purpose.
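
A minimal sketch of that approach, assuming dplyr 0.5 or later, where distinct() accepts the .keep_all argument. The data frame is a small made-up example, not the one used in the original answer:

```r
library(dplyr)

# Hypothetical example data with repeated (x, y) combinations
df <- data.frame(
  x = c(0, 0, 1, 1, 1),
  y = c(1, 1, 0, 0, 1),
  z = 1:5
)

# Keep the first row for each distinct (x, y) pair;
# .keep_all = TRUE retains the remaining columns (here z)
deduped <- df %>% distinct(x, y, .keep_all = TRUE)
deduped
```

Without .keep_all = TRUE, only the x and y columns are returned.
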
Original answer below:
    library(dplyr)
    set.seed(123)
    df <- data.frame(
      x = sample(0:1, 10, replace = TRUE),
      y = sample(0:1, 10, replace = TRUE),
      z = 1:10
    )
One approach would be to group, and then only keep the first row:
    df %>% group_by(x, y) %>% filter(row_number(z) == 1)
    ## Source: local data frame [3 x 3]
    ## Groups: x, y
    ##
    ##   x y z
    ## 1 0 1 1
    ## 2 1 0 2
    ## 3 1 1 4
(In dplyr 0.2 you won't need the dummy z variable and will just be able to write row_number() == 1.)
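
As a sketch of that shorter form, assuming a dplyr version (0.2 or later) in which row_number() can be called without arguments inside a grouped filter(). The data frame is a made-up example:

```r
library(dplyr)

# Hypothetical example data with repeated (x, y) combinations
df <- data.frame(x = c(0, 0, 1, 1, 1), y = c(1, 1, 0, 0, 1), z = 1:5)

# Within each (x, y) group, keep only the row in first position
first_rows <- df %>%
  group_by(x, y) %>%
  filter(row_number() == 1) %>%
  ungroup()
first_rows
```
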
I've also been thinking about adding a slice() function that would work like:
    df %>% group_by(x, y) %>% slice(from = 1, to = 1)
Or maybe a variation of unique() that would let you select which variables to use:
    df %>% unique(x, y)
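
For comparison, a sketch that assumes a later dplyr release in which slice() exists and takes row positions directly, rather than the from and to arguments floated above. The data frame is again a made-up example:

```r
library(dplyr)

# Hypothetical example data with repeated (x, y) combinations
df <- data.frame(x = c(0, 0, 1, 1, 1), y = c(1, 1, 0, 0, 1), z = 1:5)

# slice(1) selects the first row of each group when the data is grouped
first_rows <- df %>%
  group_by(x, y) %>%
  slice(1) %>%
  ungroup()
first_rows
```
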