Remove duplicated rows using dplyr


Here is a solution using dplyr >= 0.5.

library(dplyr)

set.seed(123)
df <- data.frame(
  x = sample(0:1, 10, replace = T),
  y = sample(0:1, 10, replace = T),
  z = 1:10
)

> df %>% distinct(x, y, .keep_all = TRUE)
  x y z
1 0 1 1
2 1 0 2
3 1 1 4


Note: dplyr now contains the distinct function for this purpose.
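
As an aside (not from the original answer): by default distinct() returns only the columns you ask it to deduplicate on, and .keep_all = TRUE is what retains the remaining columns, taking their values from the first occurrence of each combination. A short sketch against the df defined above:

# Only x and y are returned; z is dropped
df %>% distinct(x, y)

# All columns are kept, one row per (x, y) combination
df %>% distinct(x, y, .keep_all = TRUE)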

Original answer below:


library(dplyr)

set.seed(123)
df <- data.frame(
  x = sample(0:1, 10, replace = T),
  y = sample(0:1, 10, replace = T),
  z = 1:10
)

One approach would be to group, and then only keep the first row:

df %>% group_by(x, y) %>% filter(row_number(z) == 1)
## Source: local data frame [3 x 3]
## Groups: x, y
##
##   x y z
## 1 0 1 1
## 2 1 0 2
## 3 1 1 4

(In dplyr 0.2 you won't need the dummy z variable and will just be able to write row_number() == 1.)
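
With current dplyr releases that is indeed the case; a minimal sketch, assuming the same df as above:

df %>% group_by(x, y) %>% filter(row_number() == 1)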

I've also been thinking about adding a slice() function that would work like:

df %>% group_by(x, y) %>% slice(from = 1, to = 1)
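
A slice() verb has since been added to dplyr, taking row positions rather than from/to arguments; a sketch of the equivalent call, assuming a current dplyr version:

# slice(1) keeps the first row of each group
df %>% group_by(x, y) %>% slice(1)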

Or maybe a variation of unique() that would let you select which variables to use:

df %>% unique(x, y)


For completeness’ sake, the following also works:

df %>% group_by(x) %>% filter(!duplicated(y))

However, I prefer the solution using distinct, and I suspect it’s faster, too.
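
If you want to check that suspicion yourself, one way is a quick timing comparison; a sketch using the microbenchmark package (an assumption on my part, not something the answers above measured):

library(microbenchmark)

# Compare both approaches; on a data frame this small the differences
# are noise, so substitute a larger data set for a meaningful result.
microbenchmark(
  distinct   = df %>% distinct(x, y, .keep_all = TRUE),
  duplicated = df %>% group_by(x) %>% filter(!duplicated(y)),
  times = 100
)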