Drop all duplicate rows across multiple columns in Python Pandas

python pandas duplicates drop-duplicates

This is much easier in pandas now with drop_duplicates and the keep parameter.

import pandas as pddf = pd.DataFrame({"A":["foo", "foo", "foo", "bar"], "B":[0,1,1,1], "C":["A","A","B","A"]})df.drop_duplicates(subset=['A', 'C'], keep=False)

python pandas duplicates drop-duplicates

Just want to add to Ben's answer on drop_duplicates:

keep : {‘first’, ‘last’, False}, default ‘first’

first : Drop duplicates except for the first occurrence.
last : Drop duplicates except for the last occurrence.
False : Drop all duplicates.

So setting keep to False will give you desired answer.

DataFrame.drop_duplicates(*args, **kwargs) Return DataFrame with duplicate rows removed, optionally only considering certain columns
Parameters: subset : column label or sequence of labels, optional Only consider certain columns for identifying duplicates, by default use all of the columns keep : {‘first’, ‘last’, False}, default ‘first’ first : Drop duplicates except for the first occurrence. last : Drop duplicates except for the last occurrence. False : Drop all duplicates. take_last : deprecated inplace : boolean, default False Whether to drop duplicates in place or to return a copy cols : kwargs only argument of subset [deprecated] Returns: deduplicated : DataFrame

python pandas duplicates drop-duplicates

If you want result to be stored in another dataset:

df.drop_duplicates(keep=False)

df.drop_duplicates(keep=False, inplace=False)

If same dataset needs to be updated:

df.drop_duplicates(keep=False, inplace=True)

Above examples will remove all duplicates and keep one, similar to DISTINCT * in SQL

CodeHunter

Drop all duplicate rows across multiple columns in Python Pandas

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last