Conditionally removing duplicates in pandas (Python)

python python-2.7 numpy pandas dataframe

Use drop_duplicates to return a dataframe with duplicate rows removed, optionally considering only certain columns.

Let the initial dataframe be:

    In [34]: df
    Out[34]:
      Col1 Col2  Col3
    0    A    B    10
    1    A    B    20
    2    A    C    20
    3    C    B    20
    4    A    B    20
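To follow along, the frame above could be rebuilt as below. This is only a sketch: the values are copied from the output shown, and the construction itself is an assumption, since the original post does not say how df was created.

```python
import pandas as pd

# Rebuild the example frame; values are taken from the Out[34] display.
df = pd.DataFrame({
    "Col1": ["A", "A", "A", "C", "A"],
    "Col2": ["B", "B", "C", "B", "B"],
    "Col3": [10, 20, 20, 20, 20],
})

print(df)
```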

If you want one row per unique combination of certain columns, here 'Col1' and 'Col2':

    In [35]: df.drop_duplicates(['Col1', 'Col2'])
    Out[35]:
      Col1 Col2  Col3
    0    A    B    10
    2    A    C    20
    3    C    B    20
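Note that for the ('A', 'B') combination the row with Col3 = 10 survives, because drop_duplicates keeps the first occurrence by default. If the last occurrence is wanted instead, something like the following should work. It is a sketch that assumes a pandas version with the keep argument (older releases used take_last=True); the names first and last are just illustrative.

```python
import pandas as pd

# Same example frame as in the answer (values copied from the output shown).
df = pd.DataFrame({
    "Col1": ["A", "A", "A", "C", "A"],
    "Col2": ["B", "B", "C", "B", "B"],
    "Col3": [10, 20, 20, 20, 20],
})

# Default behaviour: keep the first row of each (Col1, Col2) combination.
first = df.drop_duplicates(subset=["Col1", "Col2"])

# Keep the last row of each combination instead.
last = df.drop_duplicates(subset=["Col1", "Col2"], keep="last")

print(first)
print(last)
```

In both cases the original index labels are preserved, so reset_index(drop=True) can be chained on if a fresh 0..n-1 index is needed.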

If you want unique combinations across all columns (only fully identical rows are dropped):

    In [36]: df.drop_duplicates()
    Out[36]:
      Col1 Col2  Col3
    0    A    B    10
    1    A    B    20
    2    A    C    20
    3    C    B    20

CodeHunter