How to select and delete columns with duplicate name in pandas DataFrame

python pandas dataframe duplicates multiple-columns

You can address columns by their integer position with .iloc:

    >>> import pandas as pd
    >>> df = pd.DataFrame([[1,2],[3,4],[5,6]], columns=['a','a'])
    >>> df
       a  a
    0  1  2
    1  3  4
    2  5  6
    >>> df.iloc[:,0]
    0    1
    1    3
    2    5
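Here is a short sketch of why positional access matters. Selecting by the duplicated label returns every matching column at once (a DataFrame), while .iloc picks exactly one column (a Series). The toy frame from above is rebuilt so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['a', 'a'])

# Selecting by label matches *every* column named 'a', so a DataFrame comes back
by_label = df['a']
print(type(by_label).__name__, by_label.shape)

# Selecting by position picks exactly one column, returned as a Series
first = df.iloc[:, 0]
second = df.iloc[:, 1]
print(first.tolist(), second.tolist())
```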

Or you can rename the columns so that every name is unique, like this:

    >>> df.columns = ['a','b']
    >>> df
       a  b
    0  1  2
    1  3  4
    2  5  6
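With many columns, typing the new list by hand gets tedious. This is a minimal sketch, assuming suffixed names are acceptable. The helper make_unique is hypothetical (it is not part of pandas). It appends .1, .2, ... to repeated names, similar in spirit to how read_csv labels duplicate headers.

```python
import pandas as pd

def make_unique(names):
    """Hypothetical helper: keep the first occurrence of each name as-is and
    append .1, .2, ... to later repeats."""
    seen = {}      # name -> how many repeats have been renamed so far
    result = []
    for name in names:
        if name in seen:
            seen[name] += 1
            result.append(f"{name}.{seen[name]}")
        else:
            seen[name] = 0
            result.append(name)
    return result

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['a', 'a'])
df.columns = make_unique(df.columns)
print(df.columns.tolist())  # ['a', 'a.1']
```

This sketch does not check whether a generated name such as a.1 already exists in the frame. Add that check if your data could contain such names.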


Another solution:

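    import numpy as np
    import pandas as pd

    def remove_dup_columns(frame):
        # Remember the position of the first column seen for each name
        keep_names = set()
        keep_icols = list()
        for icol, name in enumerate(frame.columns):
            if name not in keep_names:
                keep_names.add(name)
                keep_icols.append(icol)
        # Select the surviving columns by position; later duplicates are dropped
        return frame.iloc[:, keep_icols]

    # Random 5x4 frame with two pairs of duplicate column names
    frame = pd.DataFrame(np.random.randint(0, 50, (5, 4)), columns=['A', 'A', 'B', 'B'])
    print(frame)
    print(remove_dup_columns(frame))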

Sample output (the data is random, so your values will differ):

        A   A   B   B
    0  18  44  13  47
    1  41  19  35  28
    2  49   0  30  16
    3  39  29  43  41
    4  26  19  48  13
        A   B
    0  18  13
    1  41  35
    2  49  30
    3  39  43
    4  26  48
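The same keep-first logic can also be written with pandas' built-in Index.duplicated. That is a different technique from the loop above, but it selects the same columns. This sketch uses deterministic data (np.arange) instead of random numbers so the result is predictable.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(20).reshape(5, 4), columns=['A', 'A', 'B', 'B'])

# Index.duplicated() is True for every repeat of a label after its first
# occurrence; negating it keeps the first column of each name.
deduped = frame.loc[:, ~frame.columns.duplicated()]
print(deduped)

# keep='last' keeps the last column of each name instead
deduped_last = frame.loc[:, ~frame.columns.duplicated(keep='last')]
print(deduped_last)
```

Because the mask is a plain boolean array, .loc applies it positionally, so the duplicate labels do not get in the way.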


This is not a good situation to be in. The best fix is a hierarchical column labeling scheme (pandas supports multi-level labels, a MultiIndex, for both columns and the row index). Work out what actually makes the two same-named columns different from each other, and use that as an extra level in a hierarchical column index.
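Here is a minimal sketch of such a hierarchical column index. For illustration only, it assumes the duplicate columns are temperature readings from two different sensors. The names sensor1, sensor2, temp, source and measure are made up.

```python
import pandas as pd

# Assumed example: both columns would be called 'temp' in a flat layout;
# the sensor id is what actually distinguishes them, so it becomes a level.
columns = pd.MultiIndex.from_tuples(
    [('sensor1', 'temp'), ('sensor2', 'temp')],
    names=['source', 'measure'],
)
df = pd.DataFrame([[20.1, 19.8], [20.4, 20.0]], columns=columns)

# Each column is now uniquely addressable by its full key
print(df[('sensor2', 'temp')].tolist())   # [19.8, 20.0]

# Selecting on the 'measure' level alone returns all 'temp' columns together
temps = df.xs('temp', axis=1, level='measure')
print(temps.columns.tolist())             # ['sensor1', 'sensor2']
```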

In the meantime, if you know the position of the columns in the ordered list of columns (e.g. from dataframe.columns), you can use positional indexing with .iloc[] to retrieve a column's values by position. (Older answers also mention .ix[], which was deprecated and then removed in pandas 1.0.)

You can also create copies of the columns with new names, such as:

    dataframe["new_name"] = dataframe.iloc[:, column_position].values

where column_position references the positional location of the column you're trying to get (not the name).
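Here is a sketch of the positional approach end to end, using the toy frame from the first answer. The helper drop_column_at is hypothetical. It rebuilds the column selection from positions because df.drop(columns='a') would remove every column labelled a, not just one of them.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=['a', 'a'])

column_position = 1  # integer position of the column to work with, not its name

# Copy the column under a new, unique name (.values takes just the data)
df['new_name'] = df.iloc[:, column_position].values

def drop_column_at(frame, pos):
    """Hypothetical helper: return frame without the column at position pos."""
    keep = [i for i in range(frame.shape[1]) if i != pos]
    return frame.iloc[:, keep]

# Remove the now-redundant duplicate by position
df = drop_column_at(df, column_position)
print(df.columns.tolist())  # ['a', 'new_name']
```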

Making copies of columns may not be practical if the data is very large, however. So the best option is still to change the construction process so that it produces a hierarchical column index.
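If the duplicates come from combining frames side by side, one way to change the construction step is pd.concat with keys=, which adds an outer column level for each input frame. This sketch assumes your frames are combined with concat. The frames left and right are invented for the example.

```python
import pandas as pd

# Assumed scenario: two frames that share column names
left = pd.DataFrame({'price': [1.0, 2.0], 'qty': [3, 4]})
right = pd.DataFrame({'price': [1.5, 2.5], 'qty': [5, 6]})

# Without keys, the result has duplicate column names
flat = pd.concat([left, right], axis=1)
print(flat.columns.tolist())        # ['price', 'qty', 'price', 'qty']

# With keys, each input frame gets its own outer label: hierarchical columns
nested = pd.concat([left, right], axis=1, keys=['left', 'right'])
print(nested.columns.tolist())
print(nested['right']['price'].tolist())  # [1.5, 2.5]
```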
