pandas read_csv and filter columns with usecols

The solution lies in understanding these two keyword arguments:

names is only necessary when there is no header row in your file and you want to specify other arguments (such as usecols) using column names rather than integer indices.
usecols is supposed to provide a filter before reading the whole DataFrame into memory; if used properly, there should never be a need to delete columns after reading.

So because you have a header row, passing header=0 is sufficient and additionally passing names appears to be confusing pd.read_csv.

Removing names from the second call gives the desired output:

import pandas as pdfrom StringIO import StringIOcsv = r"""dummy,date,loc,xbar,20090101,a,1bar,20090102,a,3bar,20090103,a,5bar,20090101,b,1bar,20090102,b,3bar,20090103,b,5"""df = pd.read_csv(StringIO(csv),        header=0,        index_col=["date", "loc"],         usecols=["date", "loc", "x"],        parse_dates=["date"])

Which gives us:

                xdate       loc2009-01-01 a    12009-01-02 a    32009-01-03 a    52009-01-01 b    12009-01-02 b    32009-01-03 b    5

python pandas csv csv-import

This code achieves what you want --- also its weird and certainly buggy:

I observed that it works when:

a) you specify the index_col rel. to the number of columns you really use -- so its three columns in this example, not four (you drop dummy and start counting from then onwards)

b) same for parse_dates

c) not so for usecols ;) for obvious reasons

d) here I adapted the names to mirror this behaviour

import pandas as pdfrom StringIO import StringIOcsv = """dummy,date,loc,xbar,20090101,a,1bar,20090102,a,3bar,20090103,a,5bar,20090101,b,1bar,20090102,b,3bar,20090103,b,5"""df = pd.read_csv(StringIO(csv),        index_col=[0,1],        usecols=[1,2,3],         parse_dates=[0],        header=0,        names=["date", "loc", "", "x"])print df

which prints

                xdate       loc   2009-01-01 a    12009-01-02 a    32009-01-03 a    52009-01-01 b    12009-01-02 b    32009-01-03 b    5

python pandas csv csv-import

If your csv file contains extra data, columns can be deleted from the DataFrame after import.

import pandas as pdfrom StringIO import StringIOcsv = r"""dummy,date,loc,xbar,20090101,a,1bar,20090102,a,3bar,20090103,a,5bar,20090101,b,1bar,20090102,b,3bar,20090103,b,5"""df = pd.read_csv(StringIO(csv),        index_col=["date", "loc"],         usecols=["dummy", "date", "loc", "x"],        parse_dates=["date"],        header=0,        names=["dummy", "date", "loc", "x"])del df['dummy']

Which gives us:

                xdate       loc2009-01-01 a    12009-01-02 a    32009-01-03 a    52009-01-01 b    12009-01-02 b    32009-01-03 b    5

CodeHunter

pandas read_csv and filter columns with usecols

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last