Can pandas automatically read dates from a CSV file?


Add parse_dates=True or parse_dates=['column name'] when reading the file; that is usually enough for pandas to parse the dates automatically. Note that parse_dates=True only tries to parse the index, so name the columns explicitly if the dates are not in the index. Some unusual formats still need to be defined manually. In that case you can also pass a date parser function, which is the most flexible option.
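As a rough illustration of the two forms of parse_dates, here is a minimal sketch. The CSV content and the names csv_text, raw, parsed and indexed are made up for the example, and io.StringIO stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
csv_text = "date,value\n2013-06-04,1\n2013-06-05,2\n"

# Without parse_dates, the 'date' column is read as plain strings.
raw = pd.read_csv(io.StringIO(csv_text))

# Naming the column asks pandas to parse it into datetime values.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# parse_dates=True applies to the index, so it is combined with index_col.
indexed = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)

print(raw["date"].dtype)
print(parsed["date"].dtype)
print(type(indexed.index).__name__)
```

Comparing the three printed results shows which calls actually produced datetime values.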

Suppose you have a column 'datetime' with your string, then:

    from datetime import datetime
    import pandas as pd

    # Parse one string such as '2013-06-04 14:30:00' into a datetime.
    dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
    df = pd.read_csv(infile, parse_dates=['datetime'], date_parser=dateparse)

This way you can even combine multiple columns into a single datetime column. The following merges a 'date' and a 'time' column into a single 'datetime' column:

    dateparse = lambda x: datetime.strptime(x, '%Y-%m-%d %H:%M:%S')
    df = pd.read_csv(infile, parse_dates={'datetime': ['date', 'time']}, date_parser=dateparse)
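Newer pandas releases deprecate both date_parser and combining columns inside read_csv, so the same merge can also be done after reading. The following is only a sketch of that alternative, not the method from the answer above; the sample data and the name csv_text are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data with separate 'date' and 'time' columns.
csv_text = "date,time,value\n2013-06-04,14:30:00,1\n2013-06-05,09:15:00,2\n"

# Read both columns as plain strings first.
df = pd.read_csv(io.StringIO(csv_text))

# Join the two string columns with a space, then parse with an explicit format.
df["datetime"] = pd.to_datetime(
    df["date"] + " " + df["time"], format="%Y-%m-%d %H:%M:%S"
)

# Drop the original columns once the combined column exists.
df = df.drop(columns=["date", "time"])
print(df)
```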

You can find the directives (i.e. the letters used for the different formats) for strptime and strftime in the Python documentation for the datetime module.
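To make the directives concrete, here is a small sketch using only the standard library. The timestamp string and the output format are arbitrary examples.

```python
from datetime import datetime

# %Y = 4-digit year, %m = month, %d = day, %H:%M:%S = 24-hour time.
fmt = "%Y-%m-%d %H:%M:%S"

# strptime turns a string into a datetime using the format.
parsed = datetime.strptime("2013-06-04 14:30:00", fmt)
print(parsed.year, parsed.month, parsed.day, parsed.hour)  # 2013 6 4 14

# strftime goes the other way, here with a day/month/year layout.
print(parsed.strftime("%d/%m/%Y %H:%M"))  # 04/06/2013 14:30
```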


Perhaps the pandas interface has changed since @Rutger answered, but in the version I'm using (0.15.2), the date_parser function receives a list of dates instead of a single value. In this case, his code should be updated like so:

    from datetime import datetime
    import pandas as pd

    dateparse = lambda dates: [datetime.strptime(d, '%Y-%m-%d %H:%M:%S') for d in dates]
    df = pd.read_csv('test.dat', parse_dates=['datetime'], date_parser=dateparse)

Since the original asker wants dates rather than datetimes, and the dates are in 2013-6-4 format, the dateparse function should really be:

dateparse = lambda dates: [datetime.strptime(d, '%Y-%m-%d').date() for d in dates]
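To see what this list-based parser returns, it can be called directly on a list of strings, without read_csv. The input values below are invented for the example.

```python
from datetime import date, datetime

# Same list-based parser as above: one date object per input string.
dateparse = lambda dates: [datetime.strptime(d, '%Y-%m-%d').date() for d in dates]

# strptime accepts months and days without a leading zero.
result = dateparse(['2013-6-4', '2013-12-25'])
print(result)  # [datetime.date(2013, 6, 4), datetime.date(2013, 12, 25)]
```

One consequence worth knowing is that a DataFrame column holding datetime.date objects is a generic object column rather than datetime64, so pandas' datetime-specific features will not apply to it directly.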


You could use pandas.to_datetime() as recommended in the documentation for pandas.read_csv():

If a column or index contains an unparseable date, the entire column or index will be returned unaltered as an object data type. For non-standard datetime parsing, use pd.to_datetime after pd.read_csv.

Demo:

    >>> D = {'date': '2013-6-4'}
    >>> df = pd.DataFrame(D, index=[0])
    >>> df
           date
    0  2013-6-4
    >>> df.dtypes
    date    object
    dtype: object
    >>> df['date'] = pd.to_datetime(df.date, format='%Y-%m-%d')
    >>> df
            date
    0 2013-06-04
    >>> df.dtypes
    date    datetime64[ns]
    dtype: object
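When a column might contain values that cannot be parsed, pd.to_datetime also accepts errors='coerce', which turns those values into NaT instead of raising. A minimal sketch, with a made-up Series:

```python
import pandas as pd

# Hypothetical column with one value that is not a date.
s = pd.Series(["2013-06-04", "not a date", "2013-12-25"])

# errors="coerce" replaces unparseable entries with NaT instead of raising.
parsed = pd.to_datetime(s, format="%Y-%m-%d", errors="coerce")

print(parsed)
print(parsed.isna().tolist())  # [False, True, False]
```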