Pandas read_csv: parsing time field correctly

In Python, datetimes are generally represented as datetime.datetime objects. These are relatively slow to work with, which is why Pandas uses its own Timestamp type, stored internally as a 64-bit integer count of nanoseconds since the Unix epoch.
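As a quick illustration of that internal representation (a minimal sketch; `ts` is just an example value, not from the data below), a `Timestamp`'s underlying integer is exposed via its `.value` attribute:

```python
import pandas as pd

# A pandas Timestamp wraps a single 64-bit integer: nanoseconds since
# the Unix epoch. .value exposes that integer directly.
ts = pd.Timestamp("2016-04-27 02:18:00")
print(ts.value)
print(pd.Timestamp(ts.value))  # round-trips back to the same instant
```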

To read the data (note the double brackets around the parse_dates argument):

df = pd.read_csv(filename, parse_dates=[['TranDate', 'TranTime']])

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 18 entries, 0 to 17
Data columns (total 3 columns):
TranDate_TranTime    18 non-null datetime64[ns]
TranID               18 non-null object
TranAmt              18 non-null float64
dtypes: datetime64[ns](1), float64(1), object(1)

>>> df.head()
    TranDate_TranTime   TranID    TranAmt
0 2016-04-27 02:18:00  A123456    9999.53
1 2016-04-26 02:48:00  B123457   26070.33
2 2016-04-25 03:18:00  C123458   13779.56
3 2016-04-24 03:18:00  A123459   18157.26
4 2016-04-23 04:18:00  B123460  215868.15

The date and time columns have been joined into a single column. Once you have this timestamp column, it is easy to access its attributes using the dt accessor, e.g.

>>> df.groupby(df.TranDate_TranTime.dt.hour).TranAmt.sum().head()
TranDate_TranTime
2     36069.86
3     31936.82
4    239563.40
5     64651.24
6     62010.65
Name: TranAmt, dtype: float64

>>> df.groupby(df.TranDate_TranTime.dt.day).TranAmt.sum().head()
TranDate_TranTime
19     68939.63
20    262215.42
21     15929.45
22     36902.23
23    242122.84
Name: TranAmt, dtype: float64

Refer to the Pandas docs for more information regarding Pandas date functionality.
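For instance, a few more of the attributes the dt accessor exposes (a small inline Series here, not the CSV data above):

```python
import pandas as pd

# A two-element stand-in Series to demonstrate .dt attributes.
s = pd.Series(pd.to_datetime(["2016-04-27 02:18:00", "2016-04-26 02:48:00"]))
print(s.dt.hour)        # hour of day
print(s.dt.day)         # day of month
print(s.dt.day_name())  # weekday name
```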


  1. No effect on aggregation, but you will lose the time part.
  2. No; you can mostly still access the time part via the .dt accessor.
import pandas as pd

df = pd.read_csv('MyTest.csv', parse_dates=[['TranDate', 'TranTime']])
print(df)

     TranDate_TranTime   TranID    TranAmt
0  2016-04-27 02:18:00  A123456    9999.53
1  2016-04-26 02:48:00  B123457   26070.33
2  2016-04-25 03:18:00  C123458   13779.56
3  2016-04-24 03:18:00  A123459   18157.26
4  2016-04-23 04:18:00  B123460  215868.15
5  2016-04-22 04:18:00  C123461   23695.25
6  2016-04-21 05:18:00  A123462      57.00
7  2016-04-20 05:18:00  B123463   64594.24
8  2016-04-19 06:18:00  C123464   47890.91
9  2016-04-27 06:18:00  A123465   14119.74
10 2016-04-26 07:18:00  B123466    2649.60
11 2016-04-25 07:18:00  C123467   16757.38
12 2016-04-24 08:18:00  A123468    8864.78
13 2016-04-23 08:18:00  B123469   26254.69
14 2016-04-22 09:18:00  C123470   13206.98
15 2016-04-21 09:18:00  A123471   15872.45
16 2016-04-20 10:18:00  B123472  197621.18
17 2016-04-19 10:18:00  C123473   21048.72
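Regarding point 2 above: the time-of-day part of the combined column is still reachable through the .dt accessor. A minimal sketch, using a two-row stand-in frame rather than the full data:

```python
import pandas as pd

# Two-row stand-in for the combined TranDate_TranTime column above.
sample = pd.DataFrame({
    "TranDate_TranTime": pd.to_datetime(
        ["2016-04-27 02:18:00", "2016-04-26 02:48:00"]
    )
})

print(sample.TranDate_TranTime.dt.time)  # plain datetime.time objects
print(sample.TranDate_TranTime.dt.date)  # plain datetime.date objects
```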

Parse and manage the date/time as a single column wherever possible, using the nested-bracket form parse_dates=[['TranDate', 'TranTime']].
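Note that newer pandas releases (2.x onward) deprecate the nested-list form of parse_dates. A sketch of the equivalent manual combination, assuming the same column names as above (the inline CSV here is a one-row stand-in for MyTest.csv):

```python
import pandas as pd
from io import StringIO

# One-row inline stand-in for MyTest.csv, same column names as above.
csv = StringIO("TranDate,TranTime,TranID,TranAmt\n"
               "2016-04-27,02:18:00,A123456,9999.53\n")
frame = pd.read_csv(csv)

# Combine the two string columns, then parse once with to_datetime.
frame["TranDate_TranTime"] = pd.to_datetime(frame["TranDate"] + " " + frame["TranTime"])
frame = frame.drop(columns=["TranDate", "TranTime"])
print(frame.dtypes)
```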

print(df.groupby(df.TranDate_TranTime.dt.hour).sum())

      TranAmt
2    36069.86
3    31936.82
4   239563.40
5    64651.24
6    62010.65
7    19406.98
8    35119.47
9    29079.43
10  218669.90

print(df.groupby(df.TranDate_TranTime.dt.minute).sum())

     TranAmt
18  710437.42
48   26070.33

This gives the hourly and per-minute totals shown above.

And you can still group by after resampling, like below.

df2 = df.set_index('TranDate_TranTime').resample('60s').sum().dropna()
print(df2)

                      TranAmt
TranDate_TranTime
2016-04-19 06:18:00   47890.91
2016-04-19 10:18:00   21048.72
2016-04-20 05:18:00   64594.24
2016-04-20 10:18:00  197621.18
2016-04-21 05:18:00      57.00
2016-04-21 09:18:00   15872.45
2016-04-22 04:18:00   23695.25
2016-04-22 09:18:00   13206.98
2016-04-23 04:18:00  215868.15
2016-04-23 08:18:00   26254.69
2016-04-24 03:18:00   18157.26
2016-04-24 08:18:00    8864.78
2016-04-25 03:18:00   13779.56
2016-04-25 07:18:00   16757.38
2016-04-26 02:48:00   26070.33
2016-04-26 07:18:00    2649.60
2016-04-27 02:18:00    9999.53
2016-04-27 06:18:00   14119.74

print(df2.groupby(df2.index.day).sum())

      TranAmt
19   68939.63
20  262215.42
21   15929.45
22   36902.23
23  242122.84
24   27022.04
25   30536.94
26   28719.93
27   24119.27