Which is the fastest way to extract day, month and year from a given date?
In 0.15.0 you will be able to use the new .dt accessor to do this with nicer syntax.
In [36]: df = DataFrame(date_range('20000101',periods=150000,freq='H'),columns=['Date'])

In [37]: df.head(5)
Out[37]:
                 Date
0 2000-01-01 00:00:00
1 2000-01-01 01:00:00
2 2000-01-01 02:00:00
3 2000-01-01 03:00:00
4 2000-01-01 04:00:00

[5 rows x 1 columns]

In [38]: %timeit f(df)
10 loops, best of 3: 22 ms per loop

In [39]: def f(df):
    df = df.copy()
    df['Year'] = DatetimeIndex(df['Date']).year
    df['Month'] = DatetimeIndex(df['Date']).month
    df['Day'] = DatetimeIndex(df['Date']).day
    return df
   ....:

In [40]: f(df).head()
Out[40]:
                 Date  Year  Month  Day
0 2000-01-01 00:00:00  2000      1    1
1 2000-01-01 01:00:00  2000      1    1
2 2000-01-01 02:00:00  2000      1    1
3 2000-01-01 03:00:00  2000      1    1
4 2000-01-01 04:00:00  2000      1    1

[5 rows x 4 columns]
From 0.15.0 on (released in October 2014), the following is possible with the new .dt accessor:
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
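For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample frame and the `raw` string column are made up for illustration; the only assumption carried over from above is a column named `Date`. Note that `.dt` only works on a datetime-like column, so strings have to go through `pd.to_datetime` first.

```python
import pandas as pd

# Hypothetical sample data: five consecutive days stored as strings
raw = pd.DataFrame({"Date": ["2000-01-01", "2000-01-02", "2000-01-03",
                             "2000-01-04", "2000-01-05"]})

# .dt requires a datetime64 column, so convert the strings first
df = raw.copy()
df["Date"] = pd.to_datetime(df["Date"])

# Extract the components with the .dt accessor
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Day"] = df["Date"].dt.day

print(df)
```

The three new columns are plain integer columns, so they can be used for grouping or filtering like any other column.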
This is the cleanest answer I've found.
df = df.assign(**{t:getattr(df.data.dt,t) for t in nomtimes})
In [30]: df = pd.DataFrame({'data':pd.date_range(start, end)})

In [31]: df.head()
Out[31]:
        data
0 2011-01-01
1 2011-01-02
2 2011-01-03
3 2011-01-04
4 2011-01-05

nomtimes = ["year", "hour", "month", "dayofweek"]
df = df.assign(**{t:getattr(df.data.dt,t) for t in nomtimes})

In [33]: df.head()
Out[33]:
        data  dayofweek  hour  month  year
0 2011-01-01          5     0      1  2011
1 2011-01-02          6     0      1  2011
2 2011-01-03          0     0      1  2011
3 2011-01-04          1     0      1  2011
4 2011-01-05          2     0      1  2011
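The session above does not show how `start` and `end` were defined, so here is a runnable sketch of the same idea with assumed values for both (picked only to match the first rows of the output). It uses `df["data"]` instead of `df.data`, which behaves the same here but avoids clashing with DataFrame attribute names.

```python
import pandas as pd

# Assumed values: the original snippet does not define start and end
start, end = "2011-01-01", "2011-01-05"
df = pd.DataFrame({"data": pd.date_range(start, end)})

# Names of the .dt attributes to turn into columns
nomtimes = ["year", "hour", "month", "dayofweek"]

# Build one keyword argument per attribute and add them all in one assign call
df = df.assign(**{t: getattr(df["data"].dt, t) for t in nomtimes})

print(df)
```

The column order may differ from the output shown above: as far as I know, older pandas versions sorted the columns added by `assign` alphabetically, while newer ones keep the order of the keyword arguments.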