Pandas pd.cut() - binning datetime column / series
UPDATE: starting from Pandas v0.20.1 (May 5, 2017) pd.cut
and pd.qcut
support datetime64 and timedelta64 dtypes (GH14714, GH14798).
Thanks @lighthouse65 for checking this!
Old answer:
Consider this approach:
df = pd.DataFrame(pd.date_range('2000-01-02', freq='1D', periods=15), columns=['Date'])bins_dt = pd.date_range('2000-01-01', freq='3D', periods=6)bins_str = bins_dt.astype(str).valueslabels = ['({}, {}]'.format(bins_str[i-1], bins_str[i]) for i in range(1, len(bins_str))]df['cat'] = pd.cut(df.Date.astype(np.int64)//10**9, bins=bins_dt.astype(np.int64)//10**9, labels=labels)
Result:
In [59]: dfOut[59]: Date cat0 2000-01-02 (2000-01-01, 2000-01-04]1 2000-01-03 (2000-01-01, 2000-01-04]2 2000-01-04 (2000-01-01, 2000-01-04]3 2000-01-05 (2000-01-04, 2000-01-07]4 2000-01-06 (2000-01-04, 2000-01-07]5 2000-01-07 (2000-01-04, 2000-01-07]6 2000-01-08 (2000-01-07, 2000-01-10]7 2000-01-09 (2000-01-07, 2000-01-10]8 2000-01-10 (2000-01-07, 2000-01-10]9 2000-01-11 (2000-01-10, 2000-01-13]10 2000-01-12 (2000-01-10, 2000-01-13]11 2000-01-13 (2000-01-10, 2000-01-13]12 2000-01-14 (2000-01-13, 2000-01-16]13 2000-01-15 (2000-01-13, 2000-01-16]14 2000-01-16 (2000-01-13, 2000-01-16]In [60]: df.dtypesOut[60]:Date datetime64[ns]cat categorydtype: object
Explanation:
df.Date.astype(np.int64)//10**9
- converts datetime
values into UNIX epoch (timestamp - # of seconds since 1970-01-01 00:00:00
):
In [65]: df.Date.astype(np.int64)//10**9Out[65]:0 9467712001 9468576002 9469440003 9470304004 9471168005 9472032006 9472896007 9473760008 9474624009 94754880010 94763520011 94772160012 94780800013 94789440014 947980800Name: Date, dtype: int64
the same will applyied to bins
:
In [66]: bins_dt.astype(np.int64)//10**9Out[66]: Int64Index([946684800, 946944000, 947203200, 947462400, 947721600, 947980800], dtype='int64')
labels:
In [67]: labelsOut[67]:['(2000-01-01, 2000-01-04]', '(2000-01-04, 2000-01-07]', '(2000-01-07, 2000-01-10]', '(2000-01-10, 2000-01-13]', '(2000-01-13, 2000-01-16]']