Pandas pd.cut() - binning datetime column / series Pandas pd.cut() - binning datetime column / series python-3.x python-3.x

Pandas pd.cut() - binning datetime column / series


UPDATE: starting from Pandas v0.20.1 (May 5, 2017) pd.cut and pd.qcut support datetime64 and timedelta64 dtypes (GH14714, GH14798).

Thanks @lighthouse65 for checking this!


Old answer:

Consider this approach:

df = pd.DataFrame(pd.date_range('2000-01-02', freq='1D', periods=15), columns=['Date'])bins_dt = pd.date_range('2000-01-01', freq='3D', periods=6)bins_str = bins_dt.astype(str).valueslabels = ['({}, {}]'.format(bins_str[i-1], bins_str[i]) for i in range(1, len(bins_str))]df['cat'] = pd.cut(df.Date.astype(np.int64)//10**9,                   bins=bins_dt.astype(np.int64)//10**9,                   labels=labels)

Result:

In [59]: dfOut[59]:         Date                       cat0  2000-01-02  (2000-01-01, 2000-01-04]1  2000-01-03  (2000-01-01, 2000-01-04]2  2000-01-04  (2000-01-01, 2000-01-04]3  2000-01-05  (2000-01-04, 2000-01-07]4  2000-01-06  (2000-01-04, 2000-01-07]5  2000-01-07  (2000-01-04, 2000-01-07]6  2000-01-08  (2000-01-07, 2000-01-10]7  2000-01-09  (2000-01-07, 2000-01-10]8  2000-01-10  (2000-01-07, 2000-01-10]9  2000-01-11  (2000-01-10, 2000-01-13]10 2000-01-12  (2000-01-10, 2000-01-13]11 2000-01-13  (2000-01-10, 2000-01-13]12 2000-01-14  (2000-01-13, 2000-01-16]13 2000-01-15  (2000-01-13, 2000-01-16]14 2000-01-16  (2000-01-13, 2000-01-16]In [60]: df.dtypesOut[60]:Date    datetime64[ns]cat           categorydtype: object

Explanation:

df.Date.astype(np.int64)//10**9 - converts datetime values into UNIX epoch (timestamp - # of seconds since 1970-01-01 00:00:00):

In [65]: df.Date.astype(np.int64)//10**9Out[65]:0     9467712001     9468576002     9469440003     9470304004     9471168005     9472032006     9472896007     9473760008     9474624009     94754880010    94763520011    94772160012    94780800013    94789440014    947980800Name: Date, dtype: int64

the same will applyied to bins:

In [66]: bins_dt.astype(np.int64)//10**9Out[66]: Int64Index([946684800, 946944000, 947203200, 947462400, 947721600, 947980800], dtype='int64')

labels:

In [67]: labelsOut[67]:['(2000-01-01, 2000-01-04]', '(2000-01-04, 2000-01-07]', '(2000-01-07, 2000-01-10]', '(2000-01-10, 2000-01-13]', '(2000-01-13, 2000-01-16]']