pandas dataframe groupby datetime month pandas dataframe groupby datetime month python python

pandas dataframe groupby datetime month


Managed to do it:

b = pd.read_csv('b.dat')b.index = pd.to_datetime(b['date'],format='%m/%d/%y %I:%M%p')b.groupby(by=[b.index.month, b.index.year])

Or

b.groupby(pd.Grouper(freq='M'))  # update for v0.21+


(update: 2018)

Note that pd.Timegrouper is depreciated and will be removed. Use instead:

 df.groupby(pd.Grouper(freq='M'))


One solution which avoids MultiIndex is to create a new datetime column setting day = 1. Then group by this column.

Normalise day of month

df = pd.DataFrame({'Date': pd.to_datetime(['2017-10-05', '2017-10-20', '2017-10-01', '2017-09-01']),                   'Values': [5, 10, 15, 20]})# normalize day to beginning of month, 4 alternative methods belowdf['YearMonth'] = df['Date'] + pd.offsets.MonthEnd(-1) + pd.offsets.Day(1)df['YearMonth'] = df['Date'] - pd.to_timedelta(df['Date'].dt.day-1, unit='D')df['YearMonth'] = df['Date'].map(lambda dt: dt.replace(day=1))df['YearMonth'] = df['Date'].dt.normalize().map(pd.tseries.offsets.MonthBegin().rollback)

Then use groupby as normal:

g = df.groupby('YearMonth')res = g['Values'].sum()# YearMonth# 2017-09-01    20# 2017-10-01    30# Name: Values, dtype: int64

Comparison with pd.Grouper

The subtle benefit of this solution is, unlike pd.Grouper, the grouper index is normalized to the beginning of each month rather than the end, and therefore you can easily extract groups via get_group:

some_group = g.get_group('2017-10-01')

Calculating the last day of October is slightly more cumbersome. pd.Grouper, as of v0.23, does support a convention parameter, but this is only applicable for a PeriodIndex grouper.

Comparison with string conversion

An alternative to the above idea is to convert to a string, e.g. convert datetime 2017-10-XX to string '2017-10'. However, this is not recommended since you lose all the efficiency benefits of a datetime series (stored internally as numerical data in a contiguous memory block) versus an object series of strings (stored as an array of pointers).