Pandas: resample timeseries with groupby

python pandas group-by time-series

In my original post, I suggested using pd.TimeGrouper. Nowadays, use pd.Grouper instead: the syntax is largely the same, and TimeGrouper is deprecated in favor of pd.Grouper.

Moreover, while pd.TimeGrouper could only group by DatetimeIndex, pd.Grouper can group by datetime columns which you can specify through the key parameter.
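As a minimal sketch of the key parameter mentioned above, assume the timestamps sit in an ordinary column rather than in the index. The column name "When" and the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical data: timestamps are stored in a regular column called "When"
# (an assumed name), not in the index.
df = pd.DataFrame({
    "When": pd.to_datetime([
        "2014-08-25 21:00:00", "2014-08-25 21:04:00",
        "2014-08-25 22:07:00", "2014-08-25 22:09:00",
    ]),
    "Location": ["HK", "LDN", "LDN", "LDN"],
    "Event": ["foo", "bar", "baz", "qux"],
})

# key= names the datetime column to bin on; freq="h" is the hourly alias
# (older pandas code usually spells it "H").
counts = df.groupby([pd.Grouper(key="When", freq="h"), "Location"])["Event"].count()
print(counts)
```

Without key, pd.Grouper(freq=...) bins on the index, which is what the rest of this answer relies on.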

You could use a pd.Grouper to group the DatetimeIndex'ed DataFrame by hour:

grouper = df.groupby([pd.Grouper(freq='1H'), 'Location'])

use count to count the number of events in each group:

grouper['Event'].count()
#                      Location
# 2014-08-25 21:00:00  HK          1
#                      LDN         1
# 2014-08-25 22:00:00  LDN         2
# Name: Event, dtype: int64

use unstack to move the Location index level to a column level:

grouper['Event'].count().unstack()
# Out[49]:
# Location             HK  LDN
# 2014-08-25 21:00:00   1    1
# 2014-08-25 22:00:00 NaN    2

and then use fillna to change the NaNs into zeros.

Putting it all together,

grouper = df.groupby([pd.Grouper(freq='1H'), 'Location'])
result = grouper['Event'].count().unstack('Location').fillna(0)

yields

Location             HK  LDN
2014-08-25 21:00:00   1    1
2014-08-25 22:00:00   0    2
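For anyone who wants to run the steps above end to end, here is a self-contained sketch. The DataFrame is an assumption: it reuses the four sample timestamps that appear in the Cost example elsewhere on this page, with the index as a DatetimeIndex. The trailing astype(int) is my addition, since fillna leaves the counts as floats.

```python
import pandas as pd

# Assumed sample data: a DatetimeIndex plus Location and Event columns.
times = pd.to_datetime([
    "2014-08-25 21:00:00", "2014-08-25 21:04:00",
    "2014-08-25 22:07:00", "2014-08-25 22:09:00",
])
df = pd.DataFrame(
    {"Location": ["HK", "LDN", "LDN", "LDN"],
     "Event": ["foo", "bar", "baz", "qux"]},
    index=times,
)

# Bin the index by hour and group by Location at the same time.
# "h" is the hourly alias; older pandas code usually spells it "H".
grouper = df.groupby([pd.Grouper(freq="h"), "Location"])

# Count events, pivot Location into columns, then replace the missing
# (hour, location) combinations with 0 and cast back to integers.
result = grouper["Event"].count().unstack("Location").fillna(0).astype(int)
print(result)
```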


Pandas 0.21 answer: TimeGrouper is getting deprecated

There are two options for doing this. They actually can give different results based on your data. The first option groups by Location and within Location groups by hour. The second option groups by Location and hour at the same time.

Option 1: Use groupby + resample

grouped = df.groupby('Location').resample('H')['Event'].count()

Option 2: Group both the location and DatetimeIndex together with groupby(pd.Grouper)

grouped = df.groupby(['Location', pd.Grouper(freq='H')])['Event'].count()

They both will result in the following:

Location
HK        2014-08-25 21:00:00    1
LDN       2014-08-25 21:00:00    1
          2014-08-25 22:00:00    2
Name: Event, dtype: int64

And then reshape:

grouped.unstack('Location', fill_value=0)

Will output

Location             HK  LDN
2014-08-25 21:00:00   1    1
2014-08-25 22:00:00   0    2
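The remark that the two options can give different results is easiest to see with data that has a gap. The sketch below uses made-up rows where LDN has events at 21:xx and 23:xx but nothing at 22:xx. As far as I can tell, resampling within each location emits the empty 22:00 bin with a count of 0, while grouping with pd.Grouper only returns the (location, hour) pairs that actually occur.

```python
import pandas as pd

# Hypothetical data with a gap: LDN has no event in the 22:00 hour.
times = pd.to_datetime([
    "2014-08-25 21:00:00", "2014-08-25 21:04:00", "2014-08-25 23:07:00",
])
df = pd.DataFrame(
    {"Location": ["HK", "LDN", "LDN"], "Event": ["foo", "bar", "baz"]},
    index=times,
)

# Option 1: resample inside each Location group. Empty hours between a
# location's first and last event show up with a count of 0.
opt1 = df.groupby("Location").resample("h")["Event"].count()

# Option 2: group on Location and hour together. Only combinations that
# are present in the data show up.
opt2 = df.groupby(["Location", pd.Grouper(freq="h")])["Event"].count()

print(opt1)
print(opt2)
```

After unstack('Location', fill_value=0) the first option therefore keeps a 22:00 row of zeros and the second one does not, so pick whichever matches what you need downstream.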


Multiple Column Group By

unutbu is spot on with his answer, but I wanted to add what you could do if you had a third column, say Cost, and wanted to aggregate it like above. It was through combining unutbu's answer and this one that I found out how to do this, and I thought I would share it for future users.

Create a DataFrame with Cost column:

In[1]:
import pandas as pd
import numpy as np

times = pd.to_datetime([
    "2014-08-25 21:00:00", "2014-08-25 21:04:00",
    "2014-08-25 22:07:00", "2014-08-25 22:09:00"
])
df = pd.DataFrame({
    "Location": ["HK", "LDN", "LDN", "LDN"],
    "Event":    ["foo", "bar", "baz", "qux"],
    "Cost":     [20, 24, 34, 52]
}, index = times)
df

Out[1]:
                     Location  Event  Cost
2014-08-25 21:00:00        HK    foo    20
2014-08-25 21:04:00       LDN    bar    24
2014-08-25 22:07:00       LDN    baz    34
2014-08-25 22:09:00       LDN    qux    52

Now we group and use the agg function to specify each column's aggregation method, e.g. count, mean, sum, etc.

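In[2]:
# String aliases name the aggregation for each column; recent pandas
# warns when np.mean is passed to agg directly.
grp = df.groupby([pd.Grouper(freq = "1H"), "Location"]) \
      .agg({"Event": "size", "Cost": "mean"})
grp

Out[2]:
                               Event  Cost
                     Location
2014-08-25 21:00:00  HK            1    20
                     LDN           1    24
2014-08-25 22:00:00  LDN           2    43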

Then a final unstack, filling the NaNs with zeros and displaying the values as int because it's nicer to read.

In[3]: grp.unstack().fillna(0).astype(int)

Out[3]:
                    Event     Cost
Location               HK LDN   HK LDN
2014-08-25 21:00:00     1   1   20  24
2014-08-25 22:00:00     0   2    0  43
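A possible alternative spelling, not part of the original answers: pandas (0.25 and later, as far as I know) also accepts named aggregation, where each output column is declared as name=(source column, function). The output names events and mean_cost below are made up for illustration.

```python
import pandas as pd

times = pd.to_datetime([
    "2014-08-25 21:00:00", "2014-08-25 21:04:00",
    "2014-08-25 22:07:00", "2014-08-25 22:09:00",
])
df = pd.DataFrame({
    "Location": ["HK", "LDN", "LDN", "LDN"],
    "Event":    ["foo", "bar", "baz", "qux"],
    "Cost":     [20, 24, 34, 52],
}, index=times)

# Named aggregation: each keyword is an output column name (hypothetical
# names here) mapped to a (source column, aggregation) pair.
grp = df.groupby([pd.Grouper(freq="h"), "Location"]).agg(
    events=("Event", "count"),
    mean_cost=("Cost", "mean"),
)

# Pivot Location into columns and fill missing combinations with 0.
wide = grp.unstack(fill_value=0)
print(wide)
```

This yields the same numbers as the dict form, just with column names you choose yourself.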
