How to resample a dataframe with different functions applied to each column?
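With pandas 0.18 the resample API changed (see the docs): resample() now returns a Resampler object, and the per-column functions are passed to its .agg() method. So for pandas >= 0.18 the answer is: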
In [31]: frame.resample('1H').agg({'radiation': np.sum, 'tamb': np.mean})
Out[31]:
                         tamb   radiation
2012-04-05 08:00:00  5.161235  279.507182
2012-04-05 09:00:00  4.968145  290.941073
2012-04-05 10:00:00  4.478531  317.678285
2012-04-05 11:00:00  4.706206  335.258633
2012-04-05 12:00:00  2.457873    8.655838
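The following is a minimal, self-contained sketch of the same call on current pandas. It assumes a hypothetical minute-level frame with the question's column names (radiation, tamb). The values are made up so that the hourly results are easy to check. Two choices differ from the session above. First, the string aliases 'sum' and 'mean' are used, because recent pandas versions prefer them over np.sum and np.mean. Second, the rule is '60min', since '1H' triggers a deprecation warning in recent releases.

```python
import numpy as np
import pandas as pd

# Hypothetical minute-level data shaped like the question's frame:
# 241 rows from 08:00 to 12:00 inclusive (made-up values, not the original data).
index = pd.date_range("2012-04-05 08:00", periods=241, freq="min")
frame = pd.DataFrame(
    {
        "radiation": np.ones(len(index)),             # constant, so hourly sums are easy to check
        "tamb": np.arange(len(index), dtype=float),   # ramp, so hourly means are easy to check
    },
    index=index,
)

# Downsample to hourly bins, applying a different function to each column:
# sum the radiation, average the ambient temperature.
hourly = frame.resample("60min").agg({"radiation": "sum", "tamb": "mean"})
print(hourly)
```

Each hourly bin is closed on the left, so 08:00 through 08:59 land in the 08:00 row. The lone 12:00 sample forms its own final bin, which is why the last row in the session above looks so different from the others.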
Old Answer:
I am answering my own question to reflect the time-series-related changes in pandas >= 0.8 (all other answers are outdated).
Using pandas >= 0.8 (the how keyword shown here was deprecated in 0.18), the answer is:
In [30]: frame.resample('1H', how={'radiation': np.sum, 'tamb': np.mean})
Out[30]:
                         tamb   radiation
2012-04-05 08:00:00  5.161235  279.507182
2012-04-05 09:00:00  4.968145  290.941073
2012-04-05 10:00:00  4.478531  317.678285
2012-04-05 11:00:00  4.706206  335.258633
2012-04-05 12:00:00  2.457873    8.655838
You can also downsample using the asof method of pandas.DateRange objects.
In [21]: hourly = pd.DateRange(datetime.datetime(2012, 4, 5, 8, 0),
   ...:                       datetime.datetime(2012, 4, 5, 12, 0),
   ...:                       offset=pd.datetools.Hour())

In [22]: frame.groupby(hourly.asof).size()
Out[22]:
key_0
2012-04-05 08:00:00    60
2012-04-05 09:00:00    60
2012-04-05 10:00:00    60
2012-04-05 11:00:00    60
2012-04-05 12:00:00     1

In [23]: frame.groupby(hourly.asof).agg({'radiation': np.sum, 'tamb': np.mean})
Out[23]:
                     radiation   tamb
key_0
2012-04-05 08:00:00     271.54  4.491
2012-04-05 09:00:00     266.18  5.253
2012-04-05 10:00:00     292.35  4.959
2012-04-05 11:00:00     283.00  5.489
2012-04-05 12:00:00     0.5414  9.532
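pd.DateRange and pd.datetools come from very old pandas releases. The sketch below shows how the same asof trick might look on current pandas, using made-up minute-level data. It assumes pd.date_range as the stand-in for DateRange. Index.asof maps each timestamp to the latest hourly label at or before it, and grouping by that function reproduces the bins from the session above. The block also shows pd.Grouper(freq=...), the more idiomatic way to get the same hourly groups, and the two approaches produce matching counts.

```python
import numpy as np
import pandas as pd

# Hypothetical minute-level data standing in for the question's frame (made-up values).
index = pd.date_range("2012-04-05 08:00", periods=241, freq="min")
frame = pd.DataFrame(
    {"radiation": np.ones(len(index)), "tamb": np.arange(len(index), dtype=float)},
    index=index,
)

# Assumed modern stand-in for pd.DateRange(..., offset=pd.datetools.Hour()).
hourly = pd.date_range("2012-04-05 08:00", "2012-04-05 12:00", freq="60min")

# groupby calls hourly.asof on every index label; each timestamp is mapped
# to the most recent hourly label at or before it.
sizes_asof = frame.groupby(hourly.asof).size()
agg_asof = frame.groupby(hourly.asof).agg({"radiation": "sum", "tamb": "mean"})

# Idiomatic equivalent: let pd.Grouper build the hourly bins directly.
sizes_grouper = frame.groupby(pd.Grouper(freq="60min")).size()
agg_grouper = frame.groupby(pd.Grouper(freq="60min")).agg(
    {"radiation": "sum", "tamb": "mean"}
)

print(sizes_asof)
print(agg_grouper)
```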
To tantalize you: in pandas 0.8.0 (under heavy development in the timeseries branch on GitHub), you'll be able to do:
In [5]: frame.convert('1h', how='mean')
Out[5]:
                     radiation      tamb
2012-04-05 08:00:00   7.840989  8.446109
2012-04-05 09:00:00   4.898935  5.459221
2012-04-05 10:00:00   5.227741  4.660849
2012-04-05 11:00:00   4.689270  5.321398
2012-04-05 12:00:00   4.956994  5.093980
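The convert call above comes from a development branch, and as far as I know it is not part of released pandas. The sketch below shows how the same "one function for every column" downsampling can be written with resample on current pandas. It assumes a hypothetical minute-level frame with made-up values.

```python
import numpy as np
import pandas as pd

# Hypothetical minute-level data (made-up values, not the original data).
index = pd.date_range("2012-04-05 08:00", periods=241, freq="min")
frame = pd.DataFrame(
    {"radiation": np.ones(len(index)), "tamb": np.arange(len(index), dtype=float)},
    index=index,
)

# Same function applied to every column: hourly means.
hourly_mean = frame.resample("60min").mean()
print(hourly_mean)
```

When every column needs the same aggregation, calling the method directly (.mean(), .sum(), ...) is simpler than passing a dict to .agg().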
The asof-based approach above is the right strategy with the production release of pandas that was current when this was written (before 0.8).