Pandas - grouping intra day timeseries by date


df.groupby([df.index.year, df.index.month, df.index.day]).transform(np.cumsum).resample('B', how='ohlc')

I think this might be what I want but I have to test.

EDIT: After zelazny7's response:

df.groupby(pd.TimeGrouper('D')).transform(np.cumsum).resample('D', how='ohlc')

works and is also more efficient than my previous solution.
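For anyone who wants to convince themselves that the two groupings restart the running sum at the same points, here is a minimal sketch. The frame, its `ret` column, and the random values are made up for illustration, and it calls `.cumsum()` on the groupby directly rather than going through `transform(np.cumsum)`:

```python
import numpy as np
import pandas as pd

# Hypothetical minute-level data spanning two calendar days (random values, fixed seed).
rng = np.random.default_rng(0)
idx = pd.date_range("1999-08-09 12:30", periods=3, freq="min").append(
    pd.date_range("1999-08-10 12:30", periods=3, freq="min")
)
df = pd.DataFrame({"ret": rng.normal(size=6)}, index=idx)

# Grouping on the three date components of the index...
by_parts = df.groupby([df.index.year, df.index.month, df.index.day]).cumsum()

# ...and grouping on calendar-day bins with pd.Grouper.
by_day = df.groupby(pd.Grouper(freq="D")).cumsum()

# Both restart the running sum at each new day, so the frames should match.
pd.testing.assert_frame_equal(by_parts, by_day)
```

This only checks that the results agree; it says nothing about which version is faster.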

UPDATE:

pd.TimeGrouper('D') is deprecated since pandas v0.21.0.

Use pd.Grouper() instead. The how argument of resample() has likewise been deprecated, so call .ohlc() on the resampler:

df.groupby(pd.Grouper(freq='D')).transform(np.cumsum).resample('D').ohlc()
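A small end-to-end sketch of the same idea, in case it helps to see the shape of the result. The data and the column name `ret` are invented for illustration, and `.ohlc()` is called on a single column so the output has flat `open`/`high`/`low`/`close` columns:

```python
import pandas as pd

# Hypothetical sample: five one-minute observations on each of two days.
idx = pd.date_range("1999-08-09 12:30", periods=5, freq="min").append(
    pd.date_range("1999-08-10 12:30", periods=5, freq="min")
)
df = pd.DataFrame(
    {"ret": [1.0, -2.0, 3.0, 0.5, -1.0, 2.0, 2.0, -5.0, 1.0, 0.0]},
    index=idx,
)

# Running sum that restarts at the start of each calendar day.
intraday_cumsum = df.groupby(pd.Grouper(freq="D")).cumsum()

# Daily open/high/low/close of that running sum.
daily_ohlc = intraday_cumsum["ret"].resample("D").ohlc()
print(daily_ohlc)
```

With these made-up numbers the first day's running sum is 1, -1, 2, 2.5, 1.5, so its row comes out as open 1.0, high 2.5, low -1.0, close 1.5.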


I wasn't able to get your resample suggestion to work. Did you have any luck? Here's a way to aggregate the data at the business day level and compute the OHLC stats in one pass:

from io import StringIO
from pandas import *

text = """1999-08-09 12:30:00-04:00   -0.000486
1999-08-09 12:31:00-04:00   -0.000606
1999-08-09 12:32:00-04:00   -0.000120
1999-08-09 12:33:00-04:00   -0.000037
1999-08-09 12:34:00-04:00   -0.000337
1999-08-09 12:35:00-04:00    0.000100
1999-08-09 12:36:00-04:00    0.000219
1999-08-09 12:37:00-04:00    0.000285
1999-08-09 12:38:00-04:00   -0.000981
1999-08-09 12:39:00-04:00   -0.000487
1999-08-09 12:40:00-04:00    0.000476
1999-08-09 12:41:00-04:00    0.000362
1999-08-09 12:42:00-04:00   -0.000038
1999-08-09 12:43:00-04:00   -0.000310
1999-08-09 12:44:00-04:00   -0.000337"""

# Columns 0 and 1 (date and time) are parsed together into the index; the values end up in column 2.
df = read_csv(StringIO(text), sep=r'\s+', parse_dates=[[0,1]], index_col=[0], header=None)

Here I create a dictionary of dictionaries. The outer key references the column you want to apply the functions to. The inner keys are the labels for the aggregated output columns, and the inner values are the functions you want to apply:

f = {2: {'O':'first',
         'H':'max',
         'L':'min',
         'C':'last'}}

df.groupby(TimeGrouper(freq='B')).agg(f)

Out:
                   2
                   H         C         L         O
1999-08-09  0.000476 -0.000337 -0.000981 -0.000486
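Recent pandas releases no longer accept TimeGrouper or a dictionary of dictionaries in agg, so here is a rough equivalent using pd.Grouper and keyword ("named") aggregation. It is a sketch rather than a drop-in replacement: the column names `date`, `time` and `value` are my own labels, and the index is built with `pd.to_datetime` instead of `parse_dates`:

```python
from io import StringIO

import pandas as pd

# Same sample data as in the answer above.
text = """1999-08-09 12:30:00-04:00   -0.000486
1999-08-09 12:31:00-04:00   -0.000606
1999-08-09 12:32:00-04:00   -0.000120
1999-08-09 12:33:00-04:00   -0.000037
1999-08-09 12:34:00-04:00   -0.000337
1999-08-09 12:35:00-04:00    0.000100
1999-08-09 12:36:00-04:00    0.000219
1999-08-09 12:37:00-04:00    0.000285
1999-08-09 12:38:00-04:00   -0.000981
1999-08-09 12:39:00-04:00   -0.000487
1999-08-09 12:40:00-04:00    0.000476
1999-08-09 12:41:00-04:00    0.000362
1999-08-09 12:42:00-04:00   -0.000038
1999-08-09 12:43:00-04:00   -0.000310
1999-08-09 12:44:00-04:00   -0.000337"""

# Column names are hypothetical labels chosen for this sketch.
raw = pd.read_csv(StringIO(text), sep=r"\s+", header=None,
                  names=["date", "time", "value"])

# Join the date and time strings and parse them into a timezone-aware index.
raw.index = pd.to_datetime(raw["date"] + " " + raw["time"])

# One row per business day; each keyword becomes an output column.
ohlc = raw.groupby(pd.Grouper(freq="B"))["value"].agg(
    O="first", H="max", L="min", C="last"
)
print(ohlc)
```

The single resulting row should carry the same four values as the output above (O -0.000486, H 0.000476, L -0.000981, C -0.000337), with the columns in O, H, L, C order.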