Pandas rolling apply using multiple columns
You can use rolling_apply function from numpy_ext module:
import numpy as npimport pandas as pdfrom numpy_ext import rolling_applydef masscenter(price, nQty): return np.sum(price * nQty) / np.sum(nQty)df = pd.DataFrame( [['02:59:47.000282', 87.60, 739], ['03:00:01.042391', 87.51, 10], ['03:00:01.630182', 87.51, 10], ['03:00:01.635150', 88.00, 792], ['03:00:01.914104', 88.00, 10]], columns=['stamp', 'price','nQty'])df['stamp'] = pd.to_datetime(df['stamp'], format='%H:%M:%S.%f')df.set_index('stamp', inplace=True, drop=True)window = 2df['y'] = rolling_apply(masscenter, window, df.price.values, df.nQty.values)print(df) price nQty ystamp 1900-01-01 02:59:47.000282 87.60 739 NaN1900-01-01 03:00:01.042391 87.51 10 87.5987981900-01-01 03:00:01.630182 87.51 10 87.5100001900-01-01 03:00:01.635150 88.00 792 87.9938901900-01-01 03:00:01.914104 88.00 10 88.000000
How about this:
def masscenter(ser): print(df.loc[ser.index]) return 0rol = df.price.rolling(window=2)rol.apply(masscenter, raw=False)
It uses the rolling logic to get subsets from an arbitrary column. The raw=False option provides you with index values for those subsets (which are given to you as Series), then you use those index values to get multi-column slices from your original DataFrame.
So I found no way to roll over two columns, however without inbuilt pandas functions.The code is listed below.
# function to find an index corresponding# to current value minus offset valuedef prevInd(series, offset, date): offset = to_offset(offset) end_date = date - offset end = series.index.searchsorted(end_date, side="left") return end# function to find an index corresponding# to the first value greater than current# it is useful when one has timeseries with non-unique# but monotonically increasing valuesdef nextInd(series, date): end = series.index.searchsorted(date, side="right") return enddef twoColumnsRoll(dFrame, offset, usecols, fn, columnName = 'twoColRol'): # find all unique indices uniqueIndices = dFrame.index.unique() numOfPoints = len(uniqueIndices) # prepare an output array moving = np.zeros(numOfPoints) # nameholders price = dFrame[usecols[0]] qty = dFrame[usecols[1]] # iterate over unique indices for ii in range(numOfPoints): # nameholder pp = uniqueIndices[ii] # right index - value greater than current rInd = afta.nextInd(dFrame,pp) # left index - the least value that # is bigger or equal than (pp - offset) lInd = afta.prevInd(dFrame,offset,pp) # call the actual calcuating function over two arrays moving[ii] = fn(price[lInd:rInd], qty[lInd:rInd]) # construct and return DataFrame return pd.DataFrame(data=moving,index=uniqueIndices,columns=[columnName])
This code works, but it is relatively slow and inefficient. I suppose one can use numpy.lib.stride_tricks from How to invoke pandas.rolling.apply with parameters from multiple column? to speedup things.However, go big or go home - I ended writing a function in C++ and a wrapper for it.
I'd like not to post it as answer, since it is a workaround and I have not answered neither part of my question, but it is too long for a commentary.