Pandas rolling apply using multiple columns Pandas rolling apply using multiple columns pandas pandas

Pandas rolling apply using multiple columns


You can use rolling_apply function from numpy_ext module:

import numpy as npimport pandas as pdfrom numpy_ext import rolling_applydef masscenter(price, nQty):    return np.sum(price * nQty) / np.sum(nQty)df = pd.DataFrame( [['02:59:47.000282', 87.60, 739],                    ['03:00:01.042391', 87.51, 10],                    ['03:00:01.630182', 87.51, 10],                    ['03:00:01.635150', 88.00, 792],                    ['03:00:01.914104', 88.00, 10]],                    columns=['stamp', 'price','nQty'])df['stamp'] = pd.to_datetime(df['stamp'], format='%H:%M:%S.%f')df.set_index('stamp', inplace=True, drop=True)window = 2df['y'] = rolling_apply(masscenter, window, df.price.values, df.nQty.values)print(df)                            price  nQty          ystamp                                             1900-01-01 02:59:47.000282  87.60   739        NaN1900-01-01 03:00:01.042391  87.51    10  87.5987981900-01-01 03:00:01.630182  87.51    10  87.5100001900-01-01 03:00:01.635150  88.00   792  87.9938901900-01-01 03:00:01.914104  88.00    10  88.000000


How about this:

def masscenter(ser):    print(df.loc[ser.index])    return 0rol = df.price.rolling(window=2)rol.apply(masscenter, raw=False)

It uses the rolling logic to get subsets from an arbitrary column. The raw=False option provides you with index values for those subsets (which are given to you as Series), then you use those index values to get multi-column slices from your original DataFrame.


So I found no way to roll over two columns, however without inbuilt pandas functions.The code is listed below.

# function to find an index corresponding# to current value minus offset valuedef prevInd(series, offset, date):    offset = to_offset(offset)    end_date = date - offset    end = series.index.searchsorted(end_date, side="left")    return end# function to find an index corresponding# to the first value greater than current# it is useful when one has timeseries with non-unique# but monotonically increasing valuesdef nextInd(series, date):    end = series.index.searchsorted(date, side="right")    return enddef twoColumnsRoll(dFrame, offset, usecols, fn, columnName = 'twoColRol'):    # find all unique indices    uniqueIndices = dFrame.index.unique()    numOfPoints = len(uniqueIndices)    # prepare an output array    moving = np.zeros(numOfPoints)    # nameholders    price = dFrame[usecols[0]]    qty   = dFrame[usecols[1]]    # iterate over unique indices    for ii in range(numOfPoints):        # nameholder        pp = uniqueIndices[ii]        # right index - value greater than current        rInd = afta.nextInd(dFrame,pp)        # left index - the least value that         # is bigger or equal than (pp - offset)        lInd = afta.prevInd(dFrame,offset,pp)        # call the actual calcuating function over two arrays        moving[ii] = fn(price[lInd:rInd], qty[lInd:rInd])    # construct and return DataFrame    return pd.DataFrame(data=moving,index=uniqueIndices,columns=[columnName])

This code works, but it is relatively slow and inefficient. I suppose one can use numpy.lib.stride_tricks from How to invoke pandas.rolling.apply with parameters from multiple column? to speedup things.However, go big or go home - I ended writing a function in C++ and a wrapper for it.
I'd like not to post it as answer, since it is a workaround and I have not answered neither part of my question, but it is too long for a commentary.