Pandas Efficient VWAP Calculation

python numpy pandas apply cumulative-sum

Getting into one pass vs one line starts to get a little semantical. How about this for a distinction: you can do it with 1 line of pandas, 1 line of numpy, or several lines of numba.

from numba import jitdf=pd.DataFrame( np.random.randn(10000,3), columns=['v','h','l'] )df['vwap_pandas'] = (df.v*(df.h+df.l)/2).cumsum() / df.v.cumsum()@jitdef vwap():    tmp1 = np.zeros_like(v)    tmp2 = np.zeros_like(v)    for i in range(0,len(v)):        tmp1[i] = tmp1[i-1] + v[i] * ( h[i] + l[i] ) / 2.        tmp2[i] = tmp2[i-1] + v[i]    return tmp1 / tmp2v = df.v.valuesh = df.h.valuesl = df.l.valuesdf['vwap_numpy'] = np.cumsum(v*(h+l)/2) / np.cumsum(v)df['vwap_numba'] = vwap()

Timings:

%timeit (df.v*(df.h+df.l)/2).cumsum() / df.v.cumsum()  # pandas1000 loops, best of 3: 829 µs per loop%timeit np.cumsum(v*(h+l)/2) / np.cumsum(v)            # numpy10000 loops, best of 3: 165 µs per loop%timeit vwap()                                         # numba10000 loops, best of 3: 87.4 µs per loop

python numpy pandas apply cumulative-sum

Quick Edit: Just wanted to thank John for the original post :)

You can get even faster results by @jit-ing numpy's version:

@jitdef np_vwap():    return np.cumsum(v*(h+l)/2) / np.cumsum(v)

This got me 50.9 µs per loop as opposed to 74.5 µs per loop using the vwap version above.

CodeHunter

Pandas Efficient VWAP Calculation

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last