
pandas iteratively update column values


Rather than fight against the fundamentally iterative nature of your problem, you could use numba and write the simplest performant iterative version you can:

import numba
import numpy as np

@numba.jit(nopython=True)
def epow(vec, p):
    out = np.zeros(len(vec))
    out[0] = vec[0]
    for i in range(1, len(vec)):
        out[i] = vec[i] + out[i - 1] ** p  # use the p argument rather than a hard-coded 0.8
    return out

which gives me

In [148]: a1, a2, a3, a4 = range(1, 5)

In [149]: a1, a2 + a1**0.8, a3 + (a2 + a1**0.8)**0.8, a4 + (a3 + (a2 + a1**0.8)**0.8)**0.8
Out[149]: (1, 3.0, 5.408224685280692, 7.858724574530816)

In [150]: epow(pd.Series([a1, a2, a3, a4]).values, 0.8)
Out[150]: array([1.        , 3.        , 5.40822469, 7.85872457])

and for longer Series:

In [151]: s = pd.Series(np.arange(2*10**6))

In [152]: %time epow(s.values, 0.8)
CPU times: user 512 ms, sys: 20 ms, total: 532 ms
Wall time: 531 ms
Out[152]: array([0.00000000e+00, 1.00000000e+00, 3.00000000e+00, ...,
       2.11487244e+06, 2.11487348e+06, 2.11487453e+06])
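For context, the same loop in pure Python (no numba) produces identical results but is typically much slower on large inputs; a minimal sketch of that baseline (`epow_py` is a name chosen here for illustration):

```python
import numpy as np

def epow_py(vec, p):
    # Same recurrence as epow, in plain Python: out[i] = vec[i] + out[i-1]**p
    out = np.zeros(len(vec))
    out[0] = vec[0]
    for i in range(1, len(vec)):
        out[i] = vec[i] + out[i - 1] ** p
    return out

print(epow_py(np.array([1.0, 2.0, 3.0, 4.0]), 0.8))
```

Timing this against the numba-compiled version on a large Series makes the compilation payoff easy to see.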


The important point to understand about these kinds of problems is that you are caught in a bind: you would like the speed of vectorization, but the recurrence makes each value depend on the previous one, which pushes you toward non-vectorized techniques such as threading or parallelization.

In such a situation, you can try one or more of the following options:

  1. Change the type of your data structure.

  2. Rethink your problem and see whether it can be solved entirely in a vectorized way (preferable).

  3. Simply use a non-vectorized approach but sacrifice something else, such as memory.
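As an illustration of option 3, the recurrence from the question can be written with `itertools.accumulate`, which is non-vectorized but concise (a sketch, assuming the same `out[i] = a[i] + out[i-1]**0.8` recurrence as in the accepted answer; the exponent 0.8 and the sample input are placeholders):

```python
from itertools import accumulate

a = [1, 2, 3, 4]  # example input

# accumulate feeds each new element x together with the running value prev
result = list(accumulate(a, lambda prev, x: x + prev ** 0.8))
print(result)  # matches the hand-computed values above
```

This materializes the whole result list in memory, which is the trade-off option 3 alludes to.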


You can use a recursive function. It makes one recursive call per element, so O(n) calls in total (note that each `x[:-1]` slice copies the list, and Python's default recursion limit applies for long inputs):

b = []

def my_func(x):
    if len(x) == 1:
        b.append(x[0])
        return x[0]
    # same recurrence as above: out[i] = x[i] + out[i-1]**0.8
    element = x[-1] + my_func(x[:-1]) ** 0.8
    b.append(element)
    return element

my_func(list(a))