conditional fill in pandas dataframe
This can be done fairly efficiently with Numba. If you are not able to use Numba, just omit `@njit` and your logic will run as a Python-level loop.
```python
import numpy as np
import pandas as pd
from numba import njit

np.random.seed(0)
df = pd.DataFrame(1000*(2+np.random.randn(500, 1)), columns=['A'])
df.loc[1, 'A'] = np.nan
df.loc[15, 'A'] = np.nan
df.loc[240, 'A'] = np.nan

@njit
def recurse_nb(x):
    out = x.copy()
    for i in range(1, x.shape[0]):
        # reuse the previous output value when the relative change is below 30%
        if not np.isnan(x[i]) and (abs(1 - x[i] / out[i-1]) < 0.3):
            out[i] = out[i-1]
    return out

df['B'] = recurse_nb(df['A'].values)

print(df.head(10))
```

```
             A            B
0  3764.052346  3764.052346
1          NaN          NaN
2  2978.737984  2978.737984
3  4240.893199  4240.893199
4  3867.557990  4240.893199
5  1022.722120  1022.722120
6  2950.088418  2950.088418
7  1848.642792  1848.642792
8  1896.781148  1848.642792
9  2410.598502  2410.598502
```
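If Numba may or may not be installed, one option is to swap in a no-op decorator so the same function body runs either way. This is a sketch under that assumption; the fallback only handles the bare `@njit` form (not `@njit(...)` with options). The small input array is illustrative and shows that the NaN cases need no special code: a NaN in `x[i]` fails the `not np.isnan` check and is kept, and a NaN in `out[i-1]` makes the ratio NaN, so the `< 0.3` test is False and `x[i]` is kept.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Numba not installed: a no-op stand-in so the same code runs as a
    # plain Python loop (slower, same results). Only supports bare @njit.
    def njit(func):
        return func


@njit
def recurse_nb(x):
    out = x.copy()
    for i in range(1, x.shape[0]):
        # reuse the previous output value when the relative change is below 30%
        if not np.isnan(x[i]) and (abs(1 - x[i] / out[i-1]) < 0.3):
            out[i] = out[i-1]
    return out


x = np.array([1, 2, 3, 4, 5, np.nan, 6, 7, 8, 9, 10], dtype=float)
result = recurse_nb(x)
print(result)
```

On this input the result is 1, 2, 3, 4, 4, NaN, 6, 6, 8, 8, 8.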
Not sure what you want to do about the first row (which has no previous B value) or about dividing by NaN, but here is one approach:
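```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1,2,3,4,5,None,6,7,8,9,10], columns=['A'])
b1 = df.A.shift(1)   # previous A value for each row
b1[0] = 1            # placeholder for the first row, which has no previous value
b = list(map(lambda a, b1: a if np.isnan(a) else (b1 if abs(b1-a)/b1 < 0.3 else a), df.A, b1))
df['B'] = b
print(df)
```

```
       A    B
0    1.0  1.0
1    2.0  2.0
2    3.0  3.0
3    4.0  4.0
4    5.0  4.0
5    NaN  NaN
6    6.0  6.0
7    7.0  6.0
8    8.0  7.0
9    9.0  8.0
10  10.0  9.0
```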
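This shift-based rule is not recursive: each value is compared with the previous *A* value, so a replacement made in B is not carried into the next comparison, whereas the question's loop compares with the previous *B* value. The sketch below runs both rules on the same data, assuming the first row is compared with itself; the column names `B_shift` and `B_recursive` are just illustrative. The two agree through row 7 and then diverge (7, 8, 9 versus 8, 8, 8 in rows 8 to 10).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4, 5, None, 6, 7, 8, 9, 10], columns=['A'])
a = df['A'].to_numpy()

# Non-recursive rule (shift-based): compare with the previous A value.
prev_a = df['A'].shift(1).to_numpy(dtype=float, copy=True)
prev_a[0] = a[0]  # assumption: the first row is compared with itself
non_recursive = [p if not np.isnan(x) and abs(p - x) / p < 0.3 else x
                 for x, p in zip(a, prev_a)]

# Recursive rule (the question's loop): compare with the previous B value.
recursive = [a[0]]
for x in a[1:]:
    p = recursive[-1]
    recursive.append(p if not np.isnan(x) and abs(1 - x / p) < 0.3 else x)

df['B_shift'] = non_recursive
df['B_recursive'] = recursive
print(df.iloc[6:])
```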
As per @jpp, you could also build list `b` with a list comprehension:
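```python
# Keep the previous A value only when A is not NaN and the relative change is < 0.3.
# Written this way round, a NaN in b1 (row 6) also falls through to `a`, as in the map version.
b = [prev if not np.isnan(a) and abs(prev - a) / prev < 0.3 else a for a, prev in zip(df.A, b1)]
```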
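The same non-recursive (previous-A) rule can also be written without a Python-level loop, using `Series.shift` and `Series.mask`. This is a sketch that assumes the first row is compared with itself (equivalent to `b1[0] = 1` here, since the first value is 1). No explicit NaN check is needed, because any comparison involving NaN is False, so `mask` leaves those rows unchanged.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, None, 6, 7, 8, 9, 10]})

prev = df['A'].shift(1)
prev.iloc[0] = df['A'].iloc[0]  # assumption: compare the first row with itself

# True where A changed by less than 30% relative to the previous A;
# NaN in A or in prev gives False, so A is kept there.
close = (prev - df['A']).abs() / prev < 0.3
df['B'] = df['A'].mask(close, prev)
print(df)
```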
A simple solution I came up with is the following. I was wondering if there is a more Pythonic way of doing this:
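```python
import numpy as np

a = df['A'].values
b = []
b.append(a[0])                  # the first row has no previous B, so keep A as-is
for i in range(1, len(a)):
    if np.isnan(a[i]):
        b.append(a[i])          # NaN in A stays NaN in B
    else:
        # reuse the previous B value if A changed by less than 30% relative to it
        b.append(b[i-1] if abs(1 - a[i]/b[i-1]) < 0.3 else a[i])
df['B'] = b
```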
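Because each B value depends on the previous *B* value, a single `shift`/`mask` step cannot reproduce this rule; an explicit loop (optionally compiled with Numba) is the straightforward route. One tidier option is to wrap the loop in a function with the threshold as a parameter and return a Series aligned with the input. The name `conditional_fill` and its signature are hypothetical, chosen for illustration.

```python
import numpy as np
import pandas as pd


def conditional_fill(s: pd.Series, threshold: float = 0.3) -> pd.Series:
    """Hypothetical helper: the question's loop, returning a Series aligned with s."""
    a = s.to_numpy(dtype=float)
    b = np.empty_like(a)
    b[0] = a[0]  # the first row has no previous B, so keep A as-is
    for i in range(1, len(a)):
        # NaN in a[i] is kept; a NaN in b[i-1] makes the test False, so a[i] is kept
        if not np.isnan(a[i]) and abs(1 - a[i] / b[i - 1]) < threshold:
            b[i] = b[i - 1]
        else:
            b[i] = a[i]
    return pd.Series(b, index=s.index, name='B')


df = pd.DataFrame({'A': [1, 2, 3, 4, 5, None, 6, 7, 8, 9, 10]})
df['B'] = conditional_fill(df['A'])
print(df)
```

Since it works on a plain NumPy array internally, the loop body could also be moved into an `@njit` function as in the Numba answer if speed matters.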