Apply number ceiling/threshold to pandas dataframe
Use:
    df = df.mask(df.gt(2.5).cumsum(1).gt(0), 3)
    # same as
    # df = df.mask((df > 2.5).cumsum(axis=1) > 0, 3)
    print(df)
             1    2    3    4    5    6    7
    John   1.3  3.0  3.0  3.0  3.0  3.0  3.0
    Terry  1.1  2.3  3.0  3.0  3.0  3.0  3.0
    Henry  0.3  1.0  2.0  3.0  3.0  3.0  3.0
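The one-liner above can be tried end to end with a small frame. This is a minimal sketch: the input values are hypothetical, chosen only so that they reproduce the comparison output shown in this answer (the original input frame is not given here), and the name `capped` is illustrative.

```python
import pandas as pd

# Hypothetical sample data (an assumption, not the asker's original frame)
df = pd.DataFrame(
    [[1.3, 2.8, 3.1, 2.9, 3.3, 2.7, 3.5],
     [1.1, 2.3, 2.6, 2.9, 3.2, 2.4, 2.8],
     [0.3, 1.0, 2.0, 2.7, 3.0, 2.2, 2.9]],
    index=["John", "Terry", "Henry"],
    columns=range(1, 8),
)

# Replace every value from the first one above 2.5 onward (per row) with 3
capped = df.mask(df.gt(2.5).cumsum(axis=1).gt(0), 3)
print(capped)
```

Values to the left of the first exceedance in each row are left unchanged.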
Detail:
First, compare all values with 2.5 using gt:
    print(df.gt(2.5))
               1      2      3      4      5      6      7
    John   False   True   True   True   True   True   True
    Terry  False  False   True   True   True  False   True
    Henry  False  False  False   True   True  False   True
Then take the cumulative sum along each row with cumsum and axis=1:
    print(df.gt(2.5).cumsum(axis=1))
           1  2  3  4  5  6  7
    John   0  1  2  3  4  5  6
    Terry  0  0  1  2  3  3  4
    Henry  0  0  0  1  2  2  3
Next, check which cumulative sums are greater than 0 using gt:
    print(df.gt(2.5).cumsum(axis=1).gt(0))
               1      2      3      4      5      6      7
    John   False   True   True   True   True   True   True
    Terry  False  False   True   True   True   True   True
    Henry  False  False  False   True   True   True   True
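To see why the cumulative sum matters, it may help to compare the plain comparison mask with the cumulative one. The sketch below uses hypothetical sample values (an assumption, picked to match the boolean output above); `naive` and `sticky` are illustrative names.

```python
import pandas as pd

# Hypothetical sample data (an assumption, not the asker's original frame)
df = pd.DataFrame(
    [[1.3, 2.8, 3.1, 2.9, 3.3, 2.7, 3.5],
     [1.1, 2.3, 2.6, 2.9, 3.2, 2.4, 2.8],
     [0.3, 1.0, 2.0, 2.7, 3.0, 2.2, 2.9]],
    index=["John", "Terry", "Henry"],
    columns=range(1, 8),
)

over = df.gt(2.5)

# Plain mask: only cells that are themselves above 2.5 are replaced
naive = df.mask(over, 3)

# Cumulative mask: once a row has exceeded 2.5, every later cell is replaced
sticky = df.mask(over.cumsum(axis=1).gt(0), 3)

print(naive.loc["Terry"].tolist())
print(sticky.loc["Terry"].tolist())
```

In Terry's row, column 6 falls back below 2.5, so the plain mask leaves it untouched while the cumulative mask still replaces it.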
Finally, replace the True positions with 3 using mask:
    print(df.mask(df.gt(2.5).cumsum(1).gt(0), 3))
             1    2    3    4    5    6    7
    John   1.3  3.0  3.0  3.0  3.0  3.0  3.0
    Terry  1.1  2.3  3.0  3.0  3.0  3.0  3.0
    Henry  0.3  1.0  2.0  3.0  3.0  3.0  3.0
To improve performance, it is possible to use numpy:
    import numpy as np
    import pandas as pd

    a = df.values
    df1 = pd.DataFrame(np.where(np.cumsum(a > 2.5, axis=1) > 0, 3, a),
                       index=df.index, columns=df.columns)
    print(df1)
             1    2    3    4    5    6    7
    John   1.3  3.0  3.0  3.0  3.0  3.0  3.0
    Terry  1.1  2.3  3.0  3.0  3.0  3.0  3.0
    Henry  0.3  1.0  2.0  3.0  3.0  3.0  3.0
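A quick way to check that the numpy variant gives the same result as the mask variant is to run both on the same frame and compare. This is a sketch under assumptions: the sample values are hypothetical, and `cap_after_threshold` is an illustrative helper name, not part of pandas or numpy.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (an assumption, not the asker's original frame)
df = pd.DataFrame(
    [[1.3, 2.8, 3.1, 2.9, 3.3, 2.7, 3.5],
     [1.1, 2.3, 2.6, 2.9, 3.2, 2.4, 2.8],
     [0.3, 1.0, 2.0, 2.7, 3.0, 2.2, 2.9]],
    index=["John", "Terry", "Henry"],
    columns=range(1, 8),
)

def cap_after_threshold(frame, threshold=2.5, ceiling=3):
    """Hypothetical helper: set each row to `ceiling` from its first value above `threshold`."""
    a = frame.to_numpy()
    hit = np.cumsum(a > threshold, axis=1) > 0  # True from the first exceedance onward
    return pd.DataFrame(np.where(hit, ceiling, a),
                        index=frame.index, columns=frame.columns)

via_numpy = cap_after_threshold(df)
via_pandas = df.mask(df.gt(2.5).cumsum(axis=1).gt(0), 3)

# Raises AssertionError if the two results differ
pd.testing.assert_frame_equal(via_numpy, via_pandas)
print(via_numpy)
```

Wrapping the logic in a function keeps the threshold and ceiling as parameters, which makes it easier to reuse with other cut-offs.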