Replace values in dataframe column depending on another column with condition
This takes several steps, but it works. Find the index of the rows from each 255 up to the next 1 and save it in idx. Then create x_new1 by copying x at those rows and wherever y == 1 or y == 255. Forward-fill the rest.
    import numpy as np

    # Index of rows between 255 and 1 in column y
    idx = df.loc[df['y'].replace(0, np.nan).ffill() == 255, 'y'].index

    # Create x_new1 and assign value of x where index is idx or y == 1 or y == 255
    df.loc[idx, 'x_new1'] = df['x']
    df.loc[(df['y'] == 1) | (df['y'] == 255), 'x_new1'] = df['x']

    # ffill rest of the values in x_new1
    df['x_new1'] = df['x_new1'].ffill()

            x    y  z  x_new  x_new1
    0   12.28    1  1  12.28   12.28
    1   11.99    0  1  12.28   12.28
    2   11.50    0  1  12.28   12.28
    3   11.20    0  1  12.28   12.28
    4   11.01    0  1  12.28   12.28
    5    9.74  255  0   9.74    9.74
    6   13.80    0  0  13.80   13.80
    7   15.20    0  0  15.20   15.20
    8   17.80    0  0  17.80   17.80
    9   12.10    1  1  12.10   12.10
    10  11.90    0  1  12.10   12.10
    11  11.70    0  1  12.10   12.10
    12  11.20    0  1  12.10   12.10
    13  10.30  255  0  10.30   10.30
Try:
    import numpy as np

    # mark the occurrences of 1 and 255
    df['is_1_255'] = df.y[(df.y == 1) | (df.y == 255)]
    df['x_n'] = np.nan

    # copy the 1's
    df.loc[df.is_1_255 == 1, 'x_n'] = df.loc[df.is_1_255 == 1, 'x']

    # fill is_1_255 with markers:
    # 255 means between 255 and 1, 1 means between 1 and 255
    df['is_1_255'] = df['is_1_255'].ffill()

    # update the 255 values
    df.loc[df.is_1_255 == 255, 'x_n'] = df.loc[df.is_1_255 == 255, 'x']

    # update the 1 values (assign back instead of a chained inplace call)
    df['x_n'] = df['x_n'].ffill()
Output:
    +-----+-------+-----+---+-------+----------+-------+
    | idx | x     | y   | z | x_new | is_1_255 | x_n   |
    +-----+-------+-----+---+-------+----------+-------+
    | 0   | 12.28 | 1   | 1 | 12.28 | 1.0      | 12.28 |
    | 1   | 11.99 | 0   | 1 | 12.28 | 1.0      | 12.28 |
    | 2   | 11.50 | 0   | 1 | 12.28 | 1.0      | 12.28 |
    | 3   | 11.20 | 0   | 1 | 12.28 | 1.0      | 12.28 |
    | 4   | 11.01 | 0   | 1 | 12.28 | 1.0      | 12.28 |
    | 5   | 9.74  | 255 | 0 | 9.74  | 255.0    | 9.74  |
    | 6   | 13.80 | 0   | 0 | 13.80 | 255.0    | 13.80 |
    | 7   | 15.20 | 0   | 0 | 15.20 | 255.0    | 15.20 |
    | 8   | 17.80 | 0   | 0 | 17.80 | 255.0    | 17.80 |
    | 9   | 12.10 | 1   | 1 | 12.10 | 1.0      | 12.10 |
    | 10  | 11.90 | 0   | 1 | 12.10 | 1.0      | 12.10 |
    | 11  | 11.70 | 0   | 1 | 12.10 | 1.0      | 12.10 |
    | 12  | 11.20 | 0   | 1 | 12.10 | 1.0      | 12.10 |
    | 13  | 10.30 | 255 | 0 | 10.30 | 255.0    | 10.30 |
    +-----+-------+-----+---+-------+----------+-------+
Assuming clean data where 1 and 255 always occur in pairs, we can form groups that run from each 1 up to (but not including) the following 255 and use groupby to fill in the data.
    s = (df.y.eq(1).cumsum() == df.y.eq(255).cumsum() + 1)
    df['xnew'] = (df.groupby(s.ne(s.shift()).cumsum().where(s))
                    .x.transform('first')
                    .fillna(df.x))

            x    y  z   xnew
    0   12.28    1  1  12.28
    1   11.99    0  1  12.28
    2   11.50    0  1  12.28
    3   11.20    0  1  12.28
    4   11.01    0  1  12.28
    5    9.74  255  0   9.74
    6   13.80    0  0  13.80
    7   15.20    0  0  15.20
    8   17.80    0  0  17.80
    9   12.10    1  1  12.10
    10  11.90    0  1  12.10
    11  11.70    0  1  12.10
    12  11.20    0  1  12.10
    13  10.30  255  0  10.30
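If the one-liner is hard to read, it may help to look at the intermediate pieces on a tiny frame. This is only a sketch: the six-row data and the names `ones`, `closes` and `block` are made up for illustration and are not part of the answer above.

```python
import pandas as pd

# Hypothetical toy data: one complete 1..255 pair, then a new 1.
df = pd.DataFrame({
    "x": [5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "y": [1, 0, 255, 0, 1, 0],
})

ones = df["y"].eq(1).cumsum()      # how many 1s have been seen so far
closes = df["y"].eq(255).cumsum()  # how many 255s have been seen so far

# True from a 1 up to (not including) the 255 that closes it.
s = ones == closes + 1

# A new id starts every time s flips; rows outside a block become NaN,
# so groupby ignores them and fillna(df.x) restores their own x later.
block = s.ne(s.shift()).cumsum().where(s)

print(pd.DataFrame({"y": df["y"], "s": s, "block": block}))
```

Rows 0-1 and rows 4-5 end up in two separate blocks, while rows 2-3 (the 255 and the 0 after it) get NaN and therefore keep their own x.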
Though for something like this, you should really write thorough unit tests, because this logic is tricky and can go wrong on malformed input (for example, unpaired 1s or 255s).
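A minimal sketch of what such tests could look like, using the sample data from the output above. The wrapper `fill_from_ones` and the test names are hypothetical, and the plain `assert` style assumes a runner such as pytest (or just running the file directly).

```python
import pandas as pd


def fill_from_ones(df):
    """Hypothetical wrapper around the groupby approach shown above."""
    s = df["y"].eq(1).cumsum() == df["y"].eq(255).cumsum() + 1
    block = s.ne(s.shift()).cumsum().where(s)
    return df.groupby(block)["x"].transform("first").fillna(df["x"])


def test_sample_data():
    # x and y columns copied from the sample output in this thread.
    df = pd.DataFrame({
        "x": [12.28, 11.99, 11.50, 11.20, 11.01, 9.74, 13.80,
              15.20, 17.80, 12.10, 11.90, 11.70, 11.20, 10.30],
        "y": [1, 0, 0, 0, 0, 255, 0, 0, 0, 1, 0, 0, 0, 255],
    })
    expected = [12.28] * 5 + [9.74, 13.80, 15.20, 17.80] + [12.10] * 4 + [10.30]
    assert fill_from_ones(df).tolist() == expected


def test_open_block_at_end():
    # Made-up edge case: the last 1 is never closed by a 255.
    df = pd.DataFrame({
        "x": [5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
        "y": [1, 0, 255, 0, 1, 0],
    })
    assert fill_from_ones(df).tolist() == [5.0, 5.0, 7.0, 8.0, 9.0, 9.0]


if __name__ == "__main__":
    test_sample_data()
    test_open_block_at_end()
    print("all tests passed")
```

Cases worth adding for real data would be the ones that break the pairing assumption, such as a 255 before the first 1 or two 1s in a row, with the expected result decided explicitly for each.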