Replace values in dataframe column depending on another column with condition (pandas)



This takes several steps, but it works. Find the index of the rows from each 255 in y up to (but not including) the next 1, and save it in idx. Then create x_new1 by copying x at idx and wherever y == 1 or y == 255. Forward-fill the rest.

    import numpy as np

    # Index of rows between 255 and 1 in column y
    idx = df.loc[df['y'].replace(0, np.nan).ffill() == 255, 'y'].index
    # Create x_new1 and assign value of x where index is idx or y == 1 or y == 255
    df.loc[idx, 'x_new1'] = df['x']
    df.loc[(df['y'] == 1) | (df['y'] == 255), 'x_new1'] = df['x']
    # ffill rest of the values in x_new1
    df['x_new1'] = df['x_new1'].ffill()

            x    y  z  x_new  x_new1
    0   12.28    1  1  12.28   12.28
    1   11.99    0  1  12.28   12.28
    2   11.50    0  1  12.28   12.28
    3   11.20    0  1  12.28   12.28
    4   11.01    0  1  12.28   12.28
    5    9.74  255  0   9.74    9.74
    6   13.80    0  0  13.80   13.80
    7   15.20    0  0  15.20   15.20
    8   17.80    0  0  17.80   17.80
    9   12.10    1  1  12.10   12.10
    10  11.90    0  1  12.10   12.10
    11  11.70    0  1  12.10   12.10
    12  11.20    0  1  12.10   12.10
    13  10.30  255  0  10.30   10.30
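To see why the replace/ffill trick picks out the right rows, it can help to look at the intermediate marker column. The following is a minimal sketch, assuming the sample data from the table above (only x and y are rebuilt here); the names last_marker and keep_own_x are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table above (x and y columns only).
df = pd.DataFrame({
    "x": [12.28, 11.99, 11.50, 11.20, 11.01, 9.74, 13.80, 15.20, 17.80,
          12.10, 11.90, 11.70, 11.20, 10.30],
    "y": [1, 0, 0, 0, 0, 255, 0, 0, 0, 1, 0, 0, 0, 255],
})

# Turn the 0 fillers into NaN, then carry the last marker (1 or 255) forward,
# so every row is labelled with the most recent marker seen in y.
last_marker = df["y"].replace(0, np.nan).ffill()

# Rows whose most recent marker is 255 keep their own x,
# and so do the rows where y itself is 1 (each starts a new run).
keep_own_x = last_marker.eq(255) | df["y"].eq(1)

# All other rows become NaN and are forward-filled from the preceding 1 row.
df["x_new1"] = df["x"].where(keep_own_x).ffill()

print(df.assign(last_marker=last_marker))
```

This collapses the two df.loc assignments into a single where call; the result should match x_new1 in the table above.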


Try:

    import numpy as np

    # mark the occurrences of 1 and 255
    df['is_1_255'] = df.y[(df.y == 1) | (df.y == 255)]
    # start from NaN (not None) so the column stays numeric
    df['x_n'] = np.nan
    # copy the 1's
    df.loc[df.is_1_255 == 1, 'x_n'] = df.loc[df.is_1_255 == 1, 'x']
    # fill is_1_255 with markers:
    # 255 means between 255 and 1, 1 means between 1 and 255
    df['is_1_255'] = df['is_1_255'].ffill()
    # update the 255 values
    df.loc[df.is_1_255 == 255, 'x_n'] = df.loc[df.is_1_255 == 255, 'x']
    # update the 1 values (assign back rather than using inplace on a column)
    df['x_n'] = df['x_n'].ffill()

Output:

    +-----+-------+-----+---+-------+----------+-------+
    | idx |   x   |  y  | z | x_new | is_1_255 |  x_n  |
    +-----+-------+-----+---+-------+----------+-------+
    |   0 | 12.28 |   1 | 1 | 12.28 | 1.0      | 12.28 |
    |   1 | 11.99 |   0 | 1 | 12.28 | 1.0      | 12.28 |
    |   2 | 11.50 |   0 | 1 | 12.28 | 1.0      | 12.28 |
    |   3 | 11.20 |   0 | 1 | 12.28 | 1.0      | 12.28 |
    |   4 | 11.01 |   0 | 1 | 12.28 | 1.0      | 12.28 |
    |   5 | 9.74  | 255 | 0 | 9.74  | 255.0    | 9.74  |
    |   6 | 13.80 |   0 | 0 | 13.80 | 255.0    | 13.80 |
    |   7 | 15.20 |   0 | 0 | 15.20 | 255.0    | 15.20 |
    |   8 | 17.80 |   0 | 0 | 17.80 | 255.0    | 17.80 |
    |   9 | 12.10 |   1 | 1 | 12.10 | 1.0      | 12.10 |
    |  10 | 11.90 |   0 | 1 | 12.10 | 1.0      | 12.10 |
    |  11 | 11.70 |   0 | 1 | 12.10 | 1.0      | 12.10 |
    |  12 | 11.20 |   0 | 1 | 12.10 | 1.0      | 12.10 |
    |  13 | 10.30 | 255 | 0 | 10.30 | 255.0    | 10.30 |
    +-----+-------+-----+---+-------+----------+-------+


Assuming clean data where 1 and 255 always occur in pairs, we can label each run from a 1 up to its matching 255 as a group and use groupby to fill in the data.

    s = (df.y.eq(1).cumsum() == df.y.eq(255).cumsum() + 1)
    df['xnew'] = df.groupby(s.ne(s.shift()).cumsum().where(s)).x.transform('first').fillna(df.x)

            x    y  z   xnew
    0   12.28    1  1  12.28
    1   11.99    0  1  12.28
    2   11.50    0  1  12.28
    3   11.20    0  1  12.28
    4   11.01    0  1  12.28
    5    9.74  255  0   9.74
    6   13.80    0  0  13.80
    7   15.20    0  0  15.20
    8   17.80    0  0  17.80
    9   12.10    1  1  12.10
    10  11.90    0  1  12.10
    11  11.70    0  1  12.10
    12  11.20    0  1  12.10
    13  10.30  255  0  10.30
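The one-liner is dense, so here is the same idea unrolled step by step. This is only a sketch under the same clean-data assumption, using the sample x and y values from the table above; the names ones, closes and block_id are introduced for illustration and do not appear in the original answer.

```python
import pandas as pd

# Sample data rebuilt from the table above (x and y columns only).
df = pd.DataFrame({
    "x": [12.28, 11.99, 11.50, 11.20, 11.01, 9.74, 13.80, 15.20, 17.80,
          12.10, 11.90, 11.70, 11.20, 10.30],
    "y": [1, 0, 0, 0, 0, 255, 0, 0, 0, 1, 0, 0, 0, 255],
})

# Running count of 1s and of 255s seen so far.
ones = df["y"].eq(1).cumsum()
closes = df["y"].eq(255).cumsum()

# True from each 1 up to (not including) its matching 255:
# inside such a run there is exactly one more 1 than 255s.
s = ones.eq(closes + 1)

# Number every consecutive block of equal values in s,
# then keep the block number only where s is True (NaN elsewhere).
block_id = s.ne(s.shift()).cumsum().where(s)

# Within each block, broadcast the first x; rows outside any block
# come back as NaN from transform and fall back to their own x.
df["xnew"] = df.groupby(block_id)["x"].transform("first").fillna(df["x"])

print(df.assign(s=s, block_id=block_id))
```

Rows with a NaN block_id are dropped from the grouping, which is why the trailing fillna(df.x) is needed to restore their own values.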

For something like this, though, you should really write thorough unit tests, because this logic is tricky and can break on malformed input.
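As a rough starting point for such tests, the sketch below wraps two of the approaches from this thread in hypothetical helper functions (fill_ffill and fill_groupby are names made up here) and checks them on a tiny frame. It also shows one malformed input, two 1s with no 255 in between, where the two approaches give different results, so the expected behaviour for that case has to be decided explicitly.

```python
import numpy as np
import pandas as pd


def fill_ffill(df):
    """Hypothetical wrapper around the replace/ffill approach."""
    last_marker = df["y"].replace(0, np.nan).ffill()
    keep_own_x = last_marker.eq(255) | df["y"].eq(1)
    return df["x"].where(keep_own_x).ffill()


def fill_groupby(df):
    """Hypothetical wrapper around the cumsum/groupby approach."""
    s = df["y"].eq(1).cumsum().eq(df["y"].eq(255).cumsum() + 1)
    block_id = s.ne(s.shift()).cumsum().where(s)
    return df.groupby(block_id)["x"].transform("first").fillna(df["x"])


def test_clean_pairs_agree():
    # Every 1 is closed by a 255: both approaches should agree.
    df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                       "y": [1, 0, 255, 0, 1, 255]})
    expected = pd.Series([1.0, 1.0, 3.0, 4.0, 5.0, 6.0], name="x")
    pd.testing.assert_series_equal(fill_ffill(df), expected)
    pd.testing.assert_series_equal(fill_groupby(df), expected)


def test_unpaired_ones_diverge():
    # Malformed input: a second 1 arrives before any 255.
    df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [1, 0, 1, 0]})
    # The ffill version restarts at the second 1; the groupby version
    # treats everything after it as outside a run and keeps the raw x.
    assert not fill_ffill(df).equals(fill_groupby(df))
```

The functions are written to be collected by a test runner such as pytest, but they can also be called directly.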