Efficiently replace values from a column to another column Pandas DataFrame

Using np.where is faster. Using a similar pattern as you used with replace:

df['col1'] = np.where(df['col1'] == 0, df['col2'], df['col1'])df['col1'] = np.where(df['col1'] == 0, df['col3'], df['col1'])

However, using a nested np.where is slightly faster:

df['col1'] = np.where(df['col1'] == 0,                       np.where(df['col2'] == 0, df['col3'], df['col2']),                      df['col1'])

Timings

Using the following setup to produce a larger sample DataFrame and timing functions:

df = pd.concat([df]*10**4, ignore_index=True)def root_nested(df):    df['col1'] = np.where(df['col1'] == 0, np.where(df['col2'] == 0, df['col3'], df['col2']), df['col1'])    return dfdef root_split(df):    df['col1'] = np.where(df['col1'] == 0, df['col2'], df['col1'])    df['col1'] = np.where(df['col1'] == 0, df['col3'], df['col1'])    return dfdef pir2(df):    df['col1'] = df.where(df.ne(0), np.nan).bfill(axis=1).col1.fillna(0)    return dfdef pir2_2(df):    slc = (df.values != 0).argmax(axis=1)    return df.values[np.arange(slc.shape[0]), slc]def andrew(df):    df.col1[df.col1 == 0] = df.col2    df.col1[df.col1 == 0] = df.col3    return dfdef pablo(df):    df['col1'] = df['col1'].replace(0,df['col2'])    df['col1'] = df['col1'].replace(0,df['col3'])    return df

I get the following timings:

%timeit root_nested(df.copy())100 loops, best of 3: 2.25 ms per loop%timeit root_split(df.copy())100 loops, best of 3: 2.62 ms per loop%timeit pir2(df.copy())100 loops, best of 3: 6.25 ms per loop%timeit pir2_2(df.copy())1 loop, best of 3: 2.4 ms per loop%timeit andrew(df.copy())100 loops, best of 3: 8.55 ms per loop

I tried timing your method, but it's been running for multiple minutes without completing. As a comparison, timing your method on just the 6 row example DataFrame (not the much larger one tested above) took 12.8 ms.

python pandas replace dataframe

I'm not sure if it's faster, but you're right that you can slice the dataframe to get your desired result.

df.col1[df.col1 == 0] = df.col2df.col1[df.col1 == 0] = df.col3print(df)

Output:

   col1  col2  col30   0.2   0.3   0.31   0.2   0.3   0.32   0.4   0.4   0.43   0.3   0.0   0.34   0.0   0.0   0.05   0.1   0.4   0.4

Alternatively if you want it to be more terse (though I don't know if it's faster) you can combine what you did with what I did.

df.col1[df.col1 == 0] = df.col2.replace(0, df.col3)print(df)

Output:

   col1  col2  col30   0.2   0.3   0.31   0.2   0.3   0.32   0.4   0.4   0.43   0.3   0.0   0.34   0.0   0.0   0.05   0.1   0.4   0.4

python pandas replace dataframe

approach using pd.DataFrame.where and pd.DataFrame.bfill

df['col1'] = df.where(df.ne(0), np.nan).bfill(axis=1).col1.fillna(0)df

Another approach using np.argmax

def pir2(df):    slc = (df.values != 0).argmax(axis=1)    return df.values[np.arange(slc.shape[0]), slc]

I know there is a better way to use numpy to slice. I just can't think of it at the moment.

CodeHunter

Efficiently replace values from a column to another column Pandas DataFrame

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last