Pandas alternative to apply - to create new column based on multiple columns

I see a reasonable performance improvement (159 ms down to 119 ms in the timings below, roughly 25%) by using .loc rather than chained indexing:

import random, pandas as pd, numpy as np

df = pd.DataFrame([[4,5,19],[1,2,0],[2,5,9],[8,2,5]], columns=['a','b','c'])
df = pd.concat([df]*1000000)

x = df.sample(n=2)

def get_new(row):
    a, b, c = row
    return random.choice(df[(df['a'] != a) & (df['b'] == b) & (df['c'] != c)]['c'].values)

def get_new2(row):
    a, b, c = row
    return random.choice(df.loc[(df['a'] != a) & (df['b'] == b) & (df['c'] != c), 'c'].values)

%timeit x.apply(lambda row: get_new(row), axis=1)   # 159ms
%timeit x.apply(lambda row: get_new2(row), axis=1)  # 119ms
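A further variation on the same idea, as a rough sketch only: extract the three columns as NumPy arrays once, then build the boolean mask on those arrays inside the function, so pandas does not have to construct and align Series on every call. The helper name `get_new_np`, the smaller frame size, and the seeded generator are my own assumptions for illustration, not part of the answer above, and I have not timed this variant.

```python
import numpy as np
import pandas as pd

# Smaller frame than in the answer above so the sketch runs quickly (assumption).
df = pd.DataFrame([[4, 5, 19], [1, 2, 0], [2, 5, 9], [8, 2, 5]],
                  columns=['a', 'b', 'c'])
df = pd.concat([df] * 1000, ignore_index=True)

# Pull the columns out once; each lookup then works on plain arrays.
a_vals = df['a'].to_numpy()
b_vals = df['b'].to_numpy()
c_vals = df['c'].to_numpy()

# Seeded only so the sketch is repeatable.
rng = np.random.default_rng(0)

def get_new_np(row):
    # Hypothetical helper: same filter as get_new2, applied to NumPy arrays.
    a, b, c = row
    mask = (a_vals != a) & (b_vals == b) & (c_vals != c)
    candidates = c_vals[mask]
    # Like random.choice in the original, this raises if nothing matches.
    return rng.choice(candidates)

x = df.sample(n=2, random_state=0)
result = x.apply(get_new_np, axis=1)
print(result)
```

With this sample data every row has exactly one distinct candidate value, so the random choice is effectively deterministic; on real data the selection among matching rows is random, as in the original functions.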
