How to apply a function to two columns of a Pandas dataframe
Here's an example using apply on the dataframe, which I am calling with axis = 1.
Note the difference: instead of trying to pass two values to the function f, rewrite the function to accept a pandas Series object, and then index the Series to get the values needed.
In [49]: df
Out[49]:
          0         1
0  1.000000  0.000000
1 -0.494375  0.570994
2  1.000000  0.000000
3  1.876360 -0.229738
4  1.000000  0.000000

In [50]: def f(x):
   ....:     return x[0] + x[1]
   ....:

In [51]: df.apply(f, axis=1) #passes a Series object, row-wise
Out[51]:
0    1.000000
1    0.076619
2    1.000000
3    1.646622
4    1.000000
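The lookups x[0] and x[1] above work because the columns of that frame are labelled with the integers 0 and 1. As a minimal sketch, assuming a hypothetical frame whose columns are named 'a' and 'b' (names made up for illustration), the same row-wise pattern with label-based indexing might look like this:

```python
import pandas as pd

# Hypothetical data; the column names 'a' and 'b' are assumptions for illustration.
df = pd.DataFrame({'a': [1.0, -0.5, 2.0], 'b': [0.0, 0.5, -1.0]})

def f(row):
    # With axis=1, `row` is a Series whose index holds the column labels.
    return row['a'] + row['b']

result = df.apply(f, axis=1)
print(result.tolist())  # [1.0, 0.0, 1.0]
```

Indexing by label keeps the function independent of column order.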
Depending on your use case, it is sometimes helpful to create a pandas group object, and then use apply on the group.
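A rough sketch of the group-based approach, assuming hypothetical columns 'key', 'a' and 'b' (none of which come from the original question): each group arrives in the function as a DataFrame, so both columns are available at once.

```python
import pandas as pd

# Hypothetical data; 'key', 'a' and 'b' are made-up column names.
df = pd.DataFrame({
    'key': ['x', 'x', 'y'],
    'a': [1, 2, 3],
    'b': [10, 20, 30],
})

def weighted_sum(group):
    # `group` is a DataFrame holding the rows of one group.
    return (group['a'] * group['b']).sum()

# Selecting the two columns first keeps the grouping column out of `group`.
per_key = df.groupby('key')[['a', 'b']].apply(weighted_sum)
print(per_key)
```

Here group 'x' yields 1*10 + 2*20 = 50 and group 'y' yields 3*30 = 90, one value per group rather than one per row.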
There is a clean, one-line way of doing this in Pandas:
df['col_3'] = df.apply(lambda x: f(x.col_1, x.col_2), axis=1)
This allows f to be a user-defined function with multiple input values, and uses (safe) column names rather than (unsafe) numeric indices to access the columns.
Example with data (based on original question):
import pandas as pd

df = pd.DataFrame({'ID':['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2':[1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def get_sublist(sta,end):
    return mylist[sta:end+1]

df['col_3'] = df.apply(lambda x: get_sublist(x.col_1, x.col_2), axis=1)
Output of print(df):
  ID  col_1  col_2      col_3
0  1      0      1     [a, b]
1  2      2      4  [c, d, e]
2  3      3      5  [d, e, f]
If your column names contain spaces or share a name with an existing dataframe attribute, you can index with square brackets:
df['col_3'] = df.apply(lambda x: f(x['col 1'], x['col 2']), axis=1)
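To illustrate why the bracket form matters, here is a small sketch with a hypothetical frame in which one column name contains a space and another ('size') shares its name with an existing Series attribute. The function f and the data are stand-ins, not taken from the original question.

```python
import pandas as pd

# Hypothetical frame: 'col 1' contains a space, and 'size' shadows Series.size.
df = pd.DataFrame({'col 1': [1, 2, 3], 'size': [10, 20, 30]})

def f(a, b):
    # Stand-in for any user-defined function of two values.
    return a + b

df['col_3'] = df.apply(lambda x: f(x['col 1'], x['size']), axis=1)
print(df['col_3'].tolist())  # [11, 22, 33]

# Attribute access returns the Series.size property (the element count),
# not the value stored in the 'size' column.
first_row = df.iloc[0]
print(first_row.size)     # 3
print(first_row['size'])  # 10
```

Dot access on 'col 1' is not possible at all because the name is not a valid Python identifier, while dot access on 'size' silently returns the wrong thing, so brackets are the safer default for such names.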