
Return multiple columns from pandas apply()


You can return a Series from the applied function that contains the new data, so you don't need to iterate three times. Passing axis=1 to apply calls the function sizes on each row of the DataFrame; each call returns a Series, and apply assembles those Series into a new DataFrame. The returned Series, s, contains the new values as well as the original data.

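    import locale

    import pandas as pd

    def sizes(s):
        # locale.format was removed in Python 3.12; locale.format_string replaces it
        s['size_kb'] = locale.format_string("%.1f", s['size'] / 1024.0, grouping=True) + ' KB'
        s['size_mb'] = locale.format_string("%.1f", s['size'] / 1024.0 ** 2, grouping=True) + ' MB'
        s['size_gb'] = locale.format_string("%.1f", s['size'] / 1024.0 ** 3, grouping=True) + ' GB'
        return s

    # rows_list is defined in the question; DataFrame.append was removed in pandas 2.0
    df_test = pd.concat([df_test, pd.DataFrame(rows_list)])
    df_test = df_test.apply(sizes, axis=1)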


Using apply together with zip is about three times faster than returning a Series.

    import locale

    def sizes(s):
        # Return a plain tuple instead of a Series
        return (
            locale.format_string("%.1f", s / 1024.0, grouping=True) + ' KB',
            locale.format_string("%.1f", s / 1024.0 ** 2, grouping=True) + ' MB',
            locale.format_string("%.1f", s / 1024.0 ** 3, grouping=True) + ' GB',
        )

    df_test['size_kb'], df_test['size_mb'], df_test['size_gb'] = zip(*df_test['size'].apply(sizes))
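To make the zip(*...) step easier to follow, here is a minimal sketch on a hypothetical two-row frame. It assumes plain f-string formatting in place of locale.format_string so the output does not depend on the active locale, and it produces only two of the three columns to stay short.

```python
import pandas as pd

# Hypothetical sample data, not the frame from the question.
df = pd.DataFrame({"size": [1024, 1048576]})

def sizes(s):
    # Return a plain tuple; f-strings stand in for locale.format_string here.
    return f"{s / 1024:.1f} KB", f"{s / 1024 ** 2:.1f} MB"

# Series.apply yields one tuple per row.
per_row = df["size"].apply(sizes)

# zip(*...) transposes the row tuples into one tuple per output column.
kb_col, mb_col = zip(*per_row)
df["size_kb"], df["size_mb"] = kb_col, mb_col

print(df)
```

Each value that zip produces is an ordinary tuple with one entry per row, which is why it can be assigned straight to a column.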

Test results:

    Separate df.apply():  100 loops, best of 3: 1.43 ms per loop
    Return Series:        100 loops, best of 3: 2.61 ms per loop
    Return tuple:         1000 loops, best of 3: 819 µs per loop
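If you want to repeat the comparison yourself, something along these lines should work. This is only a sketch: the 200-row frame, the helper fmt, and the function names separate, as_series and as_tuple are all made up for illustration, f-strings replace locale.format_string, and the numbers you get will depend on your machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical test data; the original benchmark data is not shown in the answer.
df = pd.DataFrame({"dir": ["/tmp/x"] * 200, "size": range(1, 201)})

def fmt(size, power, unit):
    # Stand-in for locale.format_string, to keep the sketch locale-independent.
    return f"{size / 1024.0 ** power:.1f} {unit}"

def separate():
    # Three independent passes over the 'size' column.
    out = df.copy()
    out["size_kb"] = out["size"].apply(lambda s: fmt(s, 1, "KB"))
    out["size_mb"] = out["size"].apply(lambda s: fmt(s, 2, "MB"))
    out["size_gb"] = out["size"].apply(lambda s: fmt(s, 3, "GB"))
    return out

def as_series():
    # One pass over rows; each call returns an enlarged copy of the row Series.
    def add_sizes(row):
        row = row.copy()
        row["size_kb"] = fmt(row["size"], 1, "KB")
        row["size_mb"] = fmt(row["size"], 2, "MB")
        row["size_gb"] = fmt(row["size"], 3, "GB")
        return row
    return df.apply(add_sizes, axis=1)

def as_tuple():
    # One pass over the column; tuples are transposed into columns with zip.
    out = df.copy()
    def sizes(s):
        return fmt(s, 1, "KB"), fmt(s, 2, "MB"), fmt(s, 3, "GB")
    out["size_kb"], out["size_mb"], out["size_gb"] = zip(*out["size"].apply(sizes))
    return out

# Average wall-clock time per call over a few repetitions.
timings = {
    fn.__name__: timeit.timeit(fn, number=3) / 3
    for fn in (separate, as_series, as_tuple)
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```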


Some of the current replies work fine, but I want to offer another, maybe more "pandified" option. This works for me with pandas 0.23; it will not work in earlier versions, because the result_type parameter was added in 0.23:

    import locale

    import pandas as pd

    df_test = pd.DataFrame([
        {'dir': '/Users/uname1', 'size': 994933},
        {'dir': '/Users/uname2', 'size': 109338711},
    ])

    def sizes(s):
        # s is one row of df_test
        a = locale.format_string("%.1f", s['size'] / 1024.0, grouping=True) + ' KB'
        b = locale.format_string("%.1f", s['size'] / 1024.0 ** 2, grouping=True) + ' MB'
        c = locale.format_string("%.1f", s['size'] / 1024.0 ** 3, grouping=True) + ' GB'
        return a, b, c

    df_test[['size_kb', 'size_mb', 'size_gb']] = df_test.apply(sizes, axis=1, result_type="expand")

Notice that the trick is the result_type parameter of apply: with result_type="expand", the tuple returned for each row is expanded into the columns of a DataFrame, which can be assigned directly to new or existing columns.
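To see what result_type changes, the following sketch compares the default behaviour with "expand" on a hypothetical two-row frame. The function returns raw numbers rather than formatted strings, purely to keep the example short.

```python
import pandas as pd

# Hypothetical sample data, not the frame from the answer.
df = pd.DataFrame({"size": [1024, 2048]})

def sizes(row):
    # Return a tuple of two values per row.
    return row["size"] / 1024, row["size"] / 1024 ** 2

# Default: the tuples are kept as-is, giving a Series of tuples.
default = df.apply(sizes, axis=1)

# "expand": each tuple element becomes its own column (labelled 0, 1).
expanded = df.apply(sizes, axis=1, result_type="expand")

# The expanded frame can be assigned to several columns at once.
df[["size_kb", "size_mb"]] = expanded

print(type(default).__name__, type(expanded).__name__)
print(df)
```

With the default setting you would still need something like zip(*...) to split the tuples into columns; "expand" does that split inside apply.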