Including the group name in the apply function pandas python


I think you should be able to use the name attribute:

temp_dataframe.groupby(level=0).apply(lambda x: foo(x.name, x))

This should work. For example:

In [132]:
df = pd.DataFrame({'a': list('aabccc'), 'b': np.arange(6)})
df
Out[132]:
   a  b
0  a  0
1  a  1
2  b  2
3  c  3
4  c  4
5  c  5

In [134]:
df.groupby('a').apply(lambda x: print('name:', x.name, '\nsubdf:', x))
name: a 
subdf:    a  b
0  a  0
1  a  1
name: b 
subdf:    a  b
2  b  2
name: c 
subdf:    a  b
3  c  3
4  c  4
5  c  5
Out[134]:
Empty DataFrame
Columns: []
Index: []
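Because temp_dataframe and foo come from the question and are not shown, here is a minimal self-contained sketch that fills them in with hypothetical data and a hypothetical foo. It assumes the group label lives in the index (hence groupby(level=0)) and that foo returns one summary row per group, using the name it receives.

```python
import pandas as pd

# Hypothetical data: the group label is stored in the index,
# matching the groupby(level=0) call above.
temp_dataframe = pd.DataFrame(
    {"value": [1, 2, 3, 4, 5, 6]},
    index=["a", "a", "b", "c", "c", "c"],
)


def foo(name, group):
    """Hypothetical helper: summarise one group and record its label."""
    return pd.Series({"label": f"group {name}", "total": group["value"].sum()})


# x.name is the group key ('a', 'b' or 'c') for each sub-DataFrame x.
result = temp_dataframe.groupby(level=0).apply(lambda x: foo(x.name, x))
print(result)
```

Each call returns a Series with the same index, so pandas stacks them into a DataFrame with one row per group key.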


For those who came looking for an answer to the question:

Including the group name in the transform function pandas python

and ended up in this thread, please read on.

Given the following input:

import numpy as np
import pandas as pd
df = pd.DataFrame(data={'col1': list('aabccc'),
                        'col2': np.arange(6),
                        'col3': np.arange(6)})

Data:

  col1  col2  col3
0    a     0     0
1    a     1     1
2    b     2     2
3    c     3     3
4    c     4     4
5    c     5     5

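We can access the group name (which is visible from the scope of the enclosing apply call) like this. Passing group_keys=False keeps the original row index; since pandas 2.0 a transform-like apply would otherwise prefix it with the group key:

df.groupby('col1', group_keys=False) \
  .apply(lambda frame: frame \
         .transform(lambda col: col + 3 if frame.name == 'a' and col.name == 'col2' else col))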

Output:

  col1  col2  col3
0    a     3     0
1    a     4     1
2    b     2     2
3    c     3     3
4    c     4     4
5    c     5     5

Note that the call to apply is needed to obtain a reference to each sub-DataFrame (i.e. frame, a pandas.core.frame.DataFrame), whose name attribute holds the key of the corresponding group. The name attribute of the argument passed to transform (i.e. col) is the column (Series) name.
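To see which name each level exposes, here is a small sketch built on the same input data. The helper label_cells is hypothetical, and it uses DataFrame.apply instead of transform so that each column collapses to a single string; the column argument's name attribute is still the column label. Selecting col2 and col3 keeps the grouping column out of each sub-frame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'col1': list('aabccc'),
                        'col2': np.arange(6),
                        'col3': np.arange(6)})


def label_cells(frame):
    # frame.name is the group key ('a', 'b' or 'c');
    # inside DataFrame.apply, col.name is the column label.
    return frame.apply(lambda col: f"{frame.name}/{col.name}")


# Hypothetical demo: one row per group, one string per selected column.
result = df.groupby('col1')[['col2', 'col3']].apply(label_cells)
print(result)
```

Every cell combines the outer name (group key) with the inner name (column label), for example 'a/col2'.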

Alternatively, one could also loop over the groups and then, within each group, over the columns:

for grp_name, sub_df in df.groupby('col1'):
    for col in sub_df:
        if grp_name == 'a' and col == 'col2':
            df.loc[df.col1 == grp_name, col] = sub_df[col] + 3

My use case is quite rare and this was the only way to achieve my goal (as of pandas v0.24.2). However, I'd recommend exploring the pandas documentation thoroughly because there most likely is an easier vectorised solution to what you may need this construct for.
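For the toy example used here (add 3 to col2 only in group a), one vectorised alternative is a boolean mask combined with .loc. This is a sketch that covers only this specific case, not a general replacement for per-group logic that needs the group name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'col1': list('aabccc'),
                        'col2': np.arange(6),
                        'col3': np.arange(6)})

# Boolean mask selecting the rows of group 'a'; .loc updates only col2.
mask = df['col1'] == 'a'
df.loc[mask, 'col2'] += 3
print(df)
```

This produces the same values as the apply/transform and loop versions above, without iterating over groups or columns.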