Pandas aggregate with dynamic column names Pandas aggregate with dynamic column names python python

Pandas aggregate with dynamic column names


More easy like

df.groupby('group').agg(lambda x : x.head(1) if x.dtype=='object' else x.mean())Out[63]:       group_color      val1      val2group                                A           green  3.333333  4.666667B            blue  4.500000  6.000000


If your group_color is always the same within one group, you can do:

df.pivot_table(index=['group','group_color'],aggfunc='mean')

Output:

                       val1      val2group group_color                    A     green        3.333333  4.666667B     blue         4.500000  6.000000

In the other case, you can build the dictionary and pass it to agg:

agg_dict = {f: 'first' if f=='group_color' else 'mean' for f in df.columns[1:]}df.groupby('group').agg(agg_dict)

Which output:

      group_color      val1      val2group                                A           green  3.333333  4.666667B            blue  4.500000  6.000000


Unfortunately you will have to apply both aggregation functions separately (that or repeat "valn": "mean" as many times as valx columns). Groupby.agg can take a dictionary but the keys must be individual columns.

The way I'd do this is using DataFrame.filter to select the subset of the dataframe with the columns following the format of valx, aggregate with the mean, and then assign new columns with the aggregated results on the other columns:

(df.filter(regex=r'^val').groupby(df.group).mean()   .assign(color = df.group_color.groupby(df.group).first()))         val1      val2    colorgroup                           A      3.333333  4.666667  greenB      4.500000  6.000000   blue