
Pandas sum by groupby, but exclude certain columns


You can select the columns of a groupby:

In [11]: df.groupby(['Country', 'Item_Code'])[["Y1961", "Y1962", "Y1963"]].sum()
Out[11]:
                       Y1961  Y1962  Y1963
Country     Item_Code
Afghanistan 15            10     20     30
            25            10     20     30
Angola      15            30     40     50
            25            30     40     50

Note that the list passed must be a subset of the DataFrame's columns; otherwise you'll see a KeyError.
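As a minimal, self-contained sketch of the selection above: the sample frame here is made up to resemble the output shown (the `Code` column and the missing `Y1999` name are assumptions for illustration only). It sums just the chosen year columns and shows that asking for a column that does not exist raises a KeyError.

```python
import pandas as pd

# Hypothetical sample data shaped like the frame in the question.
df = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan", "Angola", "Angola"],
    "Item_Code": [15, 25, 15, 25],
    "Code": [2, 2, 4, 4],  # a column we do not want summed
    "Y1961": [10, 10, 30, 30],
    "Y1962": [20, 20, 40, 40],
    "Y1963": [30, 30, 50, 50],
})

year_cols = ["Y1961", "Y1962", "Y1963"]

# Only the selected columns are aggregated; "Code" is left out.
result = df.groupby(["Country", "Item_Code"])[year_cols].sum()

# Selecting a column that is not in the frame fails with KeyError.
try:
    df.groupby(["Country", "Item_Code"])[["Y1961", "Y1999"]].sum()
    missing_raises = False
except KeyError:
    missing_raises = True
```

The group keys end up in the index of `result`; call `result.reset_index()` if you would rather have them back as ordinary columns.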


The agg function will do this for you. Pass a dict that maps each column to the function (or list of functions) to apply to it:

df.groupby(['Country', 'Item_Code']).agg({'Y1961': 'sum', 'Y1962': ['sum', 'mean']})  # two output columns from a single input column

The result contains only the group-by columns (as the index) and the specified aggregate columns. In this example I applied two aggregation functions to 'Y1962'.
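One thing worth knowing about the dict form: when any column gets a list of functions, the result has two-level column labels. The sketch below, on made-up sample data, shows that and, as an alternative that is not part of the original answer, pandas' named aggregation, which gives flat column names. The output names `Y1961_sum`, `Y1962_sum` and `Y1962_mean` are arbitrary choices.

```python
import pandas as pd

# Hypothetical sample data; two rows fall into the (Afghanistan, 15) group.
df = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan", "Angola", "Angola"],
    "Item_Code": [15, 15, 15, 25],
    "Y1961": [10, 5, 30, 30],
    "Y1962": [20, 10, 40, 40],
})

grouped = df.groupby(["Country", "Item_Code"])

# Dict form: columns come back as a MultiIndex of (column, function) pairs.
nested = grouped.agg({"Y1961": "sum", "Y1962": ["sum", "mean"]})

# Named aggregation: output_name=(input_column, function) gives flat names.
flat = grouped.agg(
    Y1961_sum=("Y1961", "sum"),
    Y1962_sum=("Y1962", "sum"),
    Y1962_mean=("Y1962", "mean"),
)
```

With the nested result a single column is addressed as `nested[("Y1962", "mean")]`; with the flat one it is simply `flat["Y1962_mean"]`.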

To get exactly what you hoped to see, include the other columns in the group by, and apply sums to the Y variables in the frame:

df.groupby(['Code', 'Country', 'Item_Code', 'Item', 'Ele_Code', 'Unit']).agg({'Y1961': 'sum', 'Y1962': 'sum', 'Y1963': 'sum'})
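A small sketch of the same idea on made-up data (the values for `Item`, `Ele_Code` and `Unit` are invented for illustration). It passes `as_index=False` so the descriptive columns stay as ordinary columns next to the sums, which is usually closer to the original table layout.

```python
import pandas as pd

# Hypothetical sample data; the first two rows describe the same item.
df = pd.DataFrame({
    "Code": [2, 2, 4],
    "Country": ["Afghanistan", "Afghanistan", "Angola"],
    "Item_Code": [15, 15, 25],
    "Item": ["Wheat", "Wheat", "Maize"],
    "Ele_Code": [5312, 5312, 5312],
    "Unit": ["Ha", "Ha", "Ha"],
    "Y1961": [10, 5, 30],
    "Y1962": [20, 10, 40],
    "Y1963": [30, 15, 50],
})

id_cols = ["Code", "Country", "Item_Code", "Item", "Ele_Code", "Unit"]
year_cols = ["Y1961", "Y1962", "Y1963"]

# Group on every descriptive column and sum only the year columns.
totals = df.groupby(id_cols, as_index=False)[year_cols].sum()
```

Be aware that, by default, rows with a missing value in any grouping column are dropped from the result; pass `dropna=False` to `groupby` if those rows should be kept.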


If you are looking for a more general way to apply this to many columns, you can build a list of column names and use it to select columns from the grouped DataFrame. In your case, for example:

columns = ['Y' + str(year) for year in range(1967, 2011)]
df.groupby('Country')[columns].agg('sum')
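If the year range is not known in advance, one option is to derive the list from the frame itself by excluding the columns you do not want summed. This is only a sketch on made-up data; the `exclude` list is an assumption about which columns should be left out.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan", "Angola"],
    "Code": [2, 2, 4],
    "Item_Code": [15, 25, 15],
    "Y1961": [10, 10, 30],
    "Y1962": [20, 20, 40],
})

key = "Country"
exclude = ["Code", "Item_Code"]  # columns to leave out of the sum

# Every column except the group key and the excluded ones.
value_cols = [c for c in df.columns if c != key and c not in exclude]

by_country = df.groupby(key)[value_cols].sum()
```

Because `value_cols` is built from `df.columns`, it can never name a column that is missing from the frame, so the KeyError case does not arise.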