Converting a Pandas GroupBy output from Series to DataFrame


g1 here is a DataFrame. It has a hierarchical index, though:

    In [19]: type(g1)
    Out[19]: pandas.core.frame.DataFrame

    In [20]: g1.index
    Out[20]:
    MultiIndex([('Alice', 'Seattle'), ('Bob', 'Seattle'), ('Mallory', 'Portland'),
           ('Mallory', 'Seattle')], dtype=object)

Perhaps you want something like this?

    In [21]: g1.add_suffix('_Count').reset_index()
    Out[21]:
          Name      City  City_Count  Name_Count
    0    Alice   Seattle           1           1
    1      Bob   Seattle           2           2
    2  Mallory  Portland           2           2
    3  Mallory   Seattle           1           1

Or something like:

    In [36]: DataFrame({'count' : df1.groupby( [ "Name", "City"] ).size()}).reset_index()
    Out[36]:
          Name      City  count
    0    Alice   Seattle      1
    1      Bob   Seattle      2
    2  Mallory  Portland      2
    3  Mallory   Seattle      1
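As a possible alternative to wrapping the Series in a dict by hand, Series.to_frame can do the Series-to-DataFrame conversion directly. This is only a sketch of that swapped-in approach, assuming a reasonably recent pandas; the column name "count" is simply the label chosen here.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# size() returns a Series indexed by (Name, City).
sizes = df1.groupby(["Name", "City"]).size()

# to_frame() turns that Series into a one-column DataFrame;
# reset_index() then moves the group keys back into ordinary columns.
result = sizes.to_frame("count").reset_index()
print(result)
```

The result should have the same shape as the output above: one row per (Name, City) pair with a count column.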


I want to slightly modify the answer given by Wes, because pandas version 0.16.2 requires as_index=False. If you don't set it, you get an empty DataFrame.

Source:

Aggregation functions will not return the groups that you are aggregating over if they are named columns, when as_index=True, the default. The grouped columns will be the indices of the returned object.

Passing as_index=False will return the groups that you are aggregating over, if they are named columns.

Aggregating functions are ones that reduce the dimension of the returned objects, for example: mean, sum, size, count, std, var, sem, describe, first, last, nth, min, max. This is what happens when you do for example DataFrame.sum() and get back a Series.

nth can act as a reducer or a filter, see here.
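To make the quoted as_index behaviour concrete, here is a small sketch on a recent pandas. The "Sales" column and its values are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical data: "Sales" is an invented value column to aggregate.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "Sales": [10, 20, 30, 40, 50, 60],
})

# Default (as_index=True): the group keys become the MultiIndex of the result.
indexed = df.groupby(["Name", "City"]).sum()

# as_index=False: the group keys are returned as ordinary columns.
flat = df.groupby(["Name", "City"], as_index=False).sum()

print(list(indexed.index.names))  # group keys live in the index
print(list(flat.columns))         # group keys are regular columns
```

With as_index=False there is no need for a separate reset_index() call, which is the point of the quoted passage.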

    import pandas as pd

    df1 = pd.DataFrame({"Name":["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"],
                        "City":["Seattle","Seattle","Portland","Seattle","Seattle","Portland"]})
    print(df1)
    #
    #       City     Name
    #0   Seattle    Alice
    #1   Seattle      Bob
    #2  Portland  Mallory
    #3   Seattle  Mallory
    #4   Seattle      Bob
    #5  Portland  Mallory
    #
    g1 = df1.groupby(["Name", "City"], as_index=False).count()
    print(g1)
    #
    #                  City  Name
    #Name    City
    #Alice   Seattle      1     1
    #Bob     Seattle      2     2
    #Mallory Portland     2     2
    #        Seattle      1     1
    #

EDIT:

In version 0.17.1 and later you can select a subset of columns before calling count, or call size and then reset_index with the name parameter:

    print(df1.groupby(["Name", "City"], as_index=False ).count())
    #IndexError: list index out of range

    print(df1.groupby(["Name", "City"]).count())
    #Empty DataFrame
    #Columns: []
    #Index: [(Alice, Seattle), (Bob, Seattle), (Mallory, Portland), (Mallory, Seattle)]

    print(df1.groupby(["Name", "City"])[['Name','City']].count())
    #                  Name  City
    #Name    City
    #Alice   Seattle      1     1
    #Bob     Seattle      2     2
    #Mallory Portland     2     2
    #        Seattle      1     1

    print(df1.groupby(["Name", "City"]).size().reset_index(name='count'))
    #      Name      City  count
    #0    Alice   Seattle      1
    #1      Bob   Seattle      2
    #2  Mallory  Portland      2
    #3  Mallory   Seattle      1

The difference between count and size is that size counts NaN values while count does not.
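A quick way to see that difference is a group containing a missing value. The sketch below uses a made-up "Score" column (not from the original question) with one NaN in it.

```python
import pandas as pd

# Hypothetical data: Alice has two rows, one of them with a missing Score.
df = pd.DataFrame({
    "Name": ["Alice", "Alice", "Bob"],
    "Score": [1.0, float("nan"), 3.0],
})

grouped = df.groupby("Name")
sizes = grouped.size()             # rows per group, NaN rows included
counts = grouped["Score"].count()  # non-NaN Score values per group

# Put both results side by side in a regular DataFrame.
summary = pd.DataFrame({"size": sizes, "count": counts}).reset_index()
print(summary)
```

For Alice, size reports 2 rows while count reports 1, because the NaN score is skipped by count.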


The key is to use the reset_index() method.

Use:

    import pandas

    df1 = pandas.DataFrame( {
        "Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] ,
        "City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"] } )

    g1 = df1.groupby( [ "Name", "City"] ).count().reset_index()

Now you have your new dataframe in g1:

[Image: result DataFrame]