How to loop over grouped Pandas dataframe?


df.groupby('l_customer_id_i').agg(lambda x: ','.join(x)) already returns a DataFrame, so you can no longer loop over the groups.

In general:

  • df.groupby(...) returns a GroupBy object (a DataFrameGroupBy or SeriesGroupBy), and with this, you can iterate through the groups (as explained in the docs here). You can do something like:

    grouped = df.groupby('A')
    for name, group in grouped:
        ...
  • When you apply a function to the groupby object, as in your example df.groupby(...).agg(...) (but this could equally be transform, apply, mean, ...), the results of applying the function to each group are combined into a single object (the apply and combine steps of the 'split-apply-combine' paradigm of groupby). The result is therefore again a DataFrame (or a Series, depending on the applied function), not a collection of groups.
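
To make the difference between the two bullets concrete, here is a minimal sketch. The sample data and the column names A and B are made up for illustration and are not from the question:

```python
import pandas as pd

# Hypothetical sample data, not from the original question
df = pd.DataFrame({
    'A': ['x', 'x', 'y'],
    'B': ['a', 'b', 'c'],
})

grouped = df.groupby('A')

# Before applying a function: a GroupBy object that yields (name, group) pairs
group_names = [name for name, group in grouped]
group_sizes = {name: len(group) for name, group in grouped}

# After applying a function: the per-group results are combined into one DataFrame
aggregated = grouped.agg(lambda x: ','.join(x))

print(type(grouped).__name__)
print(type(aggregated).__name__)
print(aggregated)
```

Here grouped can still be looped over group by group, while aggregated is an ordinary DataFrame indexed by the values of A.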


Here is an example of iterating over a pd.DataFrame grouped by the column atable. For this sample, "create" statements for an SQL database are generated within the for loop:

    import pandas as pd

    df1 = pd.DataFrame({
        'atable':      ['Users', 'Users', 'Domains', 'Domains', 'Locks'],
        'column':      ['col_1', 'col_2', 'col_a', 'col_b', 'col'],
        'column_type': ['varchar', 'varchar', 'int', 'varchar', 'varchar'],
        'is_null':     ['No', 'No', 'Yes', 'No', 'Yes'],
    })

    df1_grouped = df1.groupby('atable')

    # iterate over each group
    for group_name, df_group in df1_grouped:
        print('\nCREATE TABLE {}('.format(group_name))

        for row_index, row in df_group.iterrows():
            col = row['column']
            column_type = row['column_type']
            is_null = 'NOT NULL' if row['is_null'] == 'No' else ''
            print('\t{} {} {},'.format(col, column_type, is_null))

        print(");")
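
Note that the loop above prints a comma after every column, including the last one in each table. If you need the statements as strings rather than printed output, one possible variant is sketched below. The helper name build_create_statements and the shortened sample data are my own, and it uses itertuples instead of iterrows purely as a matter of taste:

```python
import pandas as pd

# Shortened, hypothetical version of the sample data above
df1 = pd.DataFrame({
    'atable':      ['Users', 'Users', 'Locks'],
    'column':      ['col_1', 'col_2', 'col'],
    'column_type': ['varchar', 'int', 'varchar'],
    'is_null':     ['No', 'Yes', 'Yes'],
})


def build_create_statements(df):
    """Return a dict mapping each table name to a CREATE TABLE string."""
    statements = {}
    for table_name, df_group in df.groupby('atable'):
        column_defs = []
        for row in df_group.itertuples(index=False):
            parts = [row.column, row.column_type]
            if row.is_null == 'No':
                parts.append('NOT NULL')
            column_defs.append('    ' + ' '.join(parts))
        # Joining the definitions avoids a trailing comma after the last column
        statements[table_name] = 'CREATE TABLE {}(\n{}\n);'.format(
            table_name, ',\n'.join(column_defs))
    return statements


statements = build_create_statements(df1)
for statement in statements.values():
    print(statement)
```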


If the aggregated dataframe has already been created, you can iterate over its index values, which are the group keys.

    df = df.groupby('l_customer_id_i').agg(lambda x: ','.join(x))
    for name in df.index:
        print(name)
        print(df.loc[name])
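
As a self-contained sketch of the same idea: only the column name l_customer_id_i comes from the question, while the product column and its values are invented for illustration. It uses iterrows, which yields each index label together with its row, so no separate .loc lookup is needed:

```python
import pandas as pd

# Hypothetical data; only the column name l_customer_id_i is from the question
df = pd.DataFrame({
    'l_customer_id_i': ['c1', 'c1', 'c2'],
    'product':         ['apple', 'pear', 'plum'],
})

aggregated = df.groupby('l_customer_id_i').agg(lambda x: ','.join(x))

# Each index label is a group key; each row holds that group's joined strings
results = {}
for customer_id, row in aggregated.iterrows():
    results[customer_id] = row['product']
    print(customer_id, row['product'])
```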