Pivoting a Pandas Dataframe containing strings - 'No numeric types to aggregate' error Pivoting a Pandas Dataframe containing strings - 'No numeric types to aggregate' error python python

Pivoting a Pandas Dataframe containing strings - 'No numeric types to aggregate' error


The default aggfunc in pivot_table is np.sum and it doesn't know what to do with strings and you haven't indicated what the index should be properly. Trying something like:

pivot_table = unified_df.pivot_table(index=['id', 'contact_id'],                                     columns='question',                                      values='response_answer',                                     aggfunc=lambda x: ' '.join(x))

This explicitly sets one row per id, contact_id pair and pivots the set of response_answer values on question. The aggfunc just assures that if you have multiple answers to the same question in the raw data that we just concatenate them together with spaces. The syntax of pivot_table might vary depending on your pandas version.

Here's a quick example:

In [24]: import pandas as pdIn [25]: import randomIn [26]: df = pd.DataFrame({'id':[100*random.randint(10, 50) for _ in range(100)], 'question': [str(random.randint(0,3)) for _ in range(100)], 'response': [str(random.randint(100,120)) for _ in range(100)]})In [27]: df.head()Out[27]:     id question response0  3100        1      1161  4500        2      1132  5000        1      1203  3900        2      1034  4300        0      117In [28]: df.info()<class 'pandas.core.frame.DataFrame'>Int64Index: 100 entries, 0 to 99Data columns (total 3 columns):id          100 non-null int64question    100 non-null objectresponse    100 non-null objectdtypes: int64(1), object(2)memory usage: 3.1+ KBIn [29]: df.pivot_table(index='id', columns='question', values='response', aggfunc=lambda x: ' '.join(x)).head()Out[29]:question        0        1    2        3id1000      110 120      NaN  100      NaN1100          NaN  106 108  104      NaN1200      104 113      119  NaN      1011300          102      NaN  116  108 1201400          NaN      NaN  116      NaN


There are several ways.

1

df1 = df.groupby(["id","contact_id","Network_Name","question"])['response_answer'].aggregate(lambda x: x).unstack().reset_index()df1.columns=df1.columns.tolist()print (df1)

2

df1 = df.set_index(["id","contact_id","Network_Name","question"])['response_answer'].unstack().reset_index()df1.columns=df1.columns.tolist()print (df1)

3

df1 = df.groupby(["id","contact_id","Network_Name","question"])['response_answer'].aggregate('first').unstack().reset_index()df1.columns=df1.columns.tolist()print (df1)

4

df1 = df.pivot_table(index=["id","contact_id","Network_Name"], columns='question', values=['response_answer'], aggfunc='first')df1.columns = df1.columns.droplevel()df1 = df1.reset_index()df1.columns=df1.columns.tolist()print (df1)

Same ans.

    id  contact_id  Network_Name       City State Trip_End_Location0   16      137519          2206       None    Ca              None1   17      137520          2206       None    Ca              None2   18      137521          2206       None    Ca              None3   19      137522          2206       None    Ca              None4   20      137523          2208  Lancaster  None              None5   21      137524          2208  Lancaster  None              None6   22      137525          2208  Lancaster  None              None7   23      137526          2208  Lancaster  None              None8   24      137527          2208       None  None              Home9   25      137528          2208       None  None              Home10  26      137529          2208       None  None              Home11  27      137530          2208       None  None              Home