Pivoting a Pandas Dataframe containing strings - 'No numeric types to aggregate' error
The default aggfunc
in pivot_table
is np.sum
and it doesn't know what to do with strings and you haven't indicated what the index should be properly. Trying something like:
pivot_table = unified_df.pivot_table(index=['id', 'contact_id'], columns='question', values='response_answer', aggfunc=lambda x: ' '.join(x))
This explicitly sets one row per id, contact_id
pair and pivots the set of response_answer
values on question
. The aggfunc
just assures that if you have multiple answers to the same question in the raw data that we just concatenate them together with spaces. The syntax of pivot_table
might vary depending on your pandas version.
Here's a quick example:
In [24]: import pandas as pdIn [25]: import randomIn [26]: df = pd.DataFrame({'id':[100*random.randint(10, 50) for _ in range(100)], 'question': [str(random.randint(0,3)) for _ in range(100)], 'response': [str(random.randint(100,120)) for _ in range(100)]})In [27]: df.head()Out[27]: id question response0 3100 1 1161 4500 2 1132 5000 1 1203 3900 2 1034 4300 0 117In [28]: df.info()<class 'pandas.core.frame.DataFrame'>Int64Index: 100 entries, 0 to 99Data columns (total 3 columns):id 100 non-null int64question 100 non-null objectresponse 100 non-null objectdtypes: int64(1), object(2)memory usage: 3.1+ KBIn [29]: df.pivot_table(index='id', columns='question', values='response', aggfunc=lambda x: ' '.join(x)).head()Out[29]:question 0 1 2 3id1000 110 120 NaN 100 NaN1100 NaN 106 108 104 NaN1200 104 113 119 NaN 1011300 102 NaN 116 108 1201400 NaN NaN 116 NaN
There are several ways.
1
df1 = df.groupby(["id","contact_id","Network_Name","question"])['response_answer'].aggregate(lambda x: x).unstack().reset_index()df1.columns=df1.columns.tolist()print (df1)
2
df1 = df.set_index(["id","contact_id","Network_Name","question"])['response_answer'].unstack().reset_index()df1.columns=df1.columns.tolist()print (df1)
3
df1 = df.groupby(["id","contact_id","Network_Name","question"])['response_answer'].aggregate('first').unstack().reset_index()df1.columns=df1.columns.tolist()print (df1)
4
df1 = df.pivot_table(index=["id","contact_id","Network_Name"], columns='question', values=['response_answer'], aggfunc='first')df1.columns = df1.columns.droplevel()df1 = df1.reset_index()df1.columns=df1.columns.tolist()print (df1)
Same ans.
id contact_id Network_Name City State Trip_End_Location0 16 137519 2206 None Ca None1 17 137520 2206 None Ca None2 18 137521 2206 None Ca None3 19 137522 2206 None Ca None4 20 137523 2208 Lancaster None None5 21 137524 2208 Lancaster None None6 22 137525 2208 Lancaster None None7 23 137526 2208 Lancaster None None8 24 137527 2208 None None Home9 25 137528 2208 None None Home10 26 137529 2208 None None Home11 27 137530 2208 None None Home