Simple cross-tabulation in pandas

v0.21 answer

Use pivot_table with the index parameter:

df.pivot_table(index='category', aggfunc=[len, sum])           len   sum         value valuecategory            AB           2   300AC           1   150AD           1   500

<= v0.12

It is possible to do this using pivot_table for those interested:

In [8]: dfOut[8]:   category  value0       AB    1001       AB    2002       AC    1503       AD    500In [9]: df.pivot_table(rows='category', aggfunc=[len, np.sum])Out[9]:             len    sum          value  valuecategory              AB            2    300AC            1    150AD            1    500

Note that the result's columns are hierarchically indexed. If you had multiple data columns, you would get a result like this:

In [12]: dfOut[12]:   category  value  value20       AB    100       51       AB    200       52       AC    150       53       AD    500       5In [13]: df.pivot_table(rows='category', aggfunc=[len, np.sum])Out[13]:             len            sum                  value  value2  value  value2category                              AB            2       2    300      10AC            1       1    150       5AD            1       1    500       5

The main reason to use __builtin__.sum vs. np.sum is that you get NA-handling from the latter. Probably could intercept the Python built-in, will make a note about that now.

python pandas dataframe pivot-table

Assuming that you have a file called 2010.csv with contents

category,valueAB,100.00AB,200.00AC,150.00AD,500.00

Then, using the ability to apply multiple aggregation functions following a groupby, you can say:

import pandasdata_2010 = pandas.read_csv("/path/to/2010.csv")data_2010.groupby("category").agg([len, sum])

You should get a result that looks something like

          value                 len  sumcategory            AB            2  300AC            1  150AD            1  500

Note that Wes will likely come by to point out that sum is optimized and that you should probably use np.sum.

CodeHunter

Simple cross-tabulation in pandas

Recent Posts

How can I color dots in a xy scatterplot according to column value?

How to update a claim in ASP.NET Identity?

What does {0} mean when initializing an object?

Accessing members of items in a JSONArray with Java

How to log SQL statements in Spring Boot?

Powershell Get-WebSite name parameter is ignored

How to detect scroll to bottom of html element

Java synchronized method

How to test controllers with CodeIgniter?

Detect Visual Composer

Matplotlib: Specify format of floats for tick labels

Rails join a list of strings with commas and "and" before the last