pandas' transform doesn't work sorting groupby output


transform is not well documented, but it appears that the function you give it is not passed the entire group as a DataFrame. Instead, it receives a single column of a single group. I don't think it's really meant for what you're trying to do, and your solution with apply is fine.

Suppose you call tips.groupby('smoker').transform(func). There will be two groups; call them group1 and group2. transform does not call func(group1) and func(group2). Instead, it calls func(group1['total_bill']), then func(group1['tip']), and so on, and then func(group2['total_bill']), func(group2['tip']), and so on. Here's an example:

    >>> print d
       A  B  C
    0 -2  5  4
    1  1 -1  2
    2  0  2  1
    3 -3  1  2
    4  5  0  2
    >>> def foo(df):
    ...     print ">>>"
    ...     print df
    ...     print "<<<"
    ...     return df
    >>> print d.groupby('C').transform(foo)
    >>>
    2    0
    Name: A
    <<<
    >>>
    2    2
    Name: B
    <<<
    >>>
    1    1
    3   -3
    4    5
    Name: A
    <<<
    >>>
    1   -1
    3    1
    4    0
    Name: B
    # etc.

You can see that foo is first called with just the A column of the C=1 group of the original data frame, then the B column of that group, then the A column of the C=2 group, etc.
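For readers on Python 3, the same experiment can be sketched roughly as follows. This is only an illustration: the frame d mirrors the one in the session above, while seen and record are names made up for this sketch. Newer pandas versions may also call the function with a whole group as a DataFrame while choosing an internal code path, so the log stores the type of each argument instead of assuming a fixed call sequence.

```python
import pandas as pd

# Same data as in the session above.
d = pd.DataFrame({
    "A": [-2, 1, 0, -3, 5],
    "B": [5, -1, 2, 1, 0],
    "C": [4, 2, 1, 2, 2],
})

seen = []  # hypothetical log of what transform hands to the function


def record(x):
    # A Series argument is one column of one group; log its column name.
    # A DataFrame argument (possible in newer pandas) is logged by its columns.
    label = x.name if isinstance(x, pd.Series) else list(x.columns)
    seen.append((type(x).__name__, label))
    return x  # identity, so the transformed values equal the input values


result = d.groupby("C").transform(record)

for kind, label in seen:
    print(kind, label)
```

Entries of kind Series labelled A or B correspond to the per-column calls described above. The grouping column C is not passed to the function.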

This makes sense if you think about what transform is for. It's meant for applying transformation functions to the groups, but in general these functions only make sense when applied to a given column, not to the entire group. For instance, the example in the pandas docs is about z-standardizing using transform. If you have a DataFrame with columns for age and weight, it wouldn't make sense to z-standardize with respect to the overall mean of both these variables. It doesn't even mean anything to take the overall mean of a bunch of numbers, some of which are ages and some of which are weights. You have to z-standardize the age with respect to the mean age and the weight with respect to the mean weight, which means you want to transform separately for each column.
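To make the z-standardizing case concrete, here is a small sketch. The data, the people frame, and the zscore helper are all invented for illustration and are not taken from the pandas docs. Each column is standardized within each group against its own mean and standard deviation.

```python
import pandas as pd

# Hypothetical data: two groups, two columns on very different scales.
people = pd.DataFrame({
    "sex": ["f", "f", "f", "m", "m", "m"],
    "age": [23, 35, 47, 31, 40, 58],
    "weight": [55.0, 62.0, 70.0, 72.0, 80.0, 95.0],
})


def zscore(x):
    # mean() and std() are computed per column, so ages are never
    # mixed with weights.
    return (x - x.mean()) / x.std()


standardized = people.groupby("sex")[["age", "weight"]].transform(zscore)
print(standardized.round(2))
```

After the transform, each column should have mean 0 and standard deviation 1 within each group, which is only meaningful because the statistics were computed column by column.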

So basically, you don't need transform here. apply is the appropriate function, because apply really does operate on each group as a single DataFrame, while transform operates on each column of each group.
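As a rough sketch of the apply approach, assume the goal is to sort the rows within each group by one column. The frame d below reuses the data from the earlier example, and sorting by B is just an assumption made for illustration; the original question's exact sort is not shown here.

```python
import pandas as pd

d = pd.DataFrame({
    "A": [-2, 1, 0, -3, 5],
    "B": [5, -1, 2, 1, 0],
    "C": [4, 2, 1, 2, 2],
})

# apply passes each group to the function as a DataFrame, so the rows of a
# group can be reordered as a unit.
# group_keys=False keeps the original row labels instead of adding C to the index.
# Selecting ["A", "B"] makes explicit which columns each group contains.
result = d.groupby("C", group_keys=False)[["A", "B"]].apply(
    lambda g: g.sort_values("B")
)
print(result)
```

Because whole rows move together, the values in A stay aligned with the sorted values in B, which per-column processing with transform would not guarantee.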