In Python Pandas, how to use like R dplyr mutate_each In Python Pandas, how to use like R dplyr mutate_each pandas pandas

In Python Pandas, how to use like R dplyr mutate_each


With Pandas, this can be accomplished in a more lenghty way.

First, let's prepare the data:

import pandas as pdimport numpy as npfrom sklearn.datasets import load_irisiris_data = load_iris()iris = pd.DataFrame(iris_data.data, columns = [c[0:3] + c[6] for c in iris_data.feature_names])iris['Species'] = iris_data.target_names[iris_data.target]

Now we can imitate the mutate_each pipeline:

# calculate the aggregatespivot = iris.groupby("Species")[iris.columns[iris.columns.str.startswith('sepal')]                               ].aggregate(['min', 'max', np.mean])# name the aggregatespivot.columns = pivot.columns.get_level_values(0) + pivot.columns.get_level_values(1)# merge aggregates with the original dataframenew_iris = iris.merge(pivot, left_on='Species', right_index=True)

The pivot table is really a small pivot table:

            seplmin  seplmax  seplmean  sepwmin  sepwmax  sepwmeanSpecies                                                           setosa          4.3      5.8     5.006      2.3      4.4     3.418versicolor      4.9      7.0     5.936      2.0      3.4     2.770virginica       4.9      7.9     6.588      2.2      3.8     2.974

And the new_iris is a 150x11 table with all columns from iris and pivot combined, identical to what dplyr outputs.


mutate_each is superseded by mutate and across.

You can try this in python:

>>> from datar.all import f, group_by, starts_with, mutate, across, max, min, mean>>> from datar.datasets import iris>>> >>> iris >> \...    group_by(f.Species) >> \...    mutate(across(starts_with("Sepal"), [min, max, mean]))     Sepal_Length  Sepal_Width  Petal_Length  Petal_Width    Species  Sepal_Length_1  Sepal_Length_2  Sepal_Length_3  Sepal_Width_1  Sepal_Width_2  Sepal_Width_3        <float64>    <float64>     <float64>    <float64>   <object>       <float64>       <float64>       <float64>      <float64>      <float64>      <float64>0             5.1          3.5           1.4          0.2     setosa             4.3             5.8           5.006            2.3            4.4          3.4281             4.9          3.0           1.4          0.2     setosa             4.3             5.8           5.006            2.3            4.4          3.4282             4.7          3.2           1.3          0.2     setosa             4.3             5.8           5.006            2.3            4.4          3.4283             4.6          3.1           1.5          0.2     setosa             4.3             5.8           5.006            2.3            4.4          3.428..            ...          ...           ...          ...        ...             ...             ...             ...            ...            ...            ...4             5.0          3.6           1.4          0.2     setosa             4.3             5.8           5.006            2.3            4.4          3.428145           6.7          3.0           5.2          2.3  virginica             4.9             7.9           6.588            2.2            3.8          2.974146           6.3          2.5           5.0          1.9  virginica             4.9             7.9           6.588            2.2            3.8          2.974147           6.5          3.0           5.2          2.0  virginica             4.9             7.9           6.588            2.2            3.8          2.974148           6.2          3.4           5.4          2.3  virginica             4.9             7.9           6.588            2.2            3.8          2.974149           5.9          3.0           5.1          1.8  virginica             4.9             7.9           6.588            2.2            3.8          2.974[Groups: Species (n=3)][150 rows x 11 columns]

I am the author of the datar package.