In Python Pandas, how can I do the equivalent of R dplyr's `mutate_each`?
With Pandas, this can be done, though in a lengthier way.
First, let's prepare the data:
```python
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris

iris_data = load_iris()
iris = pd.DataFrame(iris_data.data,
                    columns=[c[0:3] + c[6] for c in iris_data.feature_names])
iris['Species'] = iris_data.target_names[iris_data.target]
```
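The list comprehension passed as `columns=` shortens each feature name to its first three letters plus the character at index 6, which is the first letter of "length" or "width". A small sketch of just that step, assuming the `(cm)`-suffixed feature names that `load_iris()` returns (hard-coded here so it runs without scikit-learn):

```python
# Feature names as load_iris() returns them (assumed here, hard-coded so the
# snippet runs without scikit-learn installed).
feature_names = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]

# First three letters ("sep"/"pet") plus character 6, the first letter of
# "length" or "width".
short_names = [c[0:3] + c[6] for c in feature_names]
print(short_names)  # ['sepl', 'sepw', 'petl', 'petw']
```

Because the abbreviated names only keep the prefixes `sep` and `pet`, any prefix-based column selection on this frame has to match those, not the full words.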
Now we can imitate the `mutate_each` pipeline:
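```python
# select the sepal columns; after the renaming above they are 'sepl' and 'sepw',
# so the prefix must be 'sep' ('sepal' would match nothing)
sepal_cols = [c for c in iris.columns if c.startswith('sep')]
# calculate the aggregates per species
pivot = iris.groupby("Species")[sepal_cols].aggregate(['min', 'max', np.mean])
# name the aggregates, e.g. ('sepl', 'min') -> 'seplmin'
pivot.columns = pivot.columns.get_level_values(0) + pivot.columns.get_level_values(1)
# merge aggregates with the original dataframe
new_iris = iris.merge(pivot, left_on='Species', right_index=True)
```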
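The merge step relies on each species appearing only once in the summary table. `DataFrame.merge` can enforce that with `validate="many_to_one"`, which raises `pandas.errors.MergeError` if the right-hand keys are not unique. A minimal sketch on a made-up three-row frame standing in for `iris` (the names `df` and `summary` and the values are illustrative only):

```python
import pandas as pd

# Made-up stand-ins for `iris` and the per-species summary `pivot`.
df = pd.DataFrame({
    "sepl": [5.1, 4.9, 7.0],
    "Species": ["setosa", "setosa", "versicolor"],
})
summary = df.groupby("Species")[["sepl"]].agg(["min", "max", "mean"])
summary.columns = summary.columns.get_level_values(0) + summary.columns.get_level_values(1)

# many_to_one: every row of `df` may match at most one row of `summary`;
# pandas raises MergeError otherwise instead of silently duplicating rows.
merged = df.merge(summary, left_on="Species", right_index=True,
                  validate="many_to_one")

# Row count is unchanged; columns are the original ones plus the aggregates.
assert merged.shape == (len(df), df.shape[1] + summary.shape[1])
print(merged)
```

The same bookkeeping applied to the full data (150 rows, 5 original columns plus 6 aggregates) gives a 150x11 result.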
The `pivot` table itself is a small summary with one row per species:
```
            seplmin  seplmax  seplmean  sepwmin  sepwmax  sepwmean
Species
setosa          4.3      5.8     5.006      2.3      4.4     3.418
versicolor      4.9      7.0     5.936      2.0      3.4     2.770
virginica       4.9      7.9     6.588      2.2      3.8     2.974
```
And `new_iris` is a 150x11 table combining all columns from `iris` and `pivot`, equivalent to what dplyr outputs.
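If the separate per-species table is not needed, `groupby(...).transform` is a different technique from the merge used above: it computes each group's aggregate and aligns it back onto that group's rows, so no merge step is required. A sketch under that assumption, on a small made-up frame with the same abbreviated column names (the values are illustrative, not the real iris data):

```python
import pandas as pd

# Tiny made-up frame using the same abbreviated column names as `iris`.
df = pd.DataFrame({
    "sepl": [5.1, 4.9, 7.0, 6.4],
    "sepw": [3.5, 3.0, 3.2, 3.2],
    "petl": [1.4, 1.4, 4.7, 4.5],
    "Species": ["setosa", "setosa", "versicolor", "versicolor"],
})

sepal_cols = [c for c in df.columns if c.startswith("sep")]
by_species = df.groupby("Species")

new_df = df.copy()
for col in sepal_cols:
    for func in ["min", "max", "mean"]:
        # transform() returns one value per original row (the group's
        # aggregate), indexed like `df`, so it can be assigned directly.
        new_df[col + func] = by_species[col].transform(func)

print(new_df)
```

The new columns follow the same `seplmin`, `seplmax`, ... naming and order as the merge-based pipeline; the trade-off is that the per-species summary is never kept as its own table.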
`mutate_each` is superseded by `mutate` and `across`.
You can try the same pipeline in Python with datar:
```
>>> from datar.all import f, group_by, starts_with, mutate, across, max, min, mean
>>> from datar.datasets import iris
>>>
>>> iris >> \
...     group_by(f.Species) >> \
...     mutate(across(starts_with("Sepal"), [min, max, mean]))
     Sepal_Length  Sepal_Width  Petal_Length  Petal_Width    Species  Sepal_Length_1  Sepal_Length_2  Sepal_Length_3  Sepal_Width_1  Sepal_Width_2  Sepal_Width_3
        <float64>    <float64>     <float64>    <float64>   <object>       <float64>       <float64>       <float64>      <float64>      <float64>      <float64>
0             5.1          3.5           1.4          0.2     setosa             4.3             5.8           5.006            2.3            4.4          3.428
1             4.9          3.0           1.4          0.2     setosa             4.3             5.8           5.006            2.3            4.4          3.428
2             4.7          3.2           1.3          0.2     setosa             4.3             5.8           5.006            2.3            4.4          3.428
3             4.6          3.1           1.5          0.2     setosa             4.3             5.8           5.006            2.3            4.4          3.428
..            ...          ...           ...          ...        ...             ...             ...             ...            ...            ...            ...
4             5.0          3.6           1.4          0.2     setosa             4.3             5.8           5.006            2.3            4.4          3.428
145           6.7          3.0           5.2          2.3  virginica             4.9             7.9           6.588            2.2            3.8          2.974
146           6.3          2.5           5.0          1.9  virginica             4.9             7.9           6.588            2.2            3.8          2.974
147           6.5          3.0           5.2          2.0  virginica             4.9             7.9           6.588            2.2            3.8          2.974
148           6.2          3.4           5.4          2.3  virginica             4.9             7.9           6.588            2.2            3.8          2.974
149           5.9          3.0           5.1          1.8  virginica             4.9             7.9           6.588            2.2            3.8          2.974

[Groups: Species (n=3)]

[150 rows x 11 columns]
```
I am the author of the `datar` package.