Pandas dataframe groupby to calculate population standard deviation
You can pass additional arguments to np.std through the agg function:
    In [202]:
    df.groupby('A').agg(np.std, ddof=0)

    Out[202]:
         B  values
    A
    1  0.5     2.5
    2  0.5     2.5

    In [203]:
    df.groupby('A').agg(np.std, ddof=1)

    Out[203]:
              B    values
    A
    1  0.707107  3.535534
    2  0.707107  3.535534
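As a minimal sketch of the same idea: the groupby `std` method also takes `ddof` directly, so you do not have to go through `np.std`. The sample frame below is an assumption, made up so that it matches the output above; it is not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen only for illustration
df = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": [1, 2, 1, 2],
    "values": [10, 15, 20, 25],
})

# Population standard deviation: divide by N
pop_std = df.groupby("A").std(ddof=0)

# Sample standard deviation: divide by N - 1 (the pandas default)
sample_std = df.groupby("A").std(ddof=1)

print(pop_std)
print(sample_std)
```

Passing `ddof` to the groupby method keeps the call independent of how a given pandas version handles NumPy functions inside `agg`.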
For degrees of freedom = 0, wrap np.std in your own function; np.std uses ddof=0 by default. (This means that groups with a single value will end up with std=0 instead of NaN.)
    import numpy as np

    def std(x):
        return np.std(x)

    df.groupby('A').agg(['mean', 'max', std])
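The single-value case can be checked with a small sketch. The frame here is hypothetical, with group 2 holding only one row, and the lambda is just one way to request ddof=0:

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 2 has a single row
df = pd.DataFrame({"A": [1, 1, 2], "values": [10.0, 15.0, 20.0]})

grouped = df.groupby("A")["values"]

# Default ddof=1 divides by N - 1 = 0 for the single-row group, giving NaN
sample_std = grouped.std()

# ddof=0 divides by N = 1, so the single-row group gets 0.0
pop_std = grouped.agg(lambda x: x.std(ddof=0))

print(sample_std)
print(pop_std)
```

Whether 0 or NaN is the better answer for a one-element group depends on how the result is used downstream, so it is worth choosing deliberately.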