Finding the mean and standard deviation of a timedelta object in pandas df
You need to convert the timedelta to a numeric value, e.g. int64, via values. This is the most accurate approach, because the conversion yields nanoseconds, which is the underlying numeric representation of a timedelta:
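import numpy as np
import pandas as pd

# Nanoseconds as int64: the underlying representation of a timedelta
dropped['new'] = dropped['diff'].values.astype(np.int64)

# Aggregate only the numeric helper column, then convert back to timedelta
means = dropped.groupby('bank')[['new']].mean()
means['new'] = pd.to_timedelta(means['new'])

std = dropped.groupby('bank')[['new']].std()
std['new'] = pd.to_timedelta(std['new'])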
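To make the round trip concrete, here is a minimal sketch on made-up data. The sample values are hypothetical; only the column names bank, diff and new come from the snippet above, and the explicit cast to timedelta64[ns] is an added assumption so the integers are guaranteed to be nanoseconds.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
dropped = pd.DataFrame({
    "bank": ["A", "A", "A", "B", "B", "B"],
    "diff": pd.to_timedelta(
        ["1 days", "2 days", "3 days", "2 hours", "4 hours", "6 hours"]
    ),
})

# Force nanosecond resolution (an assumption), then view the values as int64.
dropped["new"] = dropped["diff"].astype("timedelta64[ns]").values.astype(np.int64)

# Aggregate the integer column and convert the results back to timedelta.
grouped = dropped.groupby("bank")["new"]
means = pd.to_timedelta(grouped.mean())
stds = pd.to_timedelta(grouped.std())  # groupby std is the sample std (ddof=1)

print(means)
print(stds)
```

The aggregated values are floats, so converting them back can drop a fraction of a nanosecond, which is negligible for most purposes.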
Another solution is to convert the values to seconds with total_seconds, but that is less accurate, since the result is a float:
# Seconds as float64
dropped['new'] = dropped['diff'].dt.total_seconds()

# Aggregate only the numeric helper column
means = dropped.groupby('bank')[['new']].mean()
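As a rough illustration of where the precision goes, the sketch below uses a deliberately large, hypothetical timedelta with a 1 ns component. A float64 cannot hold that many significant digits, so the nanosecond is lost on the way to seconds and back.

```python
import pandas as pd

# Hypothetical value chosen to expose the rounding: 100000 days plus 1 ns.
td = pd.Timedelta(days=100000) + pd.Timedelta(nanoseconds=1)

secs = td.total_seconds()              # float64 seconds
back = pd.to_timedelta(secs, unit="s")  # convert the float back to a timedelta

print(td.nanoseconds)  # nanosecond component of the original
print(secs)
print(back == td)      # the round trip does not recover the original
```

For durations of ordinary size the error is far below anything that matters in practice, so this is mainly a concern when exact nanosecond values are required.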
There is no need to convert the timedelta back and forth. NumPy and pandas can do it for you seamlessly, with a faster run time. Using your dropped DataFrame:
import numpy as np

# Work on the timedelta column directly, grouped by bank
grouped = dropped.groupby('bank')['diff']
mean = grouped.apply(lambda x: np.mean(x))
std = grouped.apply(lambda x: np.std(x))
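A self-contained variant of the same idea, on hypothetical sample data, is sketched below. It calls the Series methods instead of np.mean and np.std, which is a swap made here and not part of the answer. One thing to watch: np.std defaults to ddof=0, whereas pandas' Series.std defaults to ddof=1, so it is worth passing ddof explicitly if the distinction matters.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
dropped = pd.DataFrame({
    "bank": ["A", "A", "A", "B", "B", "B"],
    "diff": pd.to_timedelta(
        ["1 days", "2 days", "3 days", "2 hours", "4 hours", "6 hours"]
    ),
})

grouped = dropped.groupby("bank")["diff"]

# Reduce the timedelta column directly; the results are timedeltas.
mean = grouped.apply(lambda x: x.mean())
std = grouped.apply(lambda x: x.std(ddof=1))  # sample standard deviation

print(mean)
print(std)
```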