Finding the mean and standard deviation of a timedelta object in pandas df


You need to convert the timedelta to a numeric value, e.g. int64 via values. This is the most accurate option, because nanoseconds are the underlying numeric representation of a timedelta:

import numpy as np
import pandas as pd
dropped['new'] = dropped['diff'].values.astype(np.int64)
means = dropped.groupby('bank').mean()
means['new'] = pd.to_timedelta(means['new'])
std = dropped.groupby('bank').std()
std['new'] = pd.to_timedelta(std['new'])
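As a rough, self-contained sketch of the nanosecond round trip: the sample data below is made up, since the real dropped frame is not shown, and the explicit cast to timedelta64[ns] is an extra assumption on my part to make sure the integers really are nanoseconds.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's `dropped` frame.
dropped = pd.DataFrame({
    "bank": ["A", "A", "B", "B"],
    "diff": pd.to_timedelta(["1 days", "3 days", "2 hours", "4 hours"]),
})

# Force nanosecond resolution (assumption), then view the values as int64.
dropped["new"] = dropped["diff"].astype("timedelta64[ns]").values.astype(np.int64)

# Aggregate only the numeric helper column, then convert back to timedelta.
grouped = dropped.groupby("bank")["new"]
means = pd.to_timedelta(grouped.mean())  # floats are interpreted as nanoseconds
stds = pd.to_timedelta(grouped.std())    # sample standard deviation (ddof=1)

print(means)
print(stds)
```

Selecting the single 'new' column before aggregating avoids depending on how pandas treats the other columns of the frame.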

Another solution is to convert the values to seconds with total_seconds, but that is less accurate:

dropped['new'] = dropped['diff'].dt.total_seconds()
means = dropped.groupby('bank').mean()
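If you want timedeltas back from the seconds-based version, something like the following should work. This is only a sketch on made-up sample data; note that the unit has to be passed explicitly when converting back, because the numbers are now seconds rather than nanoseconds.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's `dropped` frame.
dropped = pd.DataFrame({
    "bank": ["A", "A", "B", "B"],
    "diff": pd.to_timedelta(["1 days", "3 days", "2 hours", "4 hours"]),
})

# Represent each timedelta as a float number of seconds.
dropped["new"] = dropped["diff"].dt.total_seconds()

# Aggregate the float column, then convert back, stating the unit.
grouped = dropped.groupby("bank")["new"]
means = pd.to_timedelta(grouped.mean(), unit="s")
stds = pd.to_timedelta(grouped.std(), unit="s")

print(means)
print(stds)
```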


The pandas mean() method and other aggregation methods accept a numeric_only=False parameter.

dropped.groupby('bank').mean(numeric_only=False)

Found here: Aggregations for Timedelta values in the Python DataFrame
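A minimal sketch of the numeric_only=False approach on made-up sample data (the frame below is hypothetical). I have only shown mean() here; whether other aggregations such as std() accept timedelta columns in a groupby may depend on your pandas version.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's `dropped` frame.
dropped = pd.DataFrame({
    "bank": ["A", "A", "B", "B"],
    "diff": pd.to_timedelta(["1 days", "3 days", "2 hours", "4 hours"]),
})

# Ask pandas to keep non-numeric (here: timedelta) columns in the aggregation.
means = dropped.groupby("bank").mean(numeric_only=False)

print(means)
```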


There is no need to convert the timedelta back and forth. NumPy and pandas can handle it for you, with a faster run time. Using your dropped DataFrame:

import numpy as np
grouped = dropped.groupby('bank')['diff']
mean = grouped.apply(lambda x: np.mean(x))
std = grouped.apply(lambda x: np.std(x))
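For reference, here is the same idea as a runnable sketch on made-up sample data (the frame is hypothetical). One thing to be aware of: as far as I can tell, np.std defaults to ddof=0 (population standard deviation), whereas the pandas std() method defaults to ddof=1, so the two can give different numbers for the same group.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's `dropped` frame.
dropped = pd.DataFrame({
    "bank": ["A", "A", "B", "B"],
    "diff": pd.to_timedelta(["1 days", "3 days", "2 hours", "4 hours"]),
})

# Work on the timedelta column directly, one group at a time.
grouped = dropped.groupby("bank")["diff"]
mean = grouped.apply(lambda x: np.mean(x))
std = grouped.apply(lambda x: np.std(x))  # population std (ddof=0)

print(mean)
print(std)
```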