How to downsample time series data in pandas?
You can convert your time series to an actual timedelta, then use resample for a vectorized solution:
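    # interpret the integer time column as minutes
    t = pd.to_timedelta(df.time, unit='min')
    # per id, keep the last row of every 3-minute bin
    s = df.set_index(t).groupby('id').resample('3min').last().reset_index(drop=True)
    # renumber time within each id as 0, 1, 2, ...
    s.assign(time=s.groupby('id').cumcount())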
       id  time  value
    0   1     0      5
    1   1     1     16
    2   1     2     20
    3   2     0      8
    4   2     1     10
    5   4     0      6
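For anyone who wants to run the resample approach end to end, here is a minimal sketch. The input frame is hypothetical: only the rows that show up in the printed result are known, so the other values are made up, and the group sizes (8, 4 and 2 rows) are an assumption. It also selects the value column before resampling, a small departure from the snippet above, so that the result should not depend on how the grouping column is carried through.

```python
import pandas as pd

# Hypothetical input: only the rows that appear in the printed result are
# known; every other value here is made up for illustration.
df = pd.DataFrame({
    "id":    [1] * 8 + [2] * 4 + [4] * 2,
    "time":  list(range(8)) + list(range(4)) + list(range(2)),
    "value": [3, 4, 5, 9, 12, 16, 18, 20, 7, 9, 8, 10, 2, 6],
})

# Treat the integer time column as minutes so resample can bin it.
t = pd.to_timedelta(df["time"], unit="min")

s = (
    df.set_index(t)
      .groupby("id")["value"]   # resample only the value column
      .resample("3min")         # 3-minute bins within each id
      .last()                   # keep the last row of each bin
      .reset_index()            # back to columns: id, time, value
)

# Replace the timedelta bin labels with a per-id counter 0, 1, 2, ...
out = s.assign(time=s.groupby("id").cumcount())
print(out)
```

With this made-up input, each id keeps the last value of every block of three rows, plus the last value of the trailing partial block.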
Use np.r_ and .iloc with groupby:

    df.groupby('id')['value'].apply(lambda x: x.iloc[np.r_[2:len(x):3,-1]])
Output:
    id
    1   2      5
        5     16
        7     20
    2   10     8
        11    10
    4   13     6
    Name: value, dtype: int64
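To see which rows np.r_[2:len(x):3, -1] picks, it may help to look at the positions on their own. The sketch below uses a hypothetical helper, pick_positions, and group sizes of 8, 4 and 2 rows. Those sizes are inferred from the index labels in the output above, assuming a default integer index with contiguous groups.

```python
import numpy as np

def pick_positions(n):
    """Positions taken from a group of n rows: 2, 5, 8, ... and then the last row (-1)."""
    return np.r_[2:n:3, -1]

p8 = pick_positions(8).tolist()  # [2, 5, -1]
p4 = pick_positions(4).tolist()  # [2, -1]
p2 = pick_positions(2).tolist()  # [-1]
print(p8, p4, p2)
```

The slice 2:n:3 contributes every third position, and the trailing -1 adds the final row of the group, which is what keeps a short group such as the 2-row one from coming back empty.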
Going a little further with column naming, etc.:
    df_out = df.groupby('id')['value']\
               .apply(lambda x: x.iloc[np.r_[2:len(x):3,-1]]).reset_index()
    df_out.assign(time=df_out.groupby('id').cumcount()).drop('level_1', axis=1)
Output:
       id  value  time
    0   1      5     0
    1   1     16     1
    2   1     20     2
    3   2      8     0
    4   2     10     1
    5   4      6     0
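One thing to watch with the np.r_ approach: when a group's length is a multiple of 3, the slice already ends on the last row, so the extra -1 selects that row a second time. A possible variant, sketched below on hypothetical data with a hypothetical helper name, writes the last position as len(x) - 1 and passes the positions through np.unique to drop the repeat. This is one way to handle it, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 7 has exactly 3 rows, group 9 has 4.
df = pd.DataFrame({
    "id":    [7, 7, 7, 9, 9, 9, 9],
    "value": [1, 2, 3, 4, 5, 6, 7],
})

def every_third_and_last(x):
    # Positions 2, 5, 8, ... plus the final row, written as a non-negative
    # position so that np.unique can remove it when it is already included.
    pos = np.unique(np.r_[2:len(x):3, len(x) - 1])
    return x.iloc[pos]

picked = df.groupby("id")["value"].apply(every_third_and_last)

# Drop the original row labels (the unnamed second index level), then
# number the kept rows within each id.
out = picked.reset_index(level=1, drop=True).reset_index()
out["time"] = out.groupby("id").cumcount()
print(out)
```

Here group 7 contributes a single row (value 3) instead of the same row twice, and group 9 contributes values 6 and 7.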