Python Pandas - find consecutive group with max aggregate values Python Pandas - find consecutive group with max aggregate values pandas pandas

Python Pandas - find consecutive group with max aggregate values


You could use np.convolve to get the correct starting index and go from there.

def cons_max(df, N):    max_loc = np.convolve(df.val, np.ones(N, dtype=int), mode='valid').argmax()    return df.loc[max_loc:max_loc+N-1]

Demo

>>> cons_max(df, 3)                   dt  val5 2017-01-01 12:05:00   946 2017-01-01 12:06:00   427 2017-01-01 12:07:00   97>>> cons_max(df, 5)                   dt  val4 2017-01-01 12:04:00   705 2017-01-01 12:05:00   946 2017-01-01 12:06:00   427 2017-01-01 12:07:00   978 2017-01-01 12:08:00   12

This works be effectively "sliding" the kernel (array of ones) across our input and multiply-accumulating the elements in our window of size N together.


You could use rolling/sum and np.nanargmax to find the index associated with the first occurrence of the maximum value:

import numpy as npimport pandas as pddf = pd.DataFrame({'dt': ['2017-01-01 12:00:00', '2017-01-01 12:01:00', '2017-01-01 12:02:00', '2017-01-01 12:03:00', '2017-01-01 12:04:00', '2017-01-01 12:05:00', '2017-01-01 12:06:00', '2017-01-01 12:07:00', '2017-01-01 12:08:00', '2017-01-01 12:09:00', '2017-01-01 12:10:00', '2017-01-01 12:11:00', '2017-01-01 12:12:00', '2017-01-01 12:13:00', '2017-01-01 12:14:00', '2017-01-01 12:15:00'], 'val': [33, 42, 44, 6, 70, 94, 42, 97, 12, 11, 66, 71, 25, 23, 39, 25]})df.index = df.index*10N = 3idx = df['val'].rolling(window=N).sum()i = np.nanargmax(idx) + 1print(df.iloc[i-N : i])

prints

                     dt  val50  2017-01-01 12:05:00   9460  2017-01-01 12:06:00   4270  2017-01-01 12:07:00   97

iloc uses ordinal indexing. loc uses label-based indexing. Provided thatboth i-N and i are valid indices, df.iloc[i-N : i] will grab a window(sub-DataFrame) of length N. In contrast, df.loc[i-N, i] will only grab awindow of length N if the index uses consecutive integers. The example aboveshows a DataFrame where df.loc would not work since df.index hasnon-consecutive integer values.


For simple single values, you can use something like:

df['total'] = df.val + df.val.shift(-1) + df.val.shift(-2)first = df.dropna().sort('total').index[-1]df.iloc[first:first+3]

Not sure how to generalize this... with most things pandas, there is probably an easier way, but this does work.

Edit: After a little more work, it looks like rolling is what you want:

last = df.val.rolling(3).sum().dropna().sort_values().index[-1]

This is a slightly different, in that the index you get here is the end, so after doing the above you want to do

df.iloc[last-2:last+1]

I think that could be generalized.