How to groupby consecutive values in pandas DataFrame


You can group by a custom Series that assigns the same label to each run of consecutive equal values:

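import pandas as pd

df = pd.DataFrame({'a': [1, 1, -1, 1, -1, -1]})
print (df)
   a
0  1
1  1
2 -1
3  1
4 -1
5 -1

# a new label starts wherever a value differs from the previous row
print ((df.a != df.a.shift()).cumsum())
0    1
1    1
2    2
3    3
4    4
5    4
Name: a, dtype: int32

# pass the Series itself (not a one-element list) so each key is a plain label
for i, g in df.groupby((df.a != df.a.shift()).cumsum()):
    print (i)
    print (g)
    print (g.a.tolist())

1
   a
0  1
1  1
[1, 1]
2
   a
2 -1
[-1]
3
   a
3  1
[1]
4
   a
4 -1
5 -1
[-1, -1]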
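If you only need a summary of each run rather than the sub-frames, the same labels can feed an aggregation. This is a minimal sketch, not part of the original answer; the names `run_id` and `runs` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, -1, 1, -1, -1]})

# Label each run: the counter increases wherever a value differs from the previous row.
run_id = (df['a'] != df['a'].shift()).cumsum()

# Group the column by those labels and collect one row per run.
grouped = df['a'].groupby(run_id)
runs = pd.DataFrame({
    'value': grouped.first(),   # the value repeated within the run
    'length': grouped.size(),   # number of rows in the run
})
print(runs)
```

Here `runs` has one row per consecutive block, indexed by the run label.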


Using groupby from itertools, with the data from Jez's answer:

from itertools import groupby

[list(group) for key, group in groupby(df.a.values.tolist())]
Out[361]: [[1, 1], [-1], [1], [-1, -1]]
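itertools.groupby also hands back the key of each run, so the same loop can report run lengths. A small sketch on a plain list (the variable names are illustrative only):

```python
from itertools import groupby

values = [1, 1, -1, 1, -1, -1]

# groupby yields (key, iterator) for each run of equal adjacent items;
# the iterator has to be consumed (here via list) to count the run.
runs = [(key, len(list(group))) for key, group in groupby(values)]
print(runs)  # [(1, 2), (-1, 1), (1, 1), (-1, 2)]
```

Note that this only groups adjacent equal items, which is exactly what is wanted here; it does not sort or merge non-adjacent runs.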


Series.diff is another way to mark the group boundaries (for numeric data, a != a.shift() is equivalent to a.diff() != 0):

consecutives = df['a'].diff().ne(0).cumsum()

# 0    1
# 1    1
# 2    2
# 3    3
# 4    4
# 5    4
# Name: a, dtype: int64
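As a quick check, the two ways of marking boundaries can be compared side by side. The sketch below also runs the shift version on a string column, since diff relies on subtraction and so is only suited to numeric data; the string Series is hypothetical example data.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, -1, 1, -1, -1]})

by_shift = (df['a'] != df['a'].shift()).cumsum()
by_diff = df['a'].diff().ne(0).cumsum()

# Both approaches assign the same run labels on this numeric column.
same_labels = bool((by_shift == by_diff).all())
print(same_labels)

# Hypothetical non-numeric data: the comparison with shift() still works.
s = pd.Series(['x', 'x', 'y', 'x'])
labels = (s != s.shift()).cumsum()
print(labels.tolist())
```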

And to turn these groups into a Series of lists (see the other answers for a list of lists), aggregate with groupby.agg or groupby.apply:

df['a'].groupby(consecutives).agg(list)

# a
# 1      [1, 1]
# 2        [-1]
# 3         [1]
# 4    [-1, -1]
# Name: a, dtype: object
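For completeness, here is a sketch of the groupby.apply variant, plus one way to get from the Series of lists to a plain list of lists. The names `as_series` and `as_lists` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, -1, 1, -1, -1]})
consecutives = df['a'].diff().ne(0).cumsum()

# apply(list) builds one Python list per run, like agg(list).
as_series = df['a'].groupby(consecutives).apply(list)

# Series.tolist() then gives a plain list of lists.
as_lists = as_series.tolist()
print(as_lists)
```

The result should match the list of lists produced by the itertools approach.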