How to groupby consecutive values in pandas DataFrame
You can use groupby with a custom Series:
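    import pandas as pd

    df = pd.DataFrame({'a': [1, 1, -1, 1, -1, -1]})
    print(df)
       a
    0  1
    1  1
    2 -1
    3  1
    4 -1
    5 -1

    # True wherever the value differs from the previous row; cumsum turns
    # those change points into an increasing group id
    print((df.a != df.a.shift()).cumsum())
    0    1
    1    1
    2    2
    3    3
    4    4
    5    4
    Name: a, dtype: int32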
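    # Pass the Series itself rather than a one-element list: recent pandas
    # versions yield tuple keys such as (1,) when grouping by a list
    for i, g in df.groupby((df.a != df.a.shift()).cumsum()):
        print(i)
        print(g)
        print(g.a.tolist())

    1
       a
    0  1
    1  1
    [1, 1]
    2
       a
    2 -1
    [-1]
    3
       a
    3  1
    [1]
    4
       a
    4 -1
    5 -1
    [-1, -1]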
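As a rough sketch, the shift/cumsum trick above can be wrapped in a small helper and used to summarize each run. The function name `consecutive_group_ids` and the column names `value` and `length` are made up for this illustration; they are not part of pandas.

```python
import pandas as pd


def consecutive_group_ids(s: pd.Series) -> pd.Series:
    """Label each run of equal consecutive values with an increasing integer."""
    # Hypothetical helper: compares each value with the previous one and
    # counts the change points.
    return (s != s.shift()).cumsum()


df = pd.DataFrame({'a': [1, 1, -1, 1, -1, -1]})

# Rename the key so it does not share a name with the column being grouped.
ids = consecutive_group_ids(df['a']).rename('run')

# One row per run: the repeated value and how many times it repeats.
runs = df['a'].groupby(ids).agg(value='first', length='size')
print(runs)
```

Here `runs` has one row per consecutive block, which is often easier to work with than looping over the groups.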
Series.diff is another way to mark the group boundaries (for numeric data, a != a.shift() is equivalent to a.diff() != 0):
    consecutives = df['a'].diff().ne(0).cumsum()
    # 0    1
    # 1    1
    # 2    2
    # 3    3
    # 4    4
    # 5    4
    # Name: a, dtype: int64
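Since diff subtracts neighbouring values, it only applies to data that supports subtraction. For something like a column of strings, the shift comparison is probably the safer choice. A minimal sketch, with a made-up Series `labels`:

```python
import pandas as pd

# Hypothetical example data: string labels, where subtraction is not defined.
labels = pd.Series(['x', 'x', 'y', 'x', 'x'])

# Compare each value with its predecessor; the first row compares against
# NaN, so it always starts a new group.
label_groups = labels.ne(labels.shift()).cumsum()
print(label_groups.tolist())
```

The resulting ids can be passed to groupby in the same way as the numeric case.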
And to turn these groups into a Series of lists (see the other answers for a list of lists), aggregate with groupby.agg or groupby.apply:
    df['a'].groupby(consecutives).agg(list)
    # a
    # 1      [1, 1]
    # 2        [-1]
    # 3         [1]
    # 4    [-1, -1]
    # Name: a, dtype: object
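If a plain list of lists is what you need, one option is to call tolist on the aggregated Series. The sketch below rebuilds the grouping key so it runs on its own; the variable names `as_series` and `as_lists` are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, -1, 1, -1, -1]})

# Group id that increases every time the value changes.
consecutives = df['a'].diff().ne(0).cumsum()

# Series of lists, one list per run of consecutive values.
as_series = df['a'].groupby(consecutives).agg(list)

# Plain Python list of lists.
as_lists = as_series.tolist()
print(as_lists)
```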