How to count consecutive ordered values on pandas data frame
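Here is one way: create an additional key for the `groupby` (a running count of the non-zero values within each `id`, so that the zeros in one consecutive run share the same key), then group the zero rows by this key and `id`.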
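    # Key: running count of non-zero values within each id, so the zeros
    # in one consecutive run share the same key.
    s = df.value.ne(0).astype(int).groupby(df.id).cumsum()

    # Size of each (id, key) block of zeros, then the largest block per id;
    # any id without zeros is filled with 0.
    (df[df.value == 0]
       .groupby([df.id, s])
       .size()
       .groupby(level=0).max()
       .reindex(df.id.unique(), fill_value=0))

    Out[267]:
    id
    354    3
    357    2
    540    0
    dtype: int64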
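To try this out end to end, here is a minimal, self-contained sketch. The sample frame is made up for illustration (it is not the data from the question) and was chosen only so that the result matches the output above. The helper name `longest_zero_run` is hypothetical as well.

```python
import pandas as pd

# Hypothetical sample data, not the data from the question.
df = pd.DataFrame({
    "id":    [354, 354, 354, 354, 354, 357, 357, 357, 357, 540, 540],
    "value": [0,   0,   0,   1,   0,   1,   0,   0,   1,   1,   1],
})


def longest_zero_run(frame):
    """Return the longest run of consecutive zeros in `value` for each `id`."""
    # Key that increases at every non-zero row, restarted for each id.
    key = frame["value"].ne(0).astype(int).groupby(frame["id"]).cumsum()
    zeros = frame[frame["value"] == 0]
    # Size of each (id, key) block of zeros, then the largest block per id.
    runs = zeros.groupby([zeros["id"], key.loc[zeros.index]]).size()
    longest = runs.groupby(level=0).max()
    # ids that never contain a zero get 0.
    return longest.reindex(frame["id"].unique(), fill_value=0)


result = longest_zero_run(df)
print(result)
```

With this sample frame the result is 3 for `354`, 2 for `357` and 0 for `540`.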
Create a group ID `m` for consecutive rows with the same value. Next, `groupby` on `id` and `m`, call `value_counts`, and use `.loc` on the MultiIndex to keep only the value `0` in the right-most index level. Finally, keep a single entry per `id` by filtering out duplicated index entries with `duplicated`, and `reindex` to add a count of 0 for any `id` that has no `0` values.
    # Consecutive rows with the same value are assigned the same ID number.
    # This identifies each run of consecutive equal values, hence "group ID".
    m = df.value.diff().ne(0).cumsum().rename('gid')

    # Group consecutive rows of the same value, per id, into separate groups.
    # Within each group, count each value, then use `.loc` to pick only `0`,
    # because only the count of value `0` matters here.
    df1 = df.groupby(['id', m]).value.value_counts().loc[:, :, 0].droplevel(-1)

    # There may be several groups of value `0` per `id`; we want only the one
    # with the highest count. Sort the counts in descending order so that the
    # longest run comes first for each `id`, then keep the first entry of each
    # set of duplicates via the True/False mask from `duplicated`.
    # Finally, `reindex` adds any `id` that has no value `0` in the original `df`.
    # Note: `id` is the column `id` in `df`. It is different from the group ID
    # `m` created for the groupby.
    df1 = df1.sort_values(ascending=False)
    df1[~df1.index.duplicated()].reindex(df.id.unique(), fill_value=0)

    Out[315]:
    id
    354    3
    357    2
    540    0
    Name: value, dtype: int64
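To see what the group ID actually looks like, the sketch below prints it next to a small made-up frame (hypothetical data again, not the question's). It then takes the longest zero run per `id` with an explicit `max` instead of relying on row order. This is a simplified variant of the idea, not the exact code above.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "id":    [354, 354, 354, 354, 354, 357, 357, 357, 357, 540, 540],
    "value": [0,   0,   0,   1,   0,   1,   0,   0,   1,   1,   1],
})

# A new run starts wherever the value differs from the previous row.
gid = df["value"].diff().ne(0).cumsum().rename("gid")
print(df.assign(gid=gid))

# Length of every run, keyed by (id, gid, value). A run ID can span two ids
# when the value does not change at the boundary; grouping by id splits it.
run_len = df.groupby(["id", gid, "value"]).size()

# Keep only the runs of zeros and take the longest one per id.
zero_runs = run_len[run_len.index.get_level_values("value") == 0]
longest = (
    zero_runs.groupby(level="id").max()
    .reindex(df["id"].unique(), fill_value=0)
)
print(longest)
```

Taking the maximum per `id` keeps the result independent of the order in which the runs are returned.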