How can I count the number of consecutive TRUEs in a DataFrame?
The solution can be simplified if every column always contains at least one True:
    b = df.cumsum()
    c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)
    print (c)
       A  B  C
    0  0  1  0
    1  0  0  0
    2  1  1  0
    3  2  2  1
    4  0  3  0
    5  1  4  1
    6  2  0  0
    7  3  0  1
    8  0  1  2
    9  1  0  0

    #get maximal value of all columns
    length = c.max().tolist()
    print (length)
    [3, 4, 2]

    #get indexes by maximal value, subtract length and add 1
    index = c.idxmax().sub(length).add(1).tolist()
    print (index)
    [5, 2, 7]
Detail:
    print (pd.concat([b,
                      b.mask(df),
                      b.mask(df).ffill(),
                      b.mask(df).ffill().fillna(0),
                      b.sub(b.mask(df).ffill().fillna(0)).astype(int)
                      ], axis=1,
                      keys=('cumsum', 'mask', 'ffill', 'fillna','sub')))

      cumsum       mask           ffill          fillna          sub
           A  B  C    A    B    C     A    B    C      A    B    C   A  B  C
    0      0  1  0  0.0  NaN  0.0   0.0  NaN  0.0    0.0  0.0  0.0   0  1  0
    1      0  1  0  0.0  1.0  0.0   0.0  1.0  0.0    0.0  1.0  0.0   0  0  0
    2      1  2  0  NaN  NaN  0.0   0.0  1.0  0.0    0.0  1.0  0.0   1  1  0
    3      2  3  1  NaN  NaN  NaN   0.0  1.0  0.0    0.0  1.0  0.0   2  2  1
    4      2  4  1  2.0  NaN  1.0   2.0  1.0  1.0    2.0  1.0  1.0   0  3  0
    5      3  5  2  NaN  NaN  NaN   2.0  1.0  1.0    2.0  1.0  1.0   1  4  1
    6      4  5  2  NaN  5.0  2.0   2.0  5.0  2.0    2.0  5.0  2.0   2  0  0
    7      5  5  3  NaN  5.0  NaN   2.0  5.0  2.0    2.0  5.0  2.0   3  0  1
    8      5  6  4  5.0  NaN  NaN   5.0  5.0  2.0    5.0  5.0  2.0   0  1  2
    9      6  6  4  NaN  6.0  4.0   5.0  6.0  4.0    5.0  6.0  4.0   1  0  0
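As a sanity check, the vectorized result can be compared against a plain loop. This is only a sketch: `longest_true_run` is a hypothetical helper written for this comparison, and the `df` below is rebuilt from the values implied by the printed output above.

```python
from itertools import groupby

import pandas as pd


def longest_true_run(values):
    """Return (start, length) of the longest run of True in a sequence.

    Hypothetical helper used only for cross-checking.
    Returns (0, -1) if the sequence contains no True.
    """
    best_start, best_len = 0, -1
    pos = 0
    for key, group in groupby(values):
        n = len(list(group))
        # Strict comparison keeps the first run when lengths tie
        if key and n > best_len:
            best_start, best_len = pos, n
        pos += n
    return best_start, best_len


# Assumed input, reconstructed from the output shown in the answer
df = pd.DataFrame({
    'A': [False, False, True, True, False, True, True, True, False, True],
    'B': [True, False, True, True, True, True, False, False, True, False],
    'C': [False, False, False, True, False, True, False, True, True, False],
})

# Vectorized computation, as in the answer
b = df.cumsum()
c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)
length = c.max().tolist()
index = c.idxmax().sub(length).add(1).tolist()

# Loop-based reference, one column at a time
reference = [longest_true_run(df[col].tolist()) for col in df.columns]
print(index, length, reference)
```

Both approaches should agree on the start position and length for each column; the loop is easier to read, while the vectorized version avoids iterating in Python.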
EDIT:
A general solution that also works for columns containing only False: add numpy.where with a boolean mask created by DataFrame.any:
    print (df)
           A      B      C
    0  False   True  False
    1  False  False  False
    2   True   True  False
    3   True   True  False
    4  False   True  False
    5   True   True  False
    6   True  False  False
    7   True  False  False
    8  False   True  False
    9   True  False  False

    b = df.cumsum()
    c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)
    mask = df.any()

    length = np.where(mask, c.max(), -1).tolist()
    print (length)
    [3, 4, -1]

    index = np.where(mask, c.idxmax().sub(c.max()).add(1), 0).tolist()
    print (index)
    [5, 2, 0]
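To see why the guard matters, here is a small sketch on a made-up two-column frame (the data and the `naive_*` names are assumptions for illustration). Without the mask, an all-False column reports a length of 0 and a start index of 1, which could be mistaken for a real result.

```python
import numpy as np
import pandas as pd

# Made-up data: column 'C' contains no True at all
df = pd.DataFrame({
    'A': [False, True, True, False],
    'C': [False, False, False, False],
})

b = df.cumsum()
c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)

# Unguarded: idxmax falls back to the first row for the all-zero column
naive_length = c.max().tolist()
naive_index = c.idxmax().sub(c.max()).add(1).tolist()

# Guarded: columns without any True get the sentinel values -1 and 0
mask = df.any()
length = np.where(mask, c.max(), -1).tolist()
index = np.where(mask, c.idxmax().sub(c.max()).add(1), 0).tolist()

print(naive_length, naive_index)
print(length, index)
```

The sentinel values (-1 for length, 0 for index) are the ones used in the answer; any other placeholder could be passed to `np.where` instead.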
We would basically leverage two ideas: catching shifts on the compared array, and offsetting each column's results so that the whole computation can be vectorized.
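The first idea, catching shifts, is easiest to see on a single 1-D array. The following is a minimal sketch with made-up data, not the full 2-D method:

```python
import numpy as np

a = np.array([False, False, True, True, False,
              True, True, True, False, True])

# Pad with False on both ends so every run of True has a start and a stop
padded = np.r_[False, a, False]

# True wherever two consecutive elements differ, i.e. at run boundaries
steps = padded[1:] != padded[:-1]
idx = np.flatnonzero(steps)

starts = idx[::2]            # positions in `a` where a run of True begins
lens = idx[1::2] - starts    # stop position minus start position

longest = lens.argmax()      # first run with the maximal length
print(starts[longest], lens[longest])
```

For a 2-D array, the same boundary detection is applied to the transposed, flattened step mask, so each column occupies its own block of `a.shape[0] + 1` positions; taking the flat start index modulo that block size recovers the row index.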
So, with that intention set, here's one way to achieve the desired results -
    import numpy as np

    def maxisland_start_len_mask(a, fillna_index = -1, fillna_len = 0):
        # a is a boolean array

        # Pad a row of False on top and bottom so every island has two edges
        pad = np.zeros(a.shape[1],dtype=bool)
        mask = np.vstack((pad, a, pad))

        # Catch the shifts; flatten column by column to get start/stop pairs
        mask_step = mask[1:] != mask[:-1]
        idx = np.flatnonzero(mask_step.T)
        island_starts = idx[::2]
        island_lens = idx[1::2] - idx[::2]
        n_islands_percol = mask_step.sum(0)//2

        # Offset island lengths per column so one argsort groups by column
        bins = np.repeat(np.arange(a.shape[1]),n_islands_percol)
        scale = island_lens.max()+1

        scaled_idx = np.argsort(scale*bins + island_lens)
        grp_shift_idx = np.r_[0,n_islands_percol.cumsum()]
        max_island_starts = island_starts[scaled_idx[grp_shift_idx[1:]-1]]

        # Convert flat start indices back to row indices
        max_island_percol_start = max_island_starts%(a.shape[0]+1)

        # Columns without any island get the fill values
        valid = n_islands_percol!=0
        cut_idx = grp_shift_idx[:-1][valid]
        max_island_percol_len = np.maximum.reduceat(island_lens, cut_idx)

        out_len = np.full(a.shape[1], fillna_len, dtype=int)
        out_len[valid] = max_island_percol_len
        out_index = np.where(valid,max_island_percol_start,fillna_index)
        return out_index, out_len
Sample run -
    # Generic case to handle all 0s columns
    In [112]: a
    Out[112]:
    array([[False, False, False],
           [False, False, False],
           [ True, False, False],
           [ True, False,  True],
           [False, False, False],
           [ True, False,  True],
           [ True, False, False],
           [ True, False,  True],
           [False, False,  True],
           [ True, False, False]])

    In [117]: starts,lens = maxisland_start_len_mask(a, fillna_index=-1, fillna_len=0)

    In [118]: starts
    Out[118]: array([ 5, -1,  7])

    In [119]: lens
    Out[119]: array([3, 0, 2])