
How can I count the number of consecutive TRUEs in a DataFrame?


The solution can be simplified if there is always at least one True per column:

    b = df.cumsum()
    c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)
    print (c)
       A  B  C
    0  0  1  0
    1  0  0  0
    2  1  1  0
    3  2  2  1
    4  0  3  0
    5  1  4  1
    6  2  0  0
    7  3  0  1
    8  0  1  2
    9  1  0  0

    #get maximal value of all columns
    length = c.max().tolist()
    print (length)
    [3, 4, 2]

    #get indexes by maximal value, subtract length and add 1
    index = c.idxmax().sub(length).add(1).tolist()
    print (index)
    [5, 2, 7]
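For anyone who wants to run the snippet above end to end, here is a minimal sketch. The sample `df` is an assumption: the input frame is not shown at this point in the answer, so it is reconstructed here from the printed values of `c`.

```python
import pandas as pd

# Assumed sample data, reconstructed from the printed output of `c` above.
df = pd.DataFrame({
    "A": [False, False, True, True, False, True, True, True, False, True],
    "B": [True, False, True, True, True, True, False, False, True, False],
    "C": [False, False, False, True, False, True, False, True, True, False],
})

# Running count of Trues per column.
b = df.cumsum()

# At each False row keep the running count, carry it forward over the
# following True rows, then subtract it: what remains is the length of
# the current run of Trues at every row.
c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)

# Longest run per column, and the row label where that run starts.
length = c.max().tolist()
index = c.idxmax().sub(length).add(1).tolist()

print(length)  # [3, 4, 2]
print(index)   # [5, 2, 7]
```

Note that `idxmax` returns index labels, so the start computation assumes the default integer index (0, 1, 2, ...).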

Detail:

    print (pd.concat([b,
                      b.mask(df),
                      b.mask(df).ffill(),
                      b.mask(df).ffill().fillna(0),
                      b.sub(b.mask(df).ffill().fillna(0)).astype(int)
                      ], axis=1,
                      keys=('cumsum', 'mask', 'ffill', 'fillna','sub')))

      cumsum       mask           ffill           fillna           sub
           A  B  C    A    B    C     A    B    C      A    B    C   A  B  C
    0      0  1  0  0.0  NaN  0.0   0.0  NaN  0.0    0.0  0.0  0.0   0  1  0
    1      0  1  0  0.0  1.0  0.0   0.0  1.0  0.0    0.0  1.0  0.0   0  0  0
    2      1  2  0  NaN  NaN  0.0   0.0  1.0  0.0    0.0  1.0  0.0   1  1  0
    3      2  3  1  NaN  NaN  NaN   0.0  1.0  0.0    0.0  1.0  0.0   2  2  1
    4      2  4  1  2.0  NaN  1.0   2.0  1.0  1.0    2.0  1.0  1.0   0  3  0
    5      3  5  2  NaN  NaN  NaN   2.0  1.0  1.0    2.0  1.0  1.0   1  4  1
    6      4  5  2  NaN  5.0  2.0   2.0  5.0  2.0    2.0  5.0  2.0   2  0  0
    7      5  5  3  NaN  5.0  NaN   2.0  5.0  2.0    2.0  5.0  2.0   3  0  1
    8      5  6  4  5.0  NaN  NaN   5.0  5.0  2.0    5.0  5.0  2.0   0  1  2
    9      6  6  4  NaN  6.0  4.0   5.0  6.0  4.0    5.0  6.0  4.0   1  0  0

EDIT:

A general solution that also works with all-False columns: add numpy.where with a boolean mask created by DataFrame.any:

    print (df)
           A      B      C
    0  False   True  False
    1  False  False  False
    2   True   True  False
    3   True   True  False
    4  False   True  False
    5   True   True  False
    6   True  False  False
    7   True  False  False
    8  False   True  False
    9   True  False  False

    b = df.cumsum()
    c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)

    mask = df.any()
    length = np.where(mask, c.max(), -1).tolist()
    print (length)
    [3, 4, -1]

    index =  np.where(mask, c.idxmax().sub(c.max()).add(1), 0).tolist()
    print (index)
    [5, 2, 0]
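The general version can be wrapped in a small function, sketched below. The name `longest_true_runs` and the parameters `fill_length` and `fill_index` are hypothetical, introduced only for illustration; the body follows the steps above and, like them, assumes the default integer index.

```python
import numpy as np
import pandas as pd


def longest_true_runs(df, fill_length=-1, fill_index=0):
    """Hypothetical helper: longest run of Trues per column.

    Returns a DataFrame with the run length and the row label at which
    the run starts. Columns without any True get the fill values.
    """
    b = df.cumsum()
    c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)
    has_true = df.any()
    length = np.where(has_true, c.max(), fill_length)
    start = np.where(has_true, c.idxmax().sub(c.max()).add(1), fill_index)
    return pd.DataFrame({"length": length, "start": start}, index=df.columns)


# Same data as in the example above: column C contains no True at all.
df = pd.DataFrame({
    "A": [False, False, True, True, False, True, True, True, False, True],
    "B": [True, False, True, True, True, True, False, False, True, False],
    "C": [False] * 10,
})

result = longest_true_runs(df)
print(result)
```

Returning a DataFrame indexed by column name is just one possible design; two plain lists, as in the original snippet, would work equally well.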


This approach relies on two ideas: catching the shifts (the positions where consecutive values of the boolean array differ), and offsetting each column's results so that the whole computation can be vectorized.
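To make the first idea concrete, here is a minimal 1-D sketch of catching shifts. The helper name `island_starts_and_lengths` is hypothetical and only illustrates the principle on a single column.

```python
import numpy as np


def island_starts_and_lengths(x):
    """Hypothetical 1-D helper: start and length of every run of Trues in x."""
    x = np.asarray(x, dtype=bool)
    # Pad with False on both sides so every run has a rising and a falling edge.
    padded = np.concatenate(([False], x, [False]))
    # Positions where consecutive values differ; edges alternate start, end.
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts = edges[::2]
    lengths = edges[1::2] - starts
    return starts, lengths


starts, lengths = island_starts_and_lengths(
    [False, False, True, True, False, True, True, True, False, True]
)
print(starts)   # [2 5 9]
print(lengths)  # [2 3 1]
```

The second idea extends this to 2-D input: the edge positions are taken from the flattened, transposed comparison, so each column's edges land in their own block of flat indices, and a modulo by the padded column height recovers the row index.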

With that intention set, here's one way to achieve the desired results:

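    import numpy as np

    def maxisland_start_len_mask(a, fillna_index = -1, fillna_len = 0):
        # a is a 2D boolean array; islands are runs of True along each column

        # Pad a row of False at the top and bottom so that every island
        # has both a start and an end transition
        pad = np.zeros(a.shape[1],dtype=bool)
        mask = np.vstack((pad, a, pad))

        # Transitions between consecutive rows; flattening the transpose
        # puts each column's transitions in its own block of a.shape[0]+1
        mask_step = mask[1:] != mask[:-1]
        idx = np.flatnonzero(mask_step.T)
        island_starts = idx[::2]
        island_lens = idx[1::2] - idx[::2]
        n_islands_percol = mask_step.sum(0)//2

        # Column ID of each island, scaled so that the sort keys of
        # different columns cannot overlap
        bins = np.repeat(np.arange(a.shape[1]),n_islands_percol)
        scale = island_lens.max()+1

        # Sort by column, then by length; the last entry of each column's
        # group is then a longest island of that column
        scaled_idx = np.argsort(scale*bins + island_lens)
        grp_shift_idx = np.r_[0,n_islands_percol.cumsum()]
        max_island_starts = island_starts[scaled_idx[grp_shift_idx[1:]-1]]

        # Convert the flat (offset) positions back to row indices
        max_island_percol_start = max_island_starts%(a.shape[0]+1)

        # Columns without any True receive the fill values
        valid = n_islands_percol!=0
        cut_idx = grp_shift_idx[:-1][valid]
        max_island_percol_len = np.maximum.reduceat(island_lens, cut_idx)

        out_len = np.full(a.shape[1], fillna_len, dtype=int)
        out_len[valid] = max_island_percol_len
        out_index = np.where(valid,max_island_percol_start,fillna_index)
        return out_index, out_len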

Sample run:

    # Generic case to handle all 0s columns
    In [112]: a
    Out[112]:
    array([[False, False, False],
           [False, False, False],
           [ True, False, False],
           [ True, False,  True],
           [False, False, False],
           [ True, False,  True],
           [ True, False, False],
           [ True, False,  True],
           [False, False,  True],
           [ True, False, False]])

    In [117]: starts,lens = maxisland_start_len_mask(a, fillna_index=-1, fillna_len=0)

    In [118]: starts
    Out[118]: array([ 5, -1,  7])

    In [119]: lens
    Out[119]: array([3, 0, 2])