Pandas sum time interval in a group excluding overlaps
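The answers below work on a frame with one row per interval and columns group, start and end (plus an id in the first example). A minimal reconstruction of that sample data, inferred from the printouts in the answers rather than quoted from the question, could look like this:

# Reconstructed sample data (inferred from the answers' printouts, not the original question)
import pandas as pd

df = pd.DataFrame({
    'id':    range(10),
    'group': [0, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    'start': pd.to_datetime([
        '2019-10-21 16:20', '2019-10-21 16:22', '2019-10-21 16:22',
        '2019-10-21 16:15', '2019-10-21 16:22', '2019-10-21 16:58',
        '2019-10-21 17:02', '2019-10-21 17:03', '2019-10-21 17:04',
        '2019-10-21 17:20']),
    'end': pd.to_datetime([
        '2019-10-21 16:25', '2019-10-21 16:24', '2019-10-21 16:24',
        '2019-10-21 16:18', '2019-10-21 16:26', '2019-10-21 17:02',
        '2019-10-21 17:06', '2019-10-21 17:07', '2019-10-21 17:08',
        '2019-10-21 17:22']),
})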
First, add a column which tracks the latest end time seen so far (but only considering the same group):
df['notbefore'] = df.groupby('group').end.transform(lambda s: s.shift().cummax())
The end times are shifted by 1 so that each row only sees the latest end time of the previous rows in its group, not its own. Both shift() and cummax() run inside the groupby (via transform), so values cannot "leak" between groups.
Then add a column containing the "effective" start time:
df['effstart'] = df[['start', 'notbefore']].max(axis=1)
This is the start time modified so that it is not before any previous end time (to avoid overlap).
Then compute the total seconds covered:
df['effsec'] = (df.end - df.effstart).clip(np.timedelta64(0))
df is now:
   id  group               start                 end           notbefore            effstart   effsec
0   0      0 2019-10-21 16:20:00 2019-10-21 16:25:00                 NaT 2019-10-21 16:20:00 00:05:00
1   1      1 2019-10-21 16:22:00 2019-10-21 16:24:00                 NaT 2019-10-21 16:22:00 00:02:00
2   2      1 2019-10-21 16:22:00 2019-10-21 16:24:00 2019-10-21 16:24:00 2019-10-21 16:24:00 00:00:00
3   3      2 2019-10-21 16:15:00 2019-10-21 16:18:00                 NaT 2019-10-21 16:15:00 00:03:00
4   4      2 2019-10-21 16:22:00 2019-10-21 16:26:00 2019-10-21 16:18:00 2019-10-21 16:22:00 00:04:00
5   5      3 2019-10-21 16:58:00 2019-10-21 17:02:00                 NaT 2019-10-21 16:58:00 00:04:00
6   6      4 2019-10-21 17:02:00 2019-10-21 17:06:00                 NaT 2019-10-21 17:02:00 00:04:00
7   7      4 2019-10-21 17:03:00 2019-10-21 17:07:00 2019-10-21 17:06:00 2019-10-21 17:06:00 00:01:00
8   8      4 2019-10-21 17:04:00 2019-10-21 17:08:00 2019-10-21 17:07:00 2019-10-21 17:07:00 00:01:00
9   9      4 2019-10-21 17:20:00 2019-10-21 17:22:00 2019-10-21 17:08:00 2019-10-21 17:20:00 00:02:00
To get the final results:
df.groupby('group').effsec.sum()
Which gives you:
group
0   00:05:00
1   00:02:00
2   00:07:00
3   00:04:00
4   00:08:00
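If you also want these totals attached to every row of df (similar to what the last answer in this thread does with a merge), one possible sketch is to map the per-group sums back by group key; the column name total below is illustrative, not part of the original answer:

# Sketch: broadcast the per-group totals back onto df ('total' is a hypothetical column name)
totals = df.groupby('group').effsec.sum()
df['total'] = df['group'].map(totals)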
Use:
def merge_intervals(intervals):
    sorted_by_lower_bound = sorted(intervals, key=lambda tup: tup[0])
    merged = []
    for higher in sorted_by_lower_bound:
        if not merged:
            merged.append(higher)
        else:
            lower = merged[-1]
            # test for intersection between lower and higher:
            # we know via sorting that lower[0] <= higher[0]
            if higher[0] <= lower[1]:
                upper_bound = max(lower[1], higher[1])
                merged[-1] = (lower[0], upper_bound)  # replace by merged interval
            else:
                merged.append(higher)
    return merged

df['dt'] = df[['start', 'end']].apply(tuple, axis=1)
op = df.groupby(['group'])['dt'].apply(list)
f_op = op.apply(merge_intervals)
op_d = f_op.apply(lambda x: sum([(y[1] - y[0]).seconds for y in x]))
Output
group
0    300
1    120
2    420
3    240
4    480
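One caveat on the last line of the code above: Timedelta.seconds returns only the seconds component of an interval, so for intervals spanning a day or more it undercounts. A variant using total_seconds() (a sketch, not part of the original answer) would be safer:

# Sketch: count full interval lengths even when they exceed one day
op_d = f_op.apply(lambda x: sum(int((y[1] - y[0]).total_seconds()) for y in x))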
As the source data I took the following DataFrame:
  group            start              end
0    G1 2019-09-01 12:00 2019-09-01 12:02
1    G1 2019-09-01 12:01 2019-09-01 12:04
2    G1 2019-09-01 12:07 2019-09-01 12:10
3    G2 2019-09-01 12:05 2019-09-01 12:12
4    G2 2019-09-01 12:10 2019-09-01 12:15
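For completeness, this source frame can be built along these lines (a sketch; the original answer does not show this construction step):

# Sketch: building the source frame shown above
import pandas as pd

df = pd.DataFrame({
    'group': ['G1', 'G1', 'G1', 'G2', 'G2'],
    'start': pd.to_datetime(['2019-09-01 12:00', '2019-09-01 12:01',
                             '2019-09-01 12:07', '2019-09-01 12:05',
                             '2019-09-01 12:10']),
    'end':   pd.to_datetime(['2019-09-01 12:02', '2019-09-01 12:04',
                             '2019-09-01 12:10', '2019-09-01 12:12',
                             '2019-09-01 12:15']),
})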
The first step is to define a function counting seconds within a group of rows:
def getSecs(grp):
    return pd.DatetimeIndex([]).union_many([
        pd.date_range(row.start, row.end, freq='s', closed='left')
        for _, row in grp.iterrows()
    ]).size
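Note that newer pandas releases no longer provide DatetimeIndex.union_many, and date_range takes inclusive= instead of closed=. Under those assumptions, a sketch of an equivalent getSecs using functools.reduce over Index.union:

# Sketch: same idea without union_many (assumes a recent pandas where it was removed)
from functools import reduce
import pandas as pd

def getSecs(grp):
    # one per-second grid per row, left-closed like the original version
    ranges = [pd.date_range(row.start, row.end, freq='s', inclusive='left')
              for _, row in grp.iterrows()]
    # the union collapses overlapping seconds, so each second is counted once
    return reduce(lambda a, b: a.union(b), ranges, pd.DatetimeIndex([])).size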
Then apply this function to each group, grouping by group:
secs = df.groupby('group').apply(getSecs).rename('secs')
For my test data, the result is:
group
G1    420
G2    600
Name: secs, dtype: int64
And the last step is to create a new column in df by merging with secs:
df = df.merge(secs, left_on='group', right_index=True)
The result is:
  group            start              end  secs
0    G1 2019-09-01 12:00 2019-09-01 12:02   420
1    G1 2019-09-01 12:01 2019-09-01 12:04   420
2    G1 2019-09-01 12:07 2019-09-01 12:10   420
3    G2 2019-09-01 12:05 2019-09-01 12:12   600
4    G2 2019-09-01 12:10 2019-09-01 12:15   600
A quite concise solution: just 6 lines of code, substantially less than some of the other solutions.
Note also that only my solution creates a new column, with equal values for all rows within each group (one of the other solutions failed on this detail). All the other solutions stopped at computing the number of seconds for each group.