Efficient date range overlap calculation in python?


  • Determine the latest of the two start dates and the earliest of the two end dates.
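  • Subtract the latest start from the earliest end to get a timedelta (add 1 day if both end dates are inclusive, as in the example below).
  • If the result is positive, that is the number of days of overlap; otherwise the ranges do not overlap.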

Here is an example calculation:

>>> from datetime import datetime
>>> from collections import namedtuple
>>> Range = namedtuple('Range', ['start', 'end'])
>>> r1 = Range(start=datetime(2012, 1, 15), end=datetime(2012, 5, 10))
>>> r2 = Range(start=datetime(2012, 3, 20), end=datetime(2012, 9, 15))
>>> latest_start = max(r1.start, r2.start)
>>> earliest_end = min(r1.end, r2.end)
>>> delta = (earliest_end - latest_start).days + 1
>>> overlap = max(0, delta)
>>> overlap
52
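The same steps can be packaged as a reusable function. This is a minimal sketch: the name overlap_days is hypothetical, and it assumes both endpoints are inclusive whole days, matching the +1 in the session above.

```python
from collections import namedtuple
from datetime import datetime

Range = namedtuple('Range', ['start', 'end'])


def overlap_days(r1, r2):
    """Return the number of days shared by two inclusive date ranges (0 if disjoint)."""
    latest_start = max(r1.start, r2.start)
    earliest_end = min(r1.end, r2.end)
    # +1 because both the first and the last shared day are counted
    delta = (earliest_end - latest_start).days + 1
    # A negative delta means the ranges are disjoint
    return max(0, delta)


r1 = Range(start=datetime(2012, 1, 15), end=datetime(2012, 5, 10))
r2 = Range(start=datetime(2012, 3, 20), end=datetime(2012, 9, 15))
print(overlap_days(r1, r2))  # 52
```

The function is symmetric, so the order of the arguments does not matter.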


In CPython, function calls such as min() and max() generally cost more than simple arithmetic operations.

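A cheaper option uses 2 subtractions and 1 min(), but it is only correct when neither range fully contains the other (for r1 from January 1 to January 10 and r2 from January 3 to January 5 it returns 5 days instead of 3):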

min(r1.end - r2.start, r2.end - r1.start).days + 1

The general expression, which is correct for any pair of ranges, needs 1 subtraction, 1 min() and 1 max():

(min(r1.end, r2.end) - max(r1.start, r2.start)).days + 1
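Whether the shorter expression is actually faster depends on the interpreter and machine, so it is worth measuring rather than assuming. A minimal timeit sketch, reusing the ranges from the first example (no particular timings are claimed here; the helper names are hypothetical):

```python
import timeit
from collections import namedtuple
from datetime import datetime

Range = namedtuple('Range', ['start', 'end'])
# Partially overlapping ranges, where both expressions agree
r1 = Range(start=datetime(2012, 1, 15), end=datetime(2012, 5, 10))
r2 = Range(start=datetime(2012, 3, 20), end=datetime(2012, 9, 15))


def two_subtractions():
    # 2 subtractions + 1 min()
    return min(r1.end - r2.start, r2.end - r1.start).days + 1


def min_max():
    # 1 subtraction + 1 min() + 1 max()
    return (min(r1.end, r2.end) - max(r1.start, r2.start)).days + 1


# Sanity check: same answer for this partial overlap
assert two_subtractions() == min_max()

for name, fn in [('two subtractions', two_subtractions), ('min/max', min_max)]:
    seconds = timeit.timeit(fn, number=100_000)
    print(f'{name}: {seconds:.3f} s per 100k calls')
```

Both candidates are wrapped in a function so the call overhead is the same on each side of the comparison.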

With either expression you still need to check that the result is positive, for example by clamping it with max(0, ...).
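The two-subtraction shortcut assumes the ranges only partially overlap. When one range lies entirely inside the other it overestimates, whereas min(ends) - max(starts) is correct in every case; it also equals the minimum of all four end-minus-start differences. A small check with plain integers standing in for days (the function names are hypothetical):

```python
def overlap_shortcut(s1, e1, s2, e2):
    # 2 subtractions + 1 min(): only valid when neither range contains the other
    return min(e1 - s2, e2 - s1)


def overlap_general(s1, e1, s2, e2):
    # min(ends) - max(starts): valid for any pair of ranges
    return min(e1, e2) - max(s1, s2)


def overlap_four_differences(s1, e1, s2, e2):
    # Same result as overlap_general, using only subtractions and one min()
    return min(e1 - s1, e1 - s2, e2 - s1, e2 - s2)


# Partial overlap: shortcut and general form agree
print(overlap_shortcut(0, 10, 5, 20), overlap_general(0, 10, 5, 20))  # 5 5

# Range 2 nested inside range 1: the shortcut overestimates
print(overlap_shortcut(0, 10, 3, 5), overlap_general(0, 10, 3, 5))  # 5 2

# Disjoint ranges give a negative value, so clamp with max(0, ...)
print(max(0, overlap_general(0, 10, 15, 20)))  # 0
```

For datetime ranges, apply .days + 1 to the result exactly as in the expressions above.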


I implemented a TimeRange class, shown below.

get_overlapped_range first rules out non-overlapping ranges with a simple condition, and then calculates the overlapping range by considering each possible arrangement of the two ranges.

To get the number of days, take the TimeRange returned by get_overlapped_range and divide its duration (in seconds, since start and end are Unix timestamps) by 60*60*24.

import time


class TimeRange(object):
    """A time range whose start and end are Unix timestamps (seconds)."""

    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.duration = self.end - self.start  # length in seconds

    def is_overlapped(self, time_range):
        # The ranges overlap if the later start comes strictly before the earlier end
        return max(self.start, time_range.start) < min(self.end, time_range.end)

    def get_overlapped_range(self, time_range):
        # No overlap: return None
        if not self.is_overlapped(time_range):
            return None

        if time_range.start >= self.start:
            if self.end >= time_range.end:
                # time_range lies entirely inside self
                return TimeRange(time_range.start, time_range.end)
            else:
                # time_range starts inside self and ends after it
                return TimeRange(time_range.start, self.end)
        elif time_range.start < self.start:
            if time_range.end >= self.end:
                # self lies entirely inside time_range
                return TimeRange(self.start, self.end)
            else:
                # time_range starts before self and ends inside it
                return TimeRange(self.start, time_range.end)

    def __repr__(self):
        return '{0} ------> {1}'.format(*[time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(d))
                                          for d in [self.start, self.end]])
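For comparison, the same logic can be written with the max()/min() approach from the first answer, which removes the four-way branching. This is a sketch rather than the class above: the name Interval and its methods are hypothetical, and it assumes datetime objects instead of Unix timestamps, so the day count comes from timedelta.total_seconds() rather than from dividing a raw duration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Interval:
    start: datetime
    end: datetime

    @property
    def duration(self) -> timedelta:
        return self.end - self.start

    def overlap(self, other: "Interval") -> Optional["Interval"]:
        """Return the shared interval, or None if the intervals do not overlap."""
        start = max(self.start, other.start)
        end = min(self.end, other.end)
        # Strict comparison, like TimeRange.is_overlapped: touching intervals don't count
        if start < end:
            return Interval(start, end)
        return None


a = Interval(datetime(2012, 1, 15), datetime(2012, 5, 10))
b = Interval(datetime(2012, 3, 20), datetime(2012, 9, 15))
shared = a.overlap(b)
print(shared)
print(shared.duration.total_seconds() / (60 * 60 * 24))  # 51.0
```

This measures the elapsed time between the endpoints (51 days here), unlike the inclusive count of 52 produced by the +1 in the first answer.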