Aggregate Overlapping Segments to Measure Effective Length Aggregate Overlapping Segments to Measure Effective Length oracle oracle

Aggregate Overlapping Segments to Measure Effective Length


My main DBMS is Teradata, but this will work as-is in Oracle, too.

WITH all_meas AS ( -- get a distinct list of all from/to points   SELECT road_id, from_meas AS meas   FROM road_events   UNION   SELECT road_id, to_meas   FROM road_events )-- select * from all_meas order by 1,2 , all_ranges AS ( -- create from/to ranges   SELECT road_id, meas AS from_meas      ,Lead(meas)      Over (PARTITION BY road_id            ORDER BY meas) AS to_meas   FROM all_meas  ) -- SELECT * from all_ranges order by 1,2, all_event_ranges AS ( -- now match the ranges to the event ranges   SELECT       ar.*     ,re.event_id     ,re.year     ,re.total_road_length     ,ar.to_meas - ar.from_meas AS event_length     -- used to filter the latest event as multiple events might cover the same range      ,Row_Number()      Over (PARTITION BY ar.road_id, ar.from_meas            ORDER BY year DESC) AS rn   FROM all_ranges ar   JOIN road_events re     ON ar.road_id = re.road_id    AND ar.from_meas < re.to_meas    AND ar.to_meas > re.from_meas   WHERE ar.to_meas IS NOT NULL )SELECT event_id, road_id, year, total_road_length, Sum(event_length)FROM all_event_rangesWHERE rn = 1 -- latest year onlyGROUP BY event_id, road_id, year, total_road_lengthORDER BY road_id, year DESC;

If you need to return the actual covered from/to_meas (as in your question before edit), it might be more complicated. The first part is the same, but without aggregation the query can return adjacent rows with the same event_id (e.g. for event 3: 0-1 & 1-25):

SELECT * FROM all_event_rangesWHERE rn = 1ORDER BY road_id, from_meas;

If you want to merge adjacent rows you need two more steps (using a standard approach, flag the 1st row of a group and calculate a group number):

WITH all_meas AS (   SELECT road_id, from_meas AS meas   FROM road_events   UNION   SELECT road_id, to_meas   FROM road_events )-- select * from all_meas order by 1,2 , all_ranges AS (    SELECT road_id, meas AS from_meas      ,Lead(meas)      Over (PARTITION BY road_id            ORDER BY meas) AS to_meas   FROM all_meas  )-- SELECT * from all_ranges order by 1,2, all_event_ranges AS (   SELECT       ar.*     ,re.event_id     ,re.year     ,re.total_road_length     ,ar.to_meas - ar.from_meas AS event_length     ,Row_Number()      Over (PARTITION BY ar.road_id, ar.from_meas            ORDER BY year DESC) AS rn   FROM all_ranges ar   JOIN road_events  re     ON ar.road_id = re.road_id    AND ar.from_meas < re.to_meas    AND ar.to_meas > re.from_meas   WHERE ar.to_meas IS NOT NULL )-- SELECT * FROM all_event_ranges WHERE rn = 1 ORDER BY road_id, from_meas, adjacent_events AS  ( -- assign 1 to the 1st row of an event   SELECT t.*     ,CASE WHEN Lag(event_id)                Over(PARTITION BY road_id                     ORDER BY from_meas) = event_id           THEN 0            ELSE 1       END AS flag   FROM all_event_ranges t   WHERE rn = 1 )-- SELECT * FROM adjacent_events ORDER BY road_id, from_meas , grouped_events AS ( -- assign a groupnumber to adjacent rows using a Cumulative Sum over 0/1   SELECT t.*     ,Sum(flag)      Over (PARTITION BY road_id            ORDER BY from_meas            ROWS Unbounded Preceding) AS grp   FROM adjacent_events t)-- SELECT * FROM grouped_events ORDER BY  road_id, from_measSELECT event_id, road_id, year, Min(from_meas), Max(to_meas), total_road_length, Sum(event_length)FROM grouped_eventsGROUP BY event_id, road_id, grp, year, total_road_lengthORDER BY 2, Min(from_meas);

Edit:

Ups, I just found a blog Overlapping ranges with priority doing exactly the same with some simplified Oracle syntax. In fact I translated my query from a some other simplified syntax in Teradata to Standard/Oracle SQL :-)


There is another way to calculate this, with from and to values:

with   part_begin_point as (    Select distinct road_id, from_meas point    from road_events be    union     Select distinct road_id, to_meas point    from road_events ee  ), newest_part as (  select e.event_id  , e.road_id  , e.year  , e.total_road_length  , p.point  , LAG(e.event_id) over (partition by p.road_id order by p.point) prev_event  , e.to_meas event_to_meas  from part_begin_point p  join road_events e   on p.road_id = e.road_id   and p.point >= e.from_meas and  p.point < e.to_meas   and not exists(        select 1 from road_events ne         where e.road_id = ne.road_id        and p.point >= ne.from_meas and p.point < ne.to_meas        and (e.year < ne.year or e.year = ne.year and e.event_id < ne.event_id))  )select event_id, road_id, year, point from_meas, LEAD(point, 1, event_to_meas) over (partition by road_id order by point) to_meas, total_road_length, LEAD(point, 1, event_to_meas) over (partition by road_id order by point) - point EVENT_LENGTHfrom newest_partwhere 1=1and event_id <> prev_event or prev_event is nullorder by event_id, point

SQL Fiddle


Thought about this too much today, but I have something that ignores the +/- 10 meters now.

First made a function that takes in to / from pairs as a string and returns the distance covered by the pairs in the string. For example '10:20;35:45' returns 20.

CREATE    OR replace FUNCTION get_distance_range_str (strRangeStr VARCHAR2)RETURN NUMBER IS intRetNum NUMBER;BEGIN    --split input string    WITH cte_1    AS (        SELECT regexp_substr(strRangeStr, '[^;]+', 1, LEVEL) AS TO_FROM_STRING        FROM dual connect BY regexp_substr(strRangeStr, '[^;]+', 1, LEVEL) IS NOT NULL        )        --split From/To pairs        ,cte_2    AS (        SELECT cte_1.TO_FROM_STRING            ,to_number(substr(cte_1.TO_FROM_STRING, 1, instr(cte_1.TO_FROM_STRING, ':') - 1)) AS FROM_MEAS            ,to_number(substr(cte_1.TO_FROM_STRING, instr(cte_1.TO_FROM_STRING, ':') + 1, length(cte_1.TO_FROM_STRING) - instr(cte_1.TO_FROM_STRING, ':'))) AS TO_MEAS        FROM cte_1        )        --merge ranges        ,cte_merge_ranges    AS (        SELECT s1.FROM_MEAS            ,            --t1.TO_MEAS             MIN(t1.TO_MEAS) AS TO_MEAS        FROM cte_2 s1        INNER JOIN cte_2 t1 ON s1.FROM_MEAS <= t1.TO_MEAS            AND NOT EXISTS (                SELECT *                FROM cte_2 t2                WHERE t1.TO_MEAS >= t2.FROM_MEAS                    AND t1.TO_MEAS < t2.TO_MEAS                )        WHERE NOT EXISTS (                SELECT *                FROM cte_2 s2                WHERE s1.FROM_MEAS > s2.FROM_MEAS                    AND s1.FROM_MEAS <= s2.TO_MEAS                )        GROUP BY s1.FROM_MEAS        )    SELECT sum(TO_MEAS - FROM_MEAS) AS DISTANCE_COVERED    INTO intRetNum    FROM cte_merge_ranges;    RETURN intRetNum;END;

Then wrote this query that builds a string for that function for the appropriate prior range. Couldn't use windowing with list_agg, but was able to achieve same with a correlated subquery.

--use list agg to create list of to/from pairs for rows before current row in the orderingWITH cte_2AS (    SELECT T1.*        ,(            SELECT LISTAGG(FROM_MEAS || ':' || TO_MEAS || ';') WITHIN            GROUP (                    ORDER BY ORDER BY YEAR DESC, EVENT_ID DESC                    )            FROM road_events T2            WHERE T1.YEAR || lpad(T1.EVENT_ID, 10,'0') <                 T2.YEAR || lpad(T2.EVENT_ID, 10,'0')                AND T1.ROAD_ID = T2.ROAD_ID            GROUP BY road_id            ) AS PRIOR_RANGES_STR    FROM road_events T1    )    --get distance for prior range string - distance ignoring current row    --get distance including current row    ,cte_3AS (    SELECT cte_2.*        ,coalesce(get_distance_range_str(PRIOR_RANGES_STR), 0) AS DIST_PRIOR        ,get_distance_range_str(PRIOR_RANGES_STR || FROM_MEAS || ':' || TO_MEAS || ';') AS DIST_NOW    FROM cte_2 cte_2    )    --distance including current row less distance ignoring current row is distance added to the range this row    ,cte_4AS (    SELECT cte_3.*        ,DIST_NOW - DIST_PRIOR AS DIST_ADDED_THIS_ROW    FROM cte_3    )SELECT *FROM cte_4--filter out any rows with distance added as 0WHERE DIST_ADDED_THIS_ROW > 0ORDER BY ROAD_ID, YEAR DESC, EVENT_ID DESC

sqlfiddle here: http://sqlfiddle.com/#!4/81331/36

Looks to me like the results match yours. I left the additional columns in the final query to try to illustrate each step.

Works on the test case - might need some work to handle all possibilities in a larger data set, but I think this would be a good place to start and refine.

Credit for Overlapping range merge is first answer here: Merge overlapping date intervals

Credit for list_agg with windowing is first answer here:LISTAGG equivalent with windowing clause