Aggregate Overlapping Segments to Measure Effective Length
My main DBMS is Teradata, but this will work as-is in Oracle, too.
WITH all_meas AS ( -- get a distinct list of all from/to points SELECT road_id, from_meas AS meas FROM road_events UNION SELECT road_id, to_meas FROM road_events )-- select * from all_meas order by 1,2 , all_ranges AS ( -- create from/to ranges SELECT road_id, meas AS from_meas ,Lead(meas) Over (PARTITION BY road_id ORDER BY meas) AS to_meas FROM all_meas ) -- SELECT * from all_ranges order by 1,2, all_event_ranges AS ( -- now match the ranges to the event ranges SELECT ar.* ,re.event_id ,re.year ,re.total_road_length ,ar.to_meas - ar.from_meas AS event_length -- used to filter the latest event as multiple events might cover the same range ,Row_Number() Over (PARTITION BY ar.road_id, ar.from_meas ORDER BY year DESC) AS rn FROM all_ranges ar JOIN road_events re ON ar.road_id = re.road_id AND ar.from_meas < re.to_meas AND ar.to_meas > re.from_meas WHERE ar.to_meas IS NOT NULL )SELECT event_id, road_id, year, total_road_length, Sum(event_length)FROM all_event_rangesWHERE rn = 1 -- latest year onlyGROUP BY event_id, road_id, year, total_road_lengthORDER BY road_id, year DESC;
If you need to return the actual covered from/to_meas
(as in your question before edit), it might be more complicated. The first part is the same, but without aggregation the query can return adjacent rows with the same event_id (e.g. for event 3: 0-1 & 1-25):
SELECT * FROM all_event_rangesWHERE rn = 1ORDER BY road_id, from_meas;
If you want to merge adjacent rows you need two more steps (using a standard approach, flag the 1st row of a group and calculate a group number):
WITH all_meas AS ( SELECT road_id, from_meas AS meas FROM road_events UNION SELECT road_id, to_meas FROM road_events )-- select * from all_meas order by 1,2 , all_ranges AS ( SELECT road_id, meas AS from_meas ,Lead(meas) Over (PARTITION BY road_id ORDER BY meas) AS to_meas FROM all_meas )-- SELECT * from all_ranges order by 1,2, all_event_ranges AS ( SELECT ar.* ,re.event_id ,re.year ,re.total_road_length ,ar.to_meas - ar.from_meas AS event_length ,Row_Number() Over (PARTITION BY ar.road_id, ar.from_meas ORDER BY year DESC) AS rn FROM all_ranges ar JOIN road_events re ON ar.road_id = re.road_id AND ar.from_meas < re.to_meas AND ar.to_meas > re.from_meas WHERE ar.to_meas IS NOT NULL )-- SELECT * FROM all_event_ranges WHERE rn = 1 ORDER BY road_id, from_meas, adjacent_events AS ( -- assign 1 to the 1st row of an event SELECT t.* ,CASE WHEN Lag(event_id) Over(PARTITION BY road_id ORDER BY from_meas) = event_id THEN 0 ELSE 1 END AS flag FROM all_event_ranges t WHERE rn = 1 )-- SELECT * FROM adjacent_events ORDER BY road_id, from_meas , grouped_events AS ( -- assign a groupnumber to adjacent rows using a Cumulative Sum over 0/1 SELECT t.* ,Sum(flag) Over (PARTITION BY road_id ORDER BY from_meas ROWS Unbounded Preceding) AS grp FROM adjacent_events t)-- SELECT * FROM grouped_events ORDER BY road_id, from_measSELECT event_id, road_id, year, Min(from_meas), Max(to_meas), total_road_length, Sum(event_length)FROM grouped_eventsGROUP BY event_id, road_id, grp, year, total_road_lengthORDER BY 2, Min(from_meas);
Edit:
Ups, I just found a blog Overlapping ranges with priority doing exactly the same with some simplified Oracle syntax. In fact I translated my query from a some other simplified syntax in Teradata to Standard/Oracle SQL :-)
There is another way to calculate this, with from and to values:
with part_begin_point as ( Select distinct road_id, from_meas point from road_events be union Select distinct road_id, to_meas point from road_events ee ), newest_part as ( select e.event_id , e.road_id , e.year , e.total_road_length , p.point , LAG(e.event_id) over (partition by p.road_id order by p.point) prev_event , e.to_meas event_to_meas from part_begin_point p join road_events e on p.road_id = e.road_id and p.point >= e.from_meas and p.point < e.to_meas and not exists( select 1 from road_events ne where e.road_id = ne.road_id and p.point >= ne.from_meas and p.point < ne.to_meas and (e.year < ne.year or e.year = ne.year and e.event_id < ne.event_id)) )select event_id, road_id, year, point from_meas, LEAD(point, 1, event_to_meas) over (partition by road_id order by point) to_meas, total_road_length, LEAD(point, 1, event_to_meas) over (partition by road_id order by point) - point EVENT_LENGTHfrom newest_partwhere 1=1and event_id <> prev_event or prev_event is nullorder by event_id, point
Thought about this too much today, but I have something that ignores the +/- 10 meters now.
First made a function that takes in to / from pairs as a string and returns the distance covered by the pairs in the string. For example '10:20;35:45' returns 20.
CREATE OR replace FUNCTION get_distance_range_str (strRangeStr VARCHAR2)RETURN NUMBER IS intRetNum NUMBER;BEGIN --split input string WITH cte_1 AS ( SELECT regexp_substr(strRangeStr, '[^;]+', 1, LEVEL) AS TO_FROM_STRING FROM dual connect BY regexp_substr(strRangeStr, '[^;]+', 1, LEVEL) IS NOT NULL ) --split From/To pairs ,cte_2 AS ( SELECT cte_1.TO_FROM_STRING ,to_number(substr(cte_1.TO_FROM_STRING, 1, instr(cte_1.TO_FROM_STRING, ':') - 1)) AS FROM_MEAS ,to_number(substr(cte_1.TO_FROM_STRING, instr(cte_1.TO_FROM_STRING, ':') + 1, length(cte_1.TO_FROM_STRING) - instr(cte_1.TO_FROM_STRING, ':'))) AS TO_MEAS FROM cte_1 ) --merge ranges ,cte_merge_ranges AS ( SELECT s1.FROM_MEAS , --t1.TO_MEAS MIN(t1.TO_MEAS) AS TO_MEAS FROM cte_2 s1 INNER JOIN cte_2 t1 ON s1.FROM_MEAS <= t1.TO_MEAS AND NOT EXISTS ( SELECT * FROM cte_2 t2 WHERE t1.TO_MEAS >= t2.FROM_MEAS AND t1.TO_MEAS < t2.TO_MEAS ) WHERE NOT EXISTS ( SELECT * FROM cte_2 s2 WHERE s1.FROM_MEAS > s2.FROM_MEAS AND s1.FROM_MEAS <= s2.TO_MEAS ) GROUP BY s1.FROM_MEAS ) SELECT sum(TO_MEAS - FROM_MEAS) AS DISTANCE_COVERED INTO intRetNum FROM cte_merge_ranges; RETURN intRetNum;END;
Then wrote this query that builds a string for that function for the appropriate prior range. Couldn't use windowing with list_agg, but was able to achieve same with a correlated subquery.
--use list agg to create list of to/from pairs for rows before current row in the orderingWITH cte_2AS ( SELECT T1.* ,( SELECT LISTAGG(FROM_MEAS || ':' || TO_MEAS || ';') WITHIN GROUP ( ORDER BY ORDER BY YEAR DESC, EVENT_ID DESC ) FROM road_events T2 WHERE T1.YEAR || lpad(T1.EVENT_ID, 10,'0') < T2.YEAR || lpad(T2.EVENT_ID, 10,'0') AND T1.ROAD_ID = T2.ROAD_ID GROUP BY road_id ) AS PRIOR_RANGES_STR FROM road_events T1 ) --get distance for prior range string - distance ignoring current row --get distance including current row ,cte_3AS ( SELECT cte_2.* ,coalesce(get_distance_range_str(PRIOR_RANGES_STR), 0) AS DIST_PRIOR ,get_distance_range_str(PRIOR_RANGES_STR || FROM_MEAS || ':' || TO_MEAS || ';') AS DIST_NOW FROM cte_2 cte_2 ) --distance including current row less distance ignoring current row is distance added to the range this row ,cte_4AS ( SELECT cte_3.* ,DIST_NOW - DIST_PRIOR AS DIST_ADDED_THIS_ROW FROM cte_3 )SELECT *FROM cte_4--filter out any rows with distance added as 0WHERE DIST_ADDED_THIS_ROW > 0ORDER BY ROAD_ID, YEAR DESC, EVENT_ID DESC
sqlfiddle here: http://sqlfiddle.com/#!4/81331/36
Looks to me like the results match yours. I left the additional columns in the final query to try to illustrate each step.
Works on the test case - might need some work to handle all possibilities in a larger data set, but I think this would be a good place to start and refine.
Credit for Overlapping range merge is first answer here: Merge overlapping date intervals
Credit for list_agg with windowing is first answer here:LISTAGG equivalent with windowing clause