Splitting string into multiple rows in Oracle
This may be an improved way (also with regexp and connect by):
with temp as( select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error from dual union all select 109, 'test2', 'Err1' from dual)select distinct t.name, t.project, trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value)) as errorfrom temp t, table(cast(multiset(select level from dual connect by level <= length (regexp_replace(t.error, '[^,]+')) + 1) as sys.OdciNumberList)) levelsorder by name
EDIT:Here is a simple (as in, "not in depth") explanation of the query.
length (regexp_replace(t.error, '[^,]+')) + 1
usesregexp_replace
to erase anything that is not the delimiter (comma in this case) andlength +1
to get how many elements (errors) are there.The
select level from dual connect by level <= (...)
uses a hierarchical query to create a column with an increasing number of matches found, from 1 to the total number of errors.Preview:
select level, length (regexp_replace('Err1, Err2, Err3', '[^,]+')) + 1 as max from dual connect by level <= length (regexp_replace('Err1, Err2, Err3', '[^,]+')) + 1
table(cast(multiset(.....) as sys.OdciNumberList))
does some casting of oracle types.- The
cast(multiset(.....)) as sys.OdciNumberList
transforms multiple collections (one collection for each row in the original data set) into a single collection of numbers, OdciNumberList. - The
table()
function transforms a collection into a resultset.
- The
FROM
without a join creates a cross join between your dataset and the multiset.As a result, a row in the data set with 4 matches will repeat 4 times (with an increasing number in the column named "column_value").Preview:
select * from temp t,table(cast(multiset(select level from dual connect by level <= length (regexp_replace(t.error, '[^,]+')) + 1) as sys.OdciNumberList)) levels
trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value))
uses thecolumn_value
as the nth_appearance/ocurrence parameter forregexp_substr
.- You can add some other columns from your data set (
t.name, t.project
as an example) for easy visualization.
Some references to Oracle docs:
regular expressions is a wonderful thing :)
with temp as ( select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error from dual union all select 109, 'test2', 'Err1' from dual )SELECT distinct Name, Project, trim(regexp_substr(str, '[^,]+', 1, level)) str FROM (SELECT Name, Project, Error str FROM temp) tCONNECT BY instr(str, ',', 1, level - 1) > 0order by Name
There is a huge difference between the below two:
- splitting a single delimited string
- splitting delimited strings for multiple rows in a table.
If you do not restrict the rows, then the CONNECT BY clause would produce multiple rows and will not give the desired output.
- For single delimited string, look at Split single comma delimited string into rows
- For splitting delimited strings in a table, look at Split comma delimited strings in a table
Apart from Regular Expressions, a few other alternatives are using:
- XMLTable
- MODEL clause
Setup
SQL> CREATE TABLE t ( 2 ID NUMBER GENERATED ALWAYS AS IDENTITY, 3 text VARCHAR2(100) 4 );Table created.SQL>SQL> INSERT INTO t (text) VALUES ('word1, word2, word3');1 row created.SQL> INSERT INTO t (text) VALUES ('word4, word5, word6');1 row created.SQL> INSERT INTO t (text) VALUES ('word7, word8, word9');1 row created.SQL> COMMIT;Commit complete.SQL>SQL> SELECT * FROM t; ID TEXT---------- ---------------------------------------------- 1 word1, word2, word3 2 word4, word5, word6 3 word7, word8, word9SQL>
Using XMLTABLE:
SQL> SELECT id, 2 trim(COLUMN_VALUE) text 3 FROM t, 4 xmltable(('"' 5 || REPLACE(text, ',', '","') 6 || '"')) 7 / ID TEXT---------- ------------------------ 1 word1 1 word2 1 word3 2 word4 2 word5 2 word6 3 word7 3 word8 3 word99 rows selected.SQL>
Using MODEL clause:
SQL> WITH 2 model_param AS 3 ( 4 SELECT id, 5 text AS orig_str , 6 ',' 7 || text 8 || ',' AS mod_str , 9 1 AS start_pos , 10 Length(text) AS end_pos , 11 (Length(text) - Length(Replace(text, ','))) + 1 AS element_count , 12 0 AS element_no , 13 ROWNUM AS rn 14 FROM t ) 15 SELECT id, 16 trim(Substr(mod_str, start_pos, end_pos-start_pos)) text 17 FROM ( 18 SELECT * 19 FROM model_param MODEL PARTITION BY (id, rn, orig_str, mod_str) 20 DIMENSION BY (element_no) 21 MEASURES (start_pos, end_pos, element_count) 22 RULES ITERATE (2000) 23 UNTIL (ITERATION_NUMBER+1 = element_count[0]) 24 ( start_pos[ITERATION_NUMBER+1] = instr(cv(mod_str), ',', 1, cv(element_no)) + 1, 25 end_pos[iteration_number+1] = instr(cv(mod_str), ',', 1, cv(element_no) + 1) ) 26 ) 27 WHERE element_no != 0 28 ORDER BY mod_str , 29 element_no 30 / ID TEXT---------- -------------------------------------------------- 1 word1 1 word2 1 word3 2 word4 2 word5 2 word6 3 word7 3 word8 3 word99 rows selected.SQL>