Splitting string into multiple rows in Oracle Splitting string into multiple rows in Oracle sql sql

Splitting string into multiple rows in Oracle


This may be an improved way (also with regexp and connect by):

with temp as(    select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error  from dual    union all    select 109, 'test2', 'Err1' from dual)select distinct  t.name, t.project,  trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value))  as errorfrom   temp t,  table(cast(multiset(select level from dual connect by  level <= length (regexp_replace(t.error, '[^,]+'))  + 1) as sys.OdciNumberList)) levelsorder by name

EDIT:Here is a simple (as in, "not in depth") explanation of the query.

  1. length (regexp_replace(t.error, '[^,]+')) + 1 uses regexp_replace to erase anything that is not the delimiter (comma in this case) and length +1 to get how many elements (errors) are there.
  2. The select level from dual connect by level <= (...) uses a hierarchical query to create a column with an increasing number of matches found, from 1 to the total number of errors.

    Preview:

    select level, length (regexp_replace('Err1, Err2, Err3', '[^,]+'))  + 1 as max from dual connect by level <= length (regexp_replace('Err1, Err2, Err3', '[^,]+'))  + 1
  3. table(cast(multiset(.....) as sys.OdciNumberList)) does some casting of oracle types.
    • The cast(multiset(.....)) as sys.OdciNumberList transforms multiple collections (one collection for each row in the original data set) into a single collection of numbers, OdciNumberList.
    • The table() function transforms a collection into a resultset.
  4. FROM without a join creates a cross join between your dataset and the multiset.As a result, a row in the data set with 4 matches will repeat 4 times (with an increasing number in the column named "column_value").

    Preview:

    select * from temp t,table(cast(multiset(select level from dual connect by  level <= length (regexp_replace(t.error, '[^,]+'))  + 1) as sys.OdciNumberList)) levels
  5. trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value)) uses the column_value as the nth_appearance/ocurrence parameter for regexp_substr.
  6. You can add some other columns from your data set (t.name, t.project as an example) for easy visualization.

Some references to Oracle docs:


regular expressions is a wonderful thing :)

with temp as  (       select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error  from dual       union all       select 109, 'test2', 'Err1' from dual     )SELECT distinct Name, Project, trim(regexp_substr(str, '[^,]+', 1, level)) str  FROM (SELECT Name, Project, Error str FROM temp) tCONNECT BY instr(str, ',', 1, level - 1) > 0order by Name


There is a huge difference between the below two:

  • splitting a single delimited string
  • splitting delimited strings for multiple rows in a table.

If you do not restrict the rows, then the CONNECT BY clause would produce multiple rows and will not give the desired output.

Apart from Regular Expressions, a few other alternatives are using:

  • XMLTable
  • MODEL clause

Setup

SQL> CREATE TABLE t (  2    ID          NUMBER GENERATED ALWAYS AS IDENTITY,  3    text        VARCHAR2(100)  4  );Table created.SQL>SQL> INSERT INTO t (text) VALUES ('word1, word2, word3');1 row created.SQL> INSERT INTO t (text) VALUES ('word4, word5, word6');1 row created.SQL> INSERT INTO t (text) VALUES ('word7, word8, word9');1 row created.SQL> COMMIT;Commit complete.SQL>SQL> SELECT * FROM t;        ID TEXT---------- ----------------------------------------------         1 word1, word2, word3         2 word4, word5, word6         3 word7, word8, word9SQL>

Using XMLTABLE:

SQL> SELECT id,  2         trim(COLUMN_VALUE) text  3  FROM t,  4    xmltable(('"'  5    || REPLACE(text, ',', '","')  6    || '"'))  7  /        ID TEXT---------- ------------------------         1 word1         1 word2         1 word3         2 word4         2 word5         2 word6         3 word7         3 word8         3 word99 rows selected.SQL>

Using MODEL clause:

SQL> WITH  2  model_param AS  3     (  4            SELECT id,  5                      text AS orig_str ,  6                   ','  7                          || text  8                          || ','                                 AS mod_str ,  9                   1                                             AS start_pos , 10                   Length(text)                                   AS end_pos , 11                   (Length(text) - Length(Replace(text, ','))) + 1 AS element_count , 12                   0                                             AS element_no , 13                   ROWNUM                                        AS rn 14            FROM   t ) 15     SELECT   id, 16              trim(Substr(mod_str, start_pos, end_pos-start_pos)) text 17     FROM     ( 18                     SELECT * 19                     FROM   model_param MODEL PARTITION BY (id, rn, orig_str, mod_str) 20                     DIMENSION BY (element_no) 21                     MEASURES (start_pos, end_pos, element_count) 22                     RULES ITERATE (2000) 23                     UNTIL (ITERATION_NUMBER+1 = element_count[0]) 24                     ( start_pos[ITERATION_NUMBER+1] = instr(cv(mod_str), ',', 1, cv(element_no)) + 1, 25                     end_pos[iteration_number+1] = instr(cv(mod_str), ',', 1, cv(element_no) + 1) ) 26                 ) 27     WHERE    element_no != 0 28     ORDER BY mod_str , 29           element_no 30  /        ID TEXT---------- --------------------------------------------------         1 word1         1 word2         1 word3         2 word4         2 word5         2 word6         3 word7         3 word8         3 word99 rows selected.SQL>