This tutorial shows how to extract comma separated values (CSV), pipe delimited values,
or tab delimited values from strings like these
(on this page the
· symbol indicates a single TAB character, i.e. ASCII code 9)
DATA_SET STRING ------------------------- -------------------- CSV a,bb,c c c Pipe Delimited a|b,b|c c c Tab Delimited a·b,b·c c c
and return the values in separate columns like this.
DATA_SET VAL_1 VAL_2 VAL_3 ------------------------- ------ ------ ------ CSV a bb c c c Pipe Delimited a b,b c c c Tab Delimited a b,b c c c
The solution in this tutorial uses features introduced in Oracle 10g. For Oracle databases running versions earlier than 10g see the Separated Values tutorial.
The solution for Oracle 10g+ uses the REGEXP_REPLACE function to locate and extract each quoted value from each string. (See SQL Features Tutorials: Regular Expressions if you are not familiar with regular expression support in Oracle.)
column string format a20 break on data_set skip 1 -- "d" stands for Delimiter variable d varchar2(1) variable data_set varchar2(25) execute :d := ',' ; :data_set := 'CSV' select data_set , num , string , rtrim( regexp_substr( string || :d, '.*?' || :d, 1, 1 ), :d ) as val_1 , rtrim( regexp_substr( string || :d, '.*?' || :d, 1, 2 ), :d ) as val_2 , rtrim( regexp_substr( string || :d, '.*?' || :d, 1, 3 ), :d ) as val_3 from t1 where data_set like :data_set order by data_set, num ;
DATA_SET NUM STRING VAL_1 VAL_2 VAL_3
------------------------- ---- -------------------- ------ ------ ------
CSV 1 a a
2 a, a
3 a,, a
4 ,bb, bb
5 ,,c c c c c c
6 a,bb a bb
7 a,bb, a bb
8 a,bb,c c c a bb c c c
9
10 ,
11 ,,
execute :d := chr(9) ; :data_set := 'Tab Delimited' /
DATA_SET NUM STRING VAL_1 VAL_2 VAL_3
------------------------- ---- -------------------- ------ ------ ------
Tab Delimited 1 a a
2 a· a
3 a·· a
4 ·b,b· b,b
5 ··c c c c c c
6 a·b,b a b,b
7 a·b,b· a b,b
8 a·b,b·c c c a b,b c c c
9
10 ·
11 ··
12
With Pipe Delimited values we need to escape the delimiter in the REGEXP_SUBSTR pattern because "|" has a special meaning in regular expressions.
execute :d := '|' ; :data_set := 'Pipe Delimited' select data_set , num , string , rtrim( regexp_substr( string || :d, '.*?\' || :d, 1, 1 ), :d ) as val_1 , rtrim( regexp_substr( string || :d, '.*?\' || :d, 1, 2 ), :d ) as val_2 , rtrim( regexp_substr( string || :d, '.*?\' || :d, 1, 3 ), :d ) as val_3 from t1 where data_set like :data_set order by data_set, num ;
DATA_SET NUM STRING VAL_1 VAL_2 VAL_3
------------------------- ---- -------------------- ------ ------ ------
Pipe Delimited 1 a a
2 a| a
3 a|| a
4 |b,b| b,b
5 ||c c c c c c
6 a|b,b a b,b
7 a|b,b| a b,b
8 a|b,b|c c c a b,b c c c
9
10 |
11 ||
12
If we wish to ignore empty strings we can use the following technique.
execute :d := ',' ; :data_set := 'CSV' select data_set , num , string , regexp_substr( string, '[^' || :d || ']+', 1, 1 ) as val_1 , regexp_substr( string, '[^' || :d || ']+', 1, 2 ) as val_2 , regexp_substr( string, '[^' || :d || ']+', 1, 3 ) as val_3 from t1 where data_set like :data_set order by data_set, num ;
DATA_SET NUM STRING VAL_1 VAL_2 VAL_3
------------------------- ---- -------------------- ------ ------ ------
CSV 1 a a
2 a, a
3 a,, a
4 ,bb, bb
5 ,,c c c c c c
6 a,bb a bb
7 a,bb, a bb
8 a,bb,c c c a bb c c c
9
10 ,
11 ,,
Note the results for strings 4 and 5 differ from those of prior solutions. This effect may or may not be useful depending on your requirements.
Acknowledgements:
See comments by OTN users cd and ebrian at Re: use of Regular Expression...
