This tutorial shows how to extract comma separated values (CSV), pipe delimited values,
or tab delimited values from strings like these
(on this page the
·
symbol indicates a single TAB character, i.e. ASCII code 9)
DATA_SET STRING ------------------------- -------------------- CSV a,bb,c c c Pipe Delimited a|b,b|c c c Tab Delimited a·b,b·c c c
and return the values in separate columns like this.
DATA_SET VAL_1 VAL_2 VAL_3 ------------------------- ------ ------ ------ CSV a bb c c c Pipe Delimited a b,b c c c Tab Delimited a b,b c c c
The solution in this tutorial uses features introduced in Oracle 10g. For Oracle databases running versions earlier than 10g see the Separated Values tutorial.
The solution for Oracle 10g+ uses the REGEXP_REPLACE function to locate and extract each quoted value from each string. (See SQL Features Tutorials: Regular Expressions if you are not familiar with regular expression support in Oracle.)
column string format a20 break on data_set skip 1 -- "d" stands for Delimiter variable d varchar2(1) variable data_set varchar2(25) execute :d := ',' ; :data_set := 'CSV' select data_set , num , string , rtrim( regexp_substr( string || :d, '.*?' || :d, 1, 1 ), :d ) as val_1 , rtrim( regexp_substr( string || :d, '.*?' || :d, 1, 2 ), :d ) as val_2 , rtrim( regexp_substr( string || :d, '.*?' || :d, 1, 3 ), :d ) as val_3 from t1 where data_set like :data_set order by data_set, num ;
DATA_SET NUM STRING VAL_1 VAL_2 VAL_3 ------------------------- ---- -------------------- ------ ------ ------ CSV 1 a a 2 a, a 3 a,, a 4 ,bb, bb 5 ,,c c c c c c 6 a,bb a bb 7 a,bb, a bb 8 a,bb,c c c a bb c c c 9 10 , 11 ,,
execute :d := chr(9) ; :data_set := 'Tab Delimited' /
DATA_SET NUM STRING VAL_1 VAL_2 VAL_3 ------------------------- ---- -------------------- ------ ------ ------ Tab Delimited 1 a a 2 a· a 3 a·· a 4 ·b,b· b,b 5 ··c c c c c c 6 a·b,b a b,b 7 a·b,b· a b,b 8 a·b,b·c c c a b,b c c c 9 10 · 11 ·· 12
With Pipe Delimited values we need to escape the delimiter in the REGEXP_SUBSTR pattern because "|" has a special meaning in regular expressions.
execute :d := '|' ; :data_set := 'Pipe Delimited' select data_set , num , string , rtrim( regexp_substr( string || :d, '.*?\' || :d, 1, 1 ), :d ) as val_1 , rtrim( regexp_substr( string || :d, '.*?\' || :d, 1, 2 ), :d ) as val_2 , rtrim( regexp_substr( string || :d, '.*?\' || :d, 1, 3 ), :d ) as val_3 from t1 where data_set like :data_set order by data_set, num ;
DATA_SET NUM STRING VAL_1 VAL_2 VAL_3 ------------------------- ---- -------------------- ------ ------ ------ Pipe Delimited 1 a a 2 a| a 3 a|| a 4 |b,b| b,b 5 ||c c c c c c 6 a|b,b a b,b 7 a|b,b| a b,b 8 a|b,b|c c c a b,b c c c 9 10 | 11 || 12
If we wish to ignore empty strings we can use the following technique.
execute :d := ',' ; :data_set := 'CSV' select data_set , num , string , regexp_substr( string, '[^' || :d || ']+', 1, 1 ) as val_1 , regexp_substr( string, '[^' || :d || ']+', 1, 2 ) as val_2 , regexp_substr( string, '[^' || :d || ']+', 1, 3 ) as val_3 from t1 where data_set like :data_set order by data_set, num ;
DATA_SET NUM STRING VAL_1 VAL_2 VAL_3 ------------------------- ---- -------------------- ------ ------ ------ CSV 1 a a 2 a, a 3 a,, a 4 ,bb, bb 5 ,,c c c c c c 6 a,bb a bb 7 a,bb, a bb 8 a,bb,c c c a bb c c c 9 10 , 11 ,,
Note the results for strings 4 and 5 differ from those of prior solutions. This effect may or may not be useful depending on your requirements.
Acknowledgements:
See comments by OTN users cd and ebrian at Re: use of Regular Expression...