To create a string of comma delimited values from a data set that looks like this
VAL_1 VAL_2 VAL_3 NOTE ------ ------ ------ ------------------------- a One NOT NULL, Two NULL b One NOT NULL, Two NULL c One NOT NULL, Two NULL a b Two NOT NULL, One NULL a b c Three NOT NULL, Zero NULL Zero NOT NULL, Three NULL
simply use the concatenation operator, "||", as in the following snippet.
select val_1, val_2, val_3, val_1 || ',' || val_2 || ',' || val_3 as string from t order by key ;
VAL_1 VAL_2 VAL_3 STRING ------ ------ ------ -------------------- a a,, b ,b, c ,,c a b a,b, a b c a,b,c ,,
To exclude null values from the end result LTRIM and NVL2 can be used.
select val_1, val_2, val_3, LTRIM ( val_1 || NVL2( val_2, ',' || val_2 , null ) || NVL2( val_3, ',' || val_3 , null ) , ',' ) as string from t order by key ;
VAL_1 VAL_2 VAL_3 STRING ------ ------ ------ -------------------- a a b b c c a b a,b a b c a,b,c
How to Create a CSV File
While there is no official standard for CSV file formats, RFC 4180 - Common Format and MIME Type for Comma-Separated Values (CSV) Files describes one commonly used set of rules. We will follow this standard in the examples to follow.
When writing a query whose output will be spooled to a CSV file we need to disable a few SQL*Plus features to ensure they do not corrupt the contents. Here are the settings I typically use.
set ECHO off set FEEDBACK off set LINESIZE 60 set RECSEP off set TRIMSPOOL on set VERIFY off set PAGESIZE 0 spool temp.txt replace select val_1 || ',' || val_2 || ',' || val_3 as string from t order by key ;
a,, ,b, ,,c a,b, a,b,c ,,
spool off
The value for LINESIZE should be larger than the longest expected line in the CSV file.
How to Escape Double Quotes, CR, and LF
Rules for escaping special characters vary from product to product. For products that follow RFC 4180 the following rules affect how we write our query.
-
"4. Within the header and each record, there may be one or more fields, separated by commas. Each line should contain the same number of fields throughout the file. Spaces are considered part of a field and should not be ignored. The last field in the record must not be followed by a comma."
-
"5. Each field may or may not be enclosed in double quotes" - we will enclose all fields in double quotes since it simplifies our query
-
"6. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes" - since we will enclose all fields in double quotes we automatically obey this rule
-
"7. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote."
Before we see the solution let's create some test data with special characters first. The NOTE column will contain values with double quotes, commas, and a line feed.
insert into t values ( 'String 7', 'd', 'e', 'f', 'line 1' || chr(10) || 'line 2' ); insert into t values ( 'String 8', 'g', 'h', 'i', '"1", "2", or "3"' ); commit; set pagesize 60 select val_1, val_2, val_3, note from t order by key ;
VAL_1 VAL_2 VAL_3 NOTE ------ ------ ------ ------------------------- a One NOT NULL, Two NULL b One NOT NULL, Two NULL c One NOT NULL, Two NULL a b Two NOT NULL, One NULL a b c Three NOT NULL, Zero NULL Zero NOT NULL, Three NULL d e f line 1 line 2 g h i "1", "2", or "3"
Now here is the query.
set echo off set feedback off set linesize 60 set recsep off set trimspool on set verify off set pagesize 0 select '"' || replace( val_1, '"', '""' ) || '"' || ',"' || replace( val_2, '"', '""' ) || '"' || ',"' || replace( val_3, '"', '""' ) || '"' || ',"' || replace( note , '"', '""' ) || '"' as string from t order by key ;
"a","","","One NOT NULL, Two NULL" "","b","","One NOT NULL, Two NULL" "","","c","One NOT NULL, Two NULL" "a","b","","Two NOT NULL, One NULL" "a","b","c","Three NOT NULL, Zero NULL" "","","","Zero NOT NULL, Three NULL" "d","e","f","line 1 line 2" "g","h","i","""1"", ""2"", or ""3"""
How to Include a Header Line
RFC 4180 states
"3. There maybe an optional header line appearing as the first line of the file with the same format as normal record lines."
Including such a header line is easily accomplished in SQL*Plus for files with less than 50,000 lines (or whatever the maximum setting for PAGESIZE is in your environment) like this.
set echo off set feedback off set linesize 60 set recsep off set trimspool on set verify off set pagesize 50000 set newpage none set heading on set underline off column string heading '"Val 1","Val 2","Val 3","Note Field"' select '"' || replace( val_1, '"', '""' ) || '"' || ',"' || replace( val_2, '"', '""' ) || '"' || ',"' || replace( val_3, '"', '""' ) || '"' || ',"' || replace( note , '"', '""' ) || '"' as string from t order by key ;
"Val 1","Val 2","Val 3","Note Field" "a","","","One NOT NULL, Two NULL" "","b","","One NOT NULL, Two NULL" "","","c","One NOT NULL, Two NULL" "a","b","","Two NOT NULL, One NULL" "a","b","c","Three NOT NULL, Zero NULL" "","","","Zero NOT NULL, Three NULL" "d","e","f","line 1 line 2" "g","h","i","""1"", ""2"", or ""3"""
Unlike column names, note how the column heading can be greater than 30 characters. In the last example the column heading for STRING is 36 characters long.
For files that will contain more than 50,000 lines I use the following approach.
set echo off set feedback off set linesize 60 set recsep off set trimspool on set verify off set pagesize 0 select string from ( select 0 as sort_key_1 , ' ' as sort_key_2 , '"Val 1","Val 2","Val 3","Note Field"' as string from dual union all select 1 as sort_key_1 , key as sort_key_2 , '"' || replace( val_1, '"', '""' ) || '"' || ',"' || replace( val_2, '"', '""' ) || '"' || ',"' || replace( val_3, '"', '""' ) || '"' || ',"' || replace( note , '"', '""' ) || '"' as string from t ) order by sort_key_1 , sort_key_2 ;
"Val 1","Val 2","Val 3","Note Field" "a","","","One NOT NULL, Two NULL" "","b","","One NOT NULL, Two NULL" "","","c","One NOT NULL, Two NULL" "a","b","","Two NOT NULL, One NULL" "a","b","c","Three NOT NULL, Zero NULL" "","","","Zero NOT NULL, Three NULL" "d","e","f","line 1 line 2" "g","h","i","""1"", ""2"", or ""3"""
Gotcha
When creating large CSV files destined for Microsoft Excel beware of limits on the maximum number of rows per worksheet (1M rows in Excel 2007, 64K rows in earlier versions).
Cleanup
delete from t where key in ( 'String 7', 'String 8' ) ; commit ;