This tutorial shows how to collect values from multiple rows into a single, comma delimited string. This is also known as "string aggregation". It takes values like these
GROUP_KEY VAL ---------- ---------- Group 3 a Group 3 b Group 3 c
and yields a string like this.
GROUP_KEY STRING ---------- --------------- Group 3 a,b,c
While the solution to follow, which we will refer to as MODEL Method 1, is the simplest of the three MODEL solution presented in this section it unfortunately scales the worst when a large number of values is selected or very wide STRING values are produced (see MODEL Clause: MODEL Performance Tuning). MODEL Method 2 and MODEL Method 3 provide alternative MODEL techniques that scale better than MODEL Method 1, though they are more complex.
Test Data
desc t Name Null? Type ----------------------------- -------- -------------------- GROUP_KEY VARCHAR2(10) VAL VARCHAR2(10) set null "(null)" break on group_key skip 1 duplicates select * from t order by group_key, val ;
GROUP_KEY VAL ---------- ---------- Group 1 a Group 2 a Group 2 b Group 3 a Group 3 b Group 3 c Group 4 a Group 4 a Group 4 b Group 4 b Group 5 a Group 5 b Group 5 d Group 5 e Group 5 (null) Group 6 (null) Group 7 (null) Group 7 (null) Group 8 a Group 8 a Group 8 b Group 8 b Group 8 (null) Group 8 (null) (null) (null)
Solution
This solution uses the SQL MODEL clause which was introduced in Oracle 10g. If you are unfamiliar with MODEL you may wish to review the SQL Features Tutorials: MODEL Clause tutorial before proceeding.
clear breaks
select group_key, substr( string, 2 ) as string
from t
model
return updated rows
partition by ( group_key )
dimension by ( row_number() over (partition by group_key order by val) as position )
measures ( cast( val as varchar2(65) ) as string ) -- Note 1
rules
upsert
iterate( 6 ) -- Note 2
until ( presentv(string[iteration_number+2],1,0) = 0 )
( string[0] = string[0] || ',' || string[iteration_number+1] )
order by group_key ;
GROUP_KEY STRING ---------- --------------- Group 1 a Group 2 a,b Group 3 a,b,c Group 4 a,a,b,b Group 5 a,b,d,e, Group 6 (null) Group 7 , Group 8 a,a,b,b,, (null) (null)
Notes:
- 65 = v * n + (n-1) where
- v = maximum length of VAL values (this is 10 in our test table)
- n = maximum number of rows with the same GROUP_KEY (6 in our test table)
- Use any number here greater than or equal to n.
To control the order of the delimited values, change the ORDER BY clause in the arguments for ROW_NUMBER().
select group_key, substr( string, 2 ) as string
from t
model
return updated rows
partition by ( group_key )
dimension by ( row_number() over (partition by group_key ORDER BY VAL DESC) as position )
measures ( cast( val as varchar2(65) ) as string )
rules
upsert
iterate( 6 )
until ( presentv(string[iteration_number+2],1,0) = 0 )
( string[0] = string[0] || ',' || string[iteration_number+1] )
order by group_key ;
GROUP_KEY STRING ---------- --------------- Group 1 a Group 2 b,a Group 3 c,b,a Group 4 b,b,a,a Group 5 ,e,d,b,a Group 6 (null) Group 7 , Group 8 ,,b,b,a,a (null) (null)
To filter duplicate VAL values, like those in Groups 4, 7, and 8 from the end result we use an inline view in the FROM clause.
select group_key, substr( string, 2 ) as string
from ( SELECT DISTINCT GROUP_KEY, VAL FROM T ) T
model
return updated rows
partition by ( group_key )
dimension by ( row_number() over (partition by group_key order by val) as position )
measures ( cast( val as varchar2(65) ) as string )
rules
upsert
iterate( 6 )
until ( presentv(string[iteration_number+2],1,0) = 0 )
( string[0] = string[0] || ',' || string[iteration_number+1] )
order by group_key ;
GROUP_KEY STRING ---------- --------------- Group 1 a Group 2 a,b Group 3 a,b,c Group 4 a,b Group 5 a,b,d,e, Group 6 (null) Group 7 (null) Group 8 a,b, (null) (null)
To filter null VAL values, like the ones in Groups 5 and higher from the end result we can add a WHERE clause to the query.
select group_key, substr( string, 2 ) as string
from t
WHERE VAL IS NOT NULL
model
return updated rows
partition by ( group_key )
dimension by ( row_number() over (partition by group_key order by val) as position )
measures ( cast( val as varchar2(65) ) as string ) -- Note 1
rules
upsert
iterate( 6 ) -- Note 2
until ( presentv(string[iteration_number+2],1,0) = 0 )
( string[0] = string[0] || ',' || string[iteration_number+1] )
order by group_key ;
GROUP_KEY STRING ---------- --------------- Group 1 a Group 2 a,b Group 3 a,b,c Group 4 a,a,b,b Group 5 a,b,d,e Group 8 a,a,b,b
In the next tutorial, MODEL Method 1 - Explained, we look at how this technique works.
Gotchas
Distinct
If we attempt to produce distinct values in our strings using SELECT DISTINCT instead of an inline view we will be rewarded with incorrect results. This is because SELECT DISTINCT is evaluated after the MODEL clause is applied, not before.
select DISTINCT group_key, substr( string, 2 ) as string
from t
model
return updated rows
partition by ( group_key )
dimension by ( row_number() over (partition by group_key order by val) as position )
measures ( cast( val as varchar2(65) ) as string )
rules
upsert
iterate( 6 )
until ( presentv(string[iteration_number+2],1,0) = 0 )
( string[0] = string[0] || ',' || string[iteration_number+1] )
order by group_key ;
GROUP_KEY STRING ---------- --------------- Group 1 a Group 2 a,b Group 3 a,b,c Group 4 a,a,b,b Group 5 a,b,d,e, Group 6 (null) Group 7 , Group 8 a,a,b,b,, (null) (null)
IS PRESENT
When I originally wrote this solution I tried to use the following UNTIL clause
iterate( 6 ) until ( NOT( STRING[ITERATION_NUMBER+2] IS PRESENT ) = 0 )
Unfortunately it produced some rather nasty errors.
ERROR at line 1: ORA-03113: end-of-file on communication channel ERROR: ORA-03114: not connected to ORACLE ksedmp: internal or fatal error ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [_qcss_process_expr+226] [PC:0x1E57EC6] [ADDR:0x18] [UNABLE_TO_READ] [] Current SQL statement for this session: select group_key, string from t ... ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- _qcss_process_expr+ 00000000 226 _qcss_process_cond+ CALLrel _qcss_process_expr+ 8CA6754 8171468 841BB64 76 0 8CA6698 _qcss_process_sprea CALLrel _qcss_process_cond+ 8CA6754 8171468 841BB58 dsheet+102 0 8CA6698 _qcssSemanticCbks+4 CALLrel _qcss_process_sprea 8CA6754 8171468 841A5E0 5 dsheet+0 ...
As a workaround I used until( presentv(string[iteration_number+2],1,0) = 0 )
instead of the NOT ... IS PRESENT logic.
