This tutorial shows how to extract values from strings of XML elements like this one
DATA_SET STRING ------------------------- ---------------------------------------- XML - 1 <V>a</V> <V>b,b</V> <V>c c c</V>
and return the values in separate columns like this.
DATA_SET VAL_1 VAL_2 VAL_3 ------------------------- ------ ------ ------ XML - 1 a b,b c c c
Warning: The snippets in this tutorial do not work as expected when the source data contains NULL values (as tested in Oracle 10g XE, see Gotchas below). NULL values have therefore been excluded from the test data.
There are a number of XML related features we can use in a solution. However, before using these features we must first convert our test strings into XMLTYPE data types.
Converting VARCHAR2 to XMLTYPE
Here is an example of how to use the XMLPARSE function to convert a VARCHAR2 value containing an XML fragment into an XMLTYPE datatype.
set recsep each select num, string, XMLPARSE( CONTENT string ) as xml_instance from t1_xml_1 order by num ;
NUM STRING XML_INSTANCE ---- ---------------------------------------- ---------------------------------------- 1 <V>a</V> <V>a</V> 2 <V>a</V> <V></V> <V>a</V> <V></V> 3 <V>a</V> <V></V> <V></V> <V>a</V> <V></V> <V></V> 4 <V></V> <V>b,b</V> <V></V> <V></V> <V>b,b</V> <V></V> 5 <V></V> <V></V> <V>c c c</V> <V></V> <V></V> <V>c c c</V> 6 <V>a</V> <V>b,b</V> <V>a</V> <V>b,b</V> 7 <V>a</V> <V>b,b</V> <V></V> <V>a</V> <V>b,b</V> <V></V> 8 <V>a</V> <V>b,b</V> <V>c c c</V> <V>a</V> <V>b,b</V> <V>c c c</V> 9 <V></V> <V></V> 10 <V></V> <V></V> <V></V> <V></V> 11 <V></V> <V></V> <V></V> <V></V> <V></V> <V></V>
If we can guarantee that our strings always resolve to well-formed XML documents then we can use the WELLFORMED keyword to spare the database from performing redundant validity checks.
select num, xmlparse( content string WELLFORMED ) as xml_instance from t1_xml_1 order by num ;
NUM XML_INSTANCE ---- ---------------------------------------- 1 <V>a</V> 2 <V>a</V> <V></V> 3 <V>a</V> <V></V> <V></V> 4 <V></V> <V>b,b</V> <V></V> 5 <V></V> <V></V> <V>c c c</V> 6 <V>a</V> <V>b,b</V> 7 <V>a</V> <V>b,b</V> <V></V> 8 <V>a</V> <V>b,b</V> <V>c c c</V> 9 <V></V> 10 <V></V> <V></V> 11 <V></V> <V></V> <V></V>
In PL/SQL the conversion is slightly more complex since the XMLPARSE function is not available (as tested in Oracle 10g XE). Attempting to use XMLPARSE will generate a PLS-00103 error.
declare v varchar2(50) := '<V>a</V> <V>b,b</V>' ; x xmltype ; begin x := xmlparse( content v wellformed ) ; end; /
ERROR at line 5: ORA-06550: line 5, column 26: PLS-00103: Encountered the symbol "V" when expecting one of the following: ...
One PL/SQL alternative is to use a SELECT statement to perform the conversion.
declare v varchar2(50) := '<V>a</V> <V>b</V>' ; x xmltype ; begin select xmlparse( content v wellformed ) into x from dual; end; / PL/SQL procedure successfully completed.
Another is to convert the XML string fragment into an XML document by giving it a root node. You can then use the XMLTYPE constructor to do the VARCHAR2 to XMLTYPE data type conversion.
declare v varchar2(50) := '<V>a</V> <V>b,b</V>' ; x xmltype ; begin x := XMLTYPE('<ROOT>' || v || '</ROOT>'); -- note: we can use any valid name for the root node here, -- it does not have to be "ROOT" end; / PL/SQL procedure successfully completed.
Extracting Values with the EXTRACT and GETSTRINGVAL Member Functions
Once we have an XMLTYPE instance we can extract values using its EXTRACT and GETSTRINGVAL Member Functions as follows.
select t1_xml_1a.num , t1_xml_1a.xml_instance , t1_xml_1a.xml_instance.EXTRACT('/V[1]/text()').GETSTRINGVAL() as val_1 , t1_xml_1a.xml_instance.EXTRACT('/V[2]/text()').GETSTRINGVAL() as val_2 , t1_xml_1a.xml_instance.EXTRACT('/V[3]/text()').GETSTRINGVAL() as val_3 from ( select num, xmlparse( content string wellformed ) as xml_instance from t1_xml_1 ) t1_xml_1a order by num ;
NUM XML_INSTANCE VAL_1 VAL_2 VAL_3 ---- ---------------------------------------- ------ ------ ------ 1 <V>a</V> a 2 <V>a</V> <V></V> a 3 <V>a</V> <V></V> <V></V> a 4 <V></V> <V>b,b</V> <V></V> b,b 5 <V></V> <V></V> <V>c c c</V> c c c 6 <V>a</V> <V>b,b</V> a b,b 7 <V>a</V> <V>b,b</V> <V></V> a b,b 8 <V>a</V> <V>b,b</V> <V>c c c</V> a b,b c c c 9 <V></V> 10 <V></V> <V></V> 11 <V></V> <V></V> <V></V>
If our data contains missing elements (like rows 1, 2, 6, 9, and 10) and we try to use the same approach in PL/SQL we will encounter an ORA-30625 error.
begin dbms_output.put_line( ' num VAL_1 VAL_2 VAL_3 ---------- ---------- ---------- ---------- ' ); for t1_xml_1_row in ( select num, xmlparse( content t1_xml_1.string wellformed ) as xml_instance from t1_xml_1 order by num ) loop declare val_1 varchar2(10) := t1_xml_1_row.xml_instance.EXTRACT('/V[1]/text()').GETSTRINGVAL() ; val_2 varchar2(10) := t1_xml_1_row.xml_instance.EXTRACT('/V[2]/text()').GETSTRINGVAL() ; val_3 varchar2(10) := t1_xml_1_row.xml_instance.EXTRACT('/V[3]/text()').GETSTRINGVAL() ; begin dbms_output.put_line ( rpad( t1_xml_1_row.num, 11 ) || rpad( nvl( val_1, ' ' ), 11 ) || rpad( nvl( val_2, ' ' ), 11 ) || rpad( nvl( val_3, ' ' ), 11 ) ); end; end loop; end; / num VAL_1 VAL_2 VAL_3 ---------- ---------- ---------- ---------- begin * ERROR at line 1: ORA-30625: method dispatch on NULL SELF argument is disallowed ORA-06512: at line 17
To work around the error we perform the extraction in the SELECT statement instead of the DBMS_OUTPUT command.
begin dbms_output.put_line( ' num VAL_1 VAL_2 VAL_3 ---------- ---------- ---------- ---------- ' ); for t1_xml_1_row in ( select t1_xml_1a.num , t1_xml_1a.xml_instance , t1_xml_1a.xml_instance.EXTRACT('/V[1]/text()').GETSTRINGVAL() as val_1 , t1_xml_1a.xml_instance.EXTRACT('/V[2]/text()').GETSTRINGVAL() as val_2 , t1_xml_1a.xml_instance.EXTRACT('/V[3]/text()').GETSTRINGVAL() as val_3 from ( select t1_xml_1.num, xmlparse(content t1_xml_1.string wellformed) as xml_instance from t1_xml_1 ) t1_xml_1a order by num ) loop dbms_output.put_line ( rpad( t1_xml_1_row.num, 11 ) || rpad( nvl( t1_xml_1_row.val_1, ' ' ), 11 ) || rpad( nvl( t1_xml_1_row.val_2, ' ' ), 11 ) || rpad( nvl( t1_xml_1_row.val_3, ' ' ), 11 ) ); end loop; end; / num VAL_1 VAL_2 VAL_3 ---------- ---------- ---------- ---------- 1 a 2 a 3 a 4 b,b 5 c c c 6 a b,b 7 a b,b 8 a b,b c c c 9 10 11
We can also work around the error by using the CASE operator to avoid applying getStringVal() to nonexistent elements.
begin dbms_output.put_line( ' num VAL_1 VAL_2 VAL_3 ---------- ---------- ---------- ---------- ' ); for t1_xml_1_row in ( select num, xmlparse( content string wellformed ) as xml_instance from t1_xml_1 order by num ) loop declare x_val_1 xmltype := t1_xml_1_row.xml_instance.extract('/V[1]/text()') ; x_val_2 xmltype := t1_xml_1_row.xml_instance.extract('/V[2]/text()') ; x_val_3 xmltype := t1_xml_1_row.xml_instance.extract('/V[3]/text()') ; val_1 varchar2(10) := CASE when x_val_1 is null then null else x_val_1.getStringVal() end ; val_2 varchar2(10) := CASE when x_val_2 is null then null else x_val_2.getStringVal() end ; val_3 varchar2(10) := CASE when x_val_3 is null then null else x_val_3.getStringVal() end ; begin dbms_output.put_line ( rpad( t1_xml_1_row.num, 11 ) || rpad( nvl( val_1, ' ' ), 11 ) || rpad( nvl( val_2, ' ' ), 11 ) || rpad( nvl( val_3, ' ' ), 11 ) ); end; end loop; end; / num VAL_1 VAL_2 VAL_3 ---------- ---------- ---------- ---------- 1 a 2 a 3 a 4 b,b 5 c c c 6 a b,b 7 a b,b 8 a b,b c c c 9 10 11
Extracting Numbers With the EXTRACT and GETNUMBERVAL Member Functions
If our data contained only numeric values we could use the "getNumberVal()" member function instead of "getStringVal()".
column n_val_1 format 999999 column n_val_2 format 999999 column n_val_3 format 999999 select t1_xml_3.num , t1_xml_3.xml_instance , t1_xml_3.xml_instance.extract('/V[1]/text()').GETNUMBERVAL() as N_VAL_1 , t1_xml_3.xml_instance.extract('/V[2]/text()').GETNUMBERVAL() as N_VAL_2 , t1_xml_3.xml_instance.extract('/V[3]/text()').GETNUMBERVAL() as N_VAL_3 from ( select 1 as num, xmlparse(content '<V>5</V> <V></V> <V>46</V>' wellformed) as xml_instance from dual ) t1_xml_3 ;
NUM XML_INSTANCE N_VAL_1 N_VAL_2 N_VAL_3 ---- ---------------------------------------- ------- ------- ------- 1 <V>5</V> <V></V> <V>46</V> 5 46
Extracting Values With the SQL EXTRACTVALUE Function
Instead of using the "extract()" and "getStringVal()" member functions together we can use the SQL convenience function EXTRACTVALUE to perform the same logic.
select t1_xml_1a.num , t1_xml_1a.xml_instance , EXTRACTVALUE( t1_xml_1a.xml_instance, '/V[1]/text()' ) as VAL_1 , EXTRACTVALUE( t1_xml_1a.xml_instance, '/V[2]/text()' ) as VAL_2 , EXTRACTVALUE( t1_xml_1a.xml_instance, '/V[3]/text()' ) as VAL_3 from ( select num, xmlparse( content string wellformed ) as xml_instance from t1_xml_1 ) t1_xml_1a order by num ;
NUM XML_INSTANCE VAL_1 VAL_2 VAL_3 ---- ---------------------------------------- ------ ------ ------ 1 <V>a</V> a 2 <V>a</V> <V></V> a 3 <V>a</V> <V></V> <V></V> a 4 <V></V> <V>b,b</V> <V></V> b,b 5 <V></V> <V></V> <V>c c c</V> c c c 6 <V>a</V> <V>b,b</V> a b,b 7 <V>a</V> <V>b,b</V> <V></V> a b,b 8 <V>a</V> <V>b,b</V> <V>c c c</V> a b,b c c c 9 <V></V> 10 <V></V> <V></V> 11 <V></V> <V></V> <V></V>
Note that EXTRACTVALUE does not appear to work in PL/SQL (as tested in Oracle 10g XE). When we attempt to use EXTRACTVALUE in PL/SQL we get a PLS-00201 error.
declare v varchar2(50) := '<V>a</V> <V>b</V>' ; val_1 varchar2(10) ; begin val_1 := extractvalue( v, '/V[1]/text()' ) ; end; / val_1 := extractvalue( v, '/V[1]/text()' ) ; * ERROR at line 5: ORA-06550: line 5, column 12: PLS-00201: identifier 'EXTRACTVALUE' must be declared ORA-06550: line 5, column 3: PL/SQL: Statement ignored
Gotchas
Column Prefixes
In SQL commands we need to prefix references to some xml instances with the table alias to avoid an ORA-00904 error. For example, this will not work
select xml_instance , xml_instance.extract( '/V[1]/text()' ) as xml_instance from ( select num, xmlparse( content string wellformed ) as xml_instance from t1_xml_1 ) t1_xml_1a where num = 'String 01' order by num ; xml_instance.extract( '/V[1]/text()' ) as xml_instance * ERROR at line 3: ORA-00904: "XML_INSTANCE"."EXTRACT": invalid identifier
but this will.
select xml_instance , -- prefix not required T1_XML_1A.xml_instance.extract( '/V[1]/text()' ) as val_1 -- prefix required from ( select num, xmlparse( content string wellformed ) as xml_instance from t1_xml_1 ) t1_xml_1a where num = 1 ;
XML_INSTANCE VAL_1 ---------------------------------------- ------ <V>a</V> a
Case Sensitivity
The XPath string argument in functions like EXTRACT and EXTRACTVALUE is case sensitive. For example, this snippet will return a null value for the XML_ELEMENT column because "v[1]" is not the same as "V[1]".
set null "(null)" select xmlparse( content t1_xml_1.string ).extract( '/v[1]/text()' ) as xml_instance from t1_xml_1 where num = 1 ;
XML_INSTANCE ---------------------------------------- (null)
set null ""
This snippet generates an error because "Text()" is not a valid XPath keyword ("TEXT()" would not be either).
select xmlparse( content t1_xml_1.string ).extract( '/V[1]/Text()' ) as xml_instance from t1_xml_1 where num = 1 ; ERROR: ORA-31011: XML parsing failed ORA-19202: Error occurred in XML processing LPX-00601: Invalid token in: '/V[1]/Text()' ORA-06512: at "SYS.XMLTYPE", line 111
This version works.
select xmlparse( content t1_xml_1.string ).extract( '/V[1]/text()' ) as xml_instance from t1_xml_1 where num = 1 ;
XML_INSTANCE ---------------------------------------- a
NULL Values
As explained in the warning message at the top of the page we have deliberately excluded test data containing NULL values because NULL values produce unexpected results due to what I believe is a bug. Here are some examples that demonstrate the unexpected results.
set null "(null)" insert into t1_xml_1 values ( 'XML - 1', 12 , null, 'NULL string' ); commit; select num, string, xmlparse( content string ) xml_instance from t1_xml_1 where num in ( 1, 12 ) order by num ;
NUM STRING XML_INSTANCE ---- ---------------------------------------- ---------------------------------------- 1 <V>a</V> <V>a</V> 12 (null) <V>a</V>
select num, string, xmlparse( content string ) xml_instance from t1_xml_1 where num in ( 8, 12 ) order by num ;
NUM STRING XML_INSTANCE ---- ---------------------------------------- ---------------------------------------- 8 <V>a</V> <V>b,b</V> <V>c c c</V> <V>a</V> <V>b,b</V> <V>c c c</V> 12 (null) <V>a</V> <V>b,b</V> <V>c c c</V>
select num, string, xmlparse( content string ) xml_instance from t1_xml_1 where num in ( 11, 12 ) order by num ;
NUM STRING XML_INSTANCE ---- ---------------------------------------- ---------------------------------------- 11 <V></V> <V></V> <V></V> <V></V> <V></V> <V></V> 12 (null) <V></V> <V></V> <V></V>
delete from t1_xml_1 where num = 12 ; commit;
The values under the XML_ELEMENT column for rows where NUM=12 should be null in each of the three
examples above, but instead the result varies depending on which row appears immediately before row 12.
My post about this on the OTN XML DB discussion forum,
XMLPARSE and NULL values
, has yielded no replies to date (2007-03-29).
If anyone can shed some light on the subject please email me at .