Regular Expressions

Specifying WHAT to Match - Part 1

In this tutorial we begin examining ways to specify WHAT text to match by exploring patterns for individual characters.

Matching Individual Characters

To match a single character string, e.g. the letter 'b', simply use it as the regular expression pattern.

execute set_pattern( 'b' )

execute set_target( 'aaa'  )
execute add_target( 'abc'  )
execute add_target( 'bbb'  )
execute add_target( 'bcd'  )
execute add_target( 'cde'  )
execute add_target( 'def'  )
execute add_target( 'efg'  )

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
b               aaa        N     (null)                 0
                abc        Y     b                      2
                bbb        Y     b                      1
                bcd        Y     b                      1
                cde        N     (null)                 0
                def        N     (null)                 0
                efg        N     (null)                 0
 

To match any one of a set of individual characters use character list syntax '[ ]'. The following example matches either an 'a', 'c', or 'g'.

execute set_pattern( '[acg]' )

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
[acg]           aaa        Y     a                      1
                abc        Y     a                      1
                bbb        N     (null)                 0
                bcd        Y     c                      2
                cde        Y     c                      1
                def        N     (null)                 0
                efg        Y     g                      3
 

To exclude characters place '^' after the '['.

execute set_pattern( '[^acg]' )

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
[^acg]          aaa        N     (null)                 0
                abc        Y     b                      2
                bbb        Y     b                      1
                bcd        Y     b                      1
                cde        Y     d                      2
                def        Y     d                      1
                efg        Y     e                      1
 

Matching any one of a range of characters, e.g. 'a' through 'c', can be done with the range operator '-'.

execute set_pattern( '[a-c]' )

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
[a-c]           aaa        Y     a                      1
                abc        Y     a                      1
                bbb        Y     b                      1
                bcd        Y     b                      1
                cde        Y     c                      1
                def        N     (null)                 0
                efg        N     (null)                 0
 

Be careful when working with alphabets other than English since a range like '[a-z]' may not include all characters in your alphabet.

execute set_pattern( '[a-z]' )

execute set_target( 'aeiou'  )
execute add_target( 'àéîõü' )

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
[a-z]           aeiou      Y     a                      1
                àéîõü      N     (null)                 0
 

POSIX character classes and equivalence classes (discussed in WHAT to Match - 2) offer a solution when working with alphabets other than English.

Case Insensitive Matching

By default, pattern matching with regular expressions is case sensitive. To perform case insensitive matches use the pattern matching modifier 'i' in REGEXP conditions and functions.

First, consider a case sensitive match.

execute set_target( 'a' )
execute add_target( 'b' )
execute add_target( 'c' )
execute add_target( 'A' )
execute add_target( 'B' )
execute add_target( 'C' )

select
  '[b]' as pattern,
  target ,
  regexp_substr( target, '[b]' ) as matched_value
from
  targets
;
 
PATTERN         TARGET     MATCHED_VALUE
--------------- ---------- ---------------
[b]             a          (null)
                b          b
                c          (null)
                A          (null)
                B          (null)
                C          (null)
 

Now consider the same pattern in a case insensitive match. The 'i' modifier is specified in the fifth parameter of the REGEXP_SUBSTR function.

select
  '[b]' as pattern,
  target ,
  regexp_substr( target, '[b]', 1, 1, 'i' ) as matched_value
from
  targets
;
 
PATTERN         TARGET     MATCHED_VALUE
--------------- ---------- ---------------
[b]             a          (null)
                b          b
                c          (null)
                A          (null)
                B          B
                C          (null)
 

Escaping Special Characters

Attempting to match special characters, like '[', may cause problems if you fail to escape them. The pattern 'a[b', for example, will not work.

execute set_pattern( 'a[b' )

execute set_target( 'abc'   )
execute add_target( 'a[b[c' )
execute add_target( 'a\b\c' )

select * from test_results ;
select * from test_results
                          *
ERROR at line 1:
ORA-12726: unmatched bracket in regular expression


 

To escape special characters, prefix them with the '\' character.

execute set_pattern( 'a\[b' )

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
a\[b            a[b[c      Y     a[b                    1
                a\b\c      N     (null)                 0
                abc        N     (null)                 0
 
execute set_pattern( 'a\\b' )

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
a\\b            a[b[c      N     (null)                 0
                a\b\c      Y     a\b                    1
                abc        N     (null)                 0
 

Wildcard Character

To match any character at all, except null and the newline character, use the '.' character.

execute set_pattern( '.' );

execute set_target( 'abc' )
execute add_target( '#24' )
execute add_target( '123' )

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
.               #24        Y     #                      1
                123        Y     1                      1
                abc        Y     a                      1
 

Be careful with nulls, nuls, line feeds, and carriage returns. '.' does not match them all.

execute set_pattern( '.' );

begin set_target( chr(0) );  end; -- NUL character, do not confuse with NULL
/
begin add_target( chr(10) ); end; -- a Linefeed
/
begin add_target( chr(13) ); end; -- a Carriage Return
/
begin add_target( 'a' );     end;
/
begin add_target( null );    end;
/
begin add_target( '' );      end; -- an empty string is equivalent to NULL
/

select
  pattern ,
  ascii( target ) as target_ascii_code ,
  match ,
  ascii( matched_value ) as matched_value_ascii_code ,
  position
from
  test_results
;
 
PATTERN         TARGET_ASCII_CODE MATCH MATCHED_VALUE_ASCII_CODE POSITION
--------------- ----------------- ----- ------------------------ --------
.                               0 Y                            0        1
                               10 N     (null)                          0
                               13 Y                           13        1
                               97 Y                           97        1
                (null)            N     (null)                   (null)
                (null)            N     (null)                   (null)
 

If you need '.' to match the newline character you can use the 'n' modifier in REGEXP conditions and functions.

select
  '.' as pattern ,
  ascii( target ) as target_ascii_code ,
  case
    when regexp_like( target, '.', 'n' ) then 'Y'
    else 'N'
    end as match ,
  regexp_instr( target, '.', 1, 1, 0, 'n' ) as position
from
  targets
;
 
PATTERN         TARGET_ASCII_CODE MATCH POSITION
--------------- ----------------- ----- --------
.                               0 Y            1
                               10 Y            1
                               13 Y            1
                               97 Y            1
                (null)            N     (null)
                (null)            N     (null)
 

Nulls

Using a null for either a regular expression pattern or a search target always yields an unknown result.

execute set_pattern( null )

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
(null)                     N     (null)          (null)
                           N     (null)          (null)
                           N     (null)          (null)
                a          N     (null)          (null)
                (null)     N     (null)          (null)
                (null)     N     (null)          (null)
 
execute set_pattern( '.' )

execute set_target( 'abc' )
execute add_target( null  )

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
.               abc        Y     a                      1
                (null)     N     (null)          (null)
 

Gotchas

You cannot specify multi-character strings in character lists. A pattern like 'a[(bc)]' does NOT mean 'a' followed by the string 'bc'. It has the same meaning as 'a[bc]'.

execute set_pattern( 'a[(bc)]' )

execute set_target( 'abc' )
execute add_target( 'ab'  )
execute add_target( 'ac'  )
execute add_target( 'bc'  )

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
a[(bc)]         ab         Y     ab                     1
                abc        Y     ab                     1
                ac         Y     ac                     1
                bc         N     (null)                 0
 
execute set_pattern( 'a[bc]' )

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
a[bc]           ab         Y     ab                     1
                abc        Y     ab                     1
                ac         Y     ac                     1
                bc         N     (null)                 0
 

You cannot use 'not' to turn an expression with an unknown result into an expression with a true result. For example, neither of the the following two queries return any rows, though some might expect the second one would.

set feedback on

select *
from   dual
where  regexp_like( null, 'abc' ) ;

no rows selected


select *
from   dual
where  not regexp_like( null, 'abc' ) ;

no rows selected

 



Linking to SQL Snippets ™

To link to this page in Oracle Technology Network Forums or OraFAQ Forums cut and paste this code.

  • [url=http://www.sqlsnippets.com/en/topic-10764.html]SQL Snippets: Regular Expressions - Specifying WHAT to Match - Part 1[/url]

To link to this page in HTML documents or Blogger comments cut and paste this code.

  • <a href="http://www.sqlsnippets.com/en/topic-10764.html">SQL Snippets: Regular Expressions - Specifying WHAT to Match - Part 1</a>

To link to this page in other web sites use the following values.

  • Link Text : SQL Snippets: Regular Expressions - Specifying WHAT to Match - Part 1
  • URL (href): http://www.sqlsnippets.com/en/topic-10764.html