Regular Expressions

Specifying HOW MANY Repetitions

In this tutorial we examine ways of specifying HOW MANY repetitions of a given pattern we want to match.

Other aspects of pattern matching will be kept constant throughout the tutorial, namely:

  • "WHAT" will be a single character
  • "WHERE" will be from the first character in the target onward
  • "WHICH" occurrence will be the first occurrence.

Quantifiers, Greedy, and Non-greedy Matching

To specify how many repetitions of a given character should be matched, Oracle supports quantifiers like '*'. For example, while the 'b' in the regular expression 'abc' matches one and only one repetition of the letter 'b', an expression like 'ab*c' matches zero more repetitions of 'b', as in the targets 'ac', 'abc', or 'abbc'.

In certain situations, a pattern that uses quantifiers could conceivably match two or more substrings within a target. For example, given a target string like 'abcbcbc' the pattern 'a.*c' could match either 'abc', 'abcbc', or 'abcbcbc'. A match to the largest possible result string, 'abcbcbc', is called a "greedy" match. A match to the smallest possible result string, 'abc', is called a "non-greedy" match. POSIX regular expressions only support greedy matching. Fortunately, Oracle supports some PERL influenced extensions to the POSIX specification that support non-greedy matching. To specify a non-greedy match a '?' character is appended to the POSIX quantifier.

The following table summarizes the available quantifiers and indicates whether they are greedy or not.

Quantifier Meaning Greediness
? zero or one occurrence greedy
?? zero or one occurrence non-greedy
* zero or more occurrences greedy
*? zero or more occurrences non-greedy
+ one or more occurrences greedy
+? one or more occurrences non-greedy
{m} exactly m occurrences greedy
{m}? exactly m occurrences non-greedy
{m,n} at least m, but not more than n occurrences greedy
{m,n}? at least m, but not more than n occurrences non-greedy
{m,} m or more occurrences greedy
{m,}? m or more occurrences non-greedy

"There Can Be Only One"

To match one and only one instance of a character, simply use the character itself in the regular expression pattern.

execute set_pattern( 'b' )

execute set_target( 'aaa'  )
execute add_target( 'abc'  )
execute add_target( 'abbc' )
execute add_target( 'acc'  )
execute add_target( 'bbb'  )
execute add_target( 'bbbb' )

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
b               aaa        N     (null)                 0
                abbc       Y     b                      2
                abc        Y     b                      2
                acc        N     (null)                 0
                bbb        Y     b                      1
                bbbb       Y     b                      1
 

Zero or One

To match zero or one repetitions of a character use the '?' quantifier. For example, to match zero or one instances of the letter 'b' use 'b?'.


execute set_pattern( 'b?' )

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
b?              aaa        Y     (null)                 1
                abbc       Y     (null)                 1
                abc        Y     (null)                 1
                acc        Y     (null)                 1
                bbb        Y     b                      1
                bbbb       Y     b                      1
 

These results may seem strange at first. For example, one might expect the MATCHED_VALUE for target 'abc' to be 'b', not null. To understand why this is not the case we first have to understand a few underlying principles.

  • "zero" instances of a character is equivalent to a single empty string
  • all target strings are considered to have empty strings within them; e.g. 'abc' is treated like this

    (empty string) a (empty string) b (empty string) c (empty string)
     
  • Oracle displays empty strings as nulls.

Now let's consider the results. The pattern 'b?' is supposed to match zero or one occurrence of 'b', which is equivalent to saying it matches a single empty string or the character 'b'. When Oracle examines the target 'abc' for a match, it scans the equivalent string

(empty string) a (empty string) b (empty string) c (empty string)
 

and finds that the first empty string is a match for the pattern. It therefore reports a successful match at location 1 and returns the empty string as the result (displayed as '(null)').

The same thing happens with targets 'aaa', 'abbc', and 'acc'. Oracle never finds the 'b' at position 2 of 'abbc' because it stops looking when it finds the empty string at the beginning of the target.

For the target 'bbb', Oracle could either match the first empty string in the target or the 'b' at position 1. Since 'b?' is a greedy pattern and since 'b' is a larger result than an empty string, Oracle returns the letter 'b' as the match result.

For another perspective on empty strings, consider the following example where all empty strings in the target 'abc' are replaced with an 'X' to disclose their locations.

select
  'abc' as target ,
  regexp_replace( 'abc', 'd?', 'X' ) as replaced_target
from
  dual
;
 
TARGET     REPLACED_TARGET
---------- ---------------
abc        XaXbXcX
 

In the WHERE to Look tutorial in this section we will see how to use the '^' and '$' characters to match the empty strings at the beginning and end, respectively, of a target.

While trivial patterns such as 'b?' are of little use, non-trivial patterns such as 'a.?c' can be very useful and produce clearer results than 'b?'.

execute set_pattern( 'a.?c' )

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
a.?c            aaa        N     (null)                 0
                abbc       N     (null)                 0
                abc        Y     abc                    1
                acc        Y     acc                    1
                bbb        N     (null)                 0
                bbbb       N     (null)                 0
 

Non-greedy matches of zero or more occurrences are specified with the '??' quantifier.

execute set_pattern( 'a.??c' )

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
a.??c           aaa        N     (null)                 0
                abbc       N     (null)                 0
                abc        Y     abc                    1
                acc        Y     ac                     1
                bbb        N     (null)                 0
                bbbb       N     (null)                 0
 

Zero or More

A zero or more greedy match is specified with the '*' quantifier, its non-greedy counterpart is '*?'.

execute set_pattern( 'a.*c' ) ;

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
a.*c            aaa        N     (null)                 0
                abbc       Y     abbc                   1
                abc        Y     abc                    1
                acc        Y     acc                    1
                bbb        N     (null)                 0
                bbbb       N     (null)                 0
 
execute set_pattern( 'a.*?c' ) ;

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
a.*?c           aaa        N     (null)                 0
                abbc       Y     abbc                   1
                abc        Y     abc                    1
                acc        Y     ac                     1
                bbb        N     (null)                 0
                bbbb       N     (null)                 0
 

One or More

A one or more greedy match is specified with the '+' quantifier, its non-greedy counterpart is '+?'.

execute set_pattern( 'b+' ) ;

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
b+              aaa        N     (null)                 0
                abbc       Y     bb                     2
                abc        Y     b                      2
                acc        N     (null)                 0
                bbb        Y     bbb                    1
                bbbb       Y     bbbb                   1
 
execute set_pattern( 'b+?' ) ;

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
b+?             aaa        N     (null)                 0
                abbc       Y     b                      2
                abc        Y     b                      2
                acc        N     (null)                 0
                bbb        Y     b                      1
                bbbb       Y     b                      1
 

Enumerated Occurrences

To specify an exact number or a numeric range of occurrences use the '{}' quantifiers.

execute set_pattern( 'b{2}'    )
execute add_pattern( 'b{2}?'   )
execute add_pattern( 'b{2,3}'  )
execute add_pattern( 'b{2,3}?' )
execute add_pattern( 'b{2,}'   )
execute add_pattern( 'b{2,}?'  )

set pagesize 39

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
b{2}            aaa        N     (null)                 0
                abbc       Y     bb                     2
                abc        N     (null)                 0
                acc        N     (null)                 0
                bbb        Y     bb                     1
                bbbb       Y     bb                     1

b{2}?           aaa        N     (null)                 0
                abbc       Y     bb                     2
                abc        N     (null)                 0
                acc        N     (null)                 0
                bbb        Y     bb                     1
                bbbb       Y     bb                     1

b{2,3}          aaa        N     (null)                 0
                abbc       Y     bb                     2
                abc        N     (null)                 0
                acc        N     (null)                 0
                bbb        Y     bbb                    1
                bbbb       Y     bbb                    1

b{2,3}?         aaa        N     (null)                 0
                abbc       Y     bb                     2
                abc        N     (null)                 0
                acc        N     (null)                 0
                bbb        Y     bb                     1
                bbbb       Y     bb                     1

b{2,}           aaa        N     (null)                 0
                abbc       Y     bb                     2
                abc        N     (null)                 0
                acc        N     (null)                 0
                bbb        Y     bbb                    1
                bbbb       Y     bbbb                   1


PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
b{2,}?          aaa        N     (null)                 0
                abbc       Y     bb                     2
                abc        N     (null)                 0
                acc        N     (null)                 0
                bbb        Y     bb                     1
                bbbb       Y     bb                     1
 

Note that the non-greedy patterns 'b{2}?', 'b{2,3}?', and 'b{2,}?' are of little practical use since they all yield the same result set as 'b{2}'.

Repetitions of Strings with Two or More Characters

In the previous examples we saw how to apply quantifiers to a single character. To apply them to a string of characters simply enclose the string with '(' and ')' and apply the quantifier after the ')'.

execute set_pattern( '(ab)*'   )
execute add_pattern( '(ab)+'   )
execute add_pattern( '(ab){2}' )

execute set_target( 'a'      )
execute add_target( 'ab'     )
execute add_target( 'abab'   )
execute add_target( 'ababab' )

select * from test_results ;
 
PATTERN         TARGET     MATCH MATCHED_VALUE   POSITION
--------------- ---------- ----- --------------- --------
(ab)*           a          Y     (null)                 1
                ab         Y     ab                     1
                abab       Y     abab                   1
                ababab     Y     ababab                 1

(ab)+           a          N     (null)                 0
                ab         Y     ab                     1
                abab       Y     abab                   1
                ababab     Y     ababab                 1

(ab){2}         a          N     (null)                 0
                ab         N     (null)                 0
                abab       Y     abab                   1
                ababab     Y     abab                   1
 



Linking to SQL Snippets ™

To link to this page in Oracle Technology Network Forums or OraFAQ Forums cut and paste this code.

  • [url=http://www.sqlsnippets.com/en/topic-10213.html]SQL Snippets: Regular Expressions - Specifying HOW MANY Repetitions[/url]

To link to this page in HTML documents or Blogger comments cut and paste this code.

  • <a href="http://www.sqlsnippets.com/en/topic-10213.html">SQL Snippets: Regular Expressions - Specifying HOW MANY Repetitions</a>

To link to this page in other web sites use the following values.

  • Link Text : SQL Snippets: Regular Expressions - Specifying HOW MANY Repetitions
  • URL (href): http://www.sqlsnippets.com/en/topic-10213.html