In this tutorial we examine ways of specifying HOW MANY repetitions of a given pattern we want to match.
Other aspects of pattern matching will be kept constant throughout the tutorial, namely:
- "WHAT" will be a single character
- "WHERE" will be from the first character in the target onward
- "WHICH" occurrence will be the first occurrence.
Quantifiers, Greedy, and Non-greedy Matching
To specify how many repetitions of a given character should be matched, Oracle supports quantifiers like '*'. For example, while the 'b' in the regular expression 'abc' matches one and only one repetition of the letter 'b', an expression like 'ab*c' matches zero more repetitions of 'b', as in the targets 'ac', 'abc', or 'abbc'.
In certain situations, a pattern that uses quantifiers could conceivably match two or more substrings within a target. For example, given a target string like 'abcbcbc' the pattern 'a.*c' could match either 'abc', 'abcbc', or 'abcbcbc'. A match to the largest possible result string, 'abcbcbc', is called a "greedy" match. A match to the smallest possible result string, 'abc', is called a "non-greedy" match. POSIX regular expressions only support greedy matching. Fortunately, Oracle supports some PERL influenced extensions to the POSIX specification that support non-greedy matching. To specify a non-greedy match a '?' character is appended to the POSIX quantifier.
The following table summarizes the available quantifiers and indicates whether they are greedy or not.
Quantifier | Meaning | Greediness |
---|---|---|
? | zero or one occurrence | greedy |
?? | zero or one occurrence | non-greedy |
* | zero or more occurrences | greedy |
*? | zero or more occurrences | non-greedy |
+ | one or more occurrences | greedy |
+? | one or more occurrences | non-greedy |
{m} | exactly m occurrences | greedy |
{m}? | exactly m occurrences | non-greedy |
{m,n} | at least m, but not more than n occurrences | greedy |
{m,n}? | at least m, but not more than n occurrences | non-greedy |
{m,} | m or more occurrences | greedy |
{m,}? | m or more occurrences | non-greedy |
"There Can Be Only One"
To match one and only one instance of a character, simply use the character itself in the regular expression pattern.
execute set_pattern( 'b' ) execute set_target( 'aaa' ) execute add_target( 'abc' ) execute add_target( 'abbc' ) execute add_target( 'acc' ) execute add_target( 'bbb' ) execute add_target( 'bbbb' ) select * from test_results ;
PATTERN TARGET MATCH MATCHED_VALUE POSITION --------------- ---------- ----- --------------- -------- b aaa N (null) 0 abbc Y b 2 abc Y b 2 acc N (null) 0 bbb Y b 1 bbbb Y b 1
Zero or One
To match zero or one repetitions of a character use the '?' quantifier. For example, to match zero or one instances of the letter 'b' use 'b?'.
execute set_pattern( 'b?' ) select * from test_results ;
PATTERN TARGET MATCH MATCHED_VALUE POSITION --------------- ---------- ----- --------------- -------- b? aaa Y (null) 1 abbc Y (null) 1 abc Y (null) 1 acc Y (null) 1 bbb Y b 1 bbbb Y b 1
These results may seem strange at first. For example, one might expect the MATCHED_VALUE for target 'abc' to be 'b', not null. To understand why this is not the case we first have to understand a few underlying principles.
- "zero" instances of a character is equivalent to a single empty string
-
all target strings are considered to have empty strings within them; e.g. 'abc' is treated like this
(empty string) a (empty string) b (empty string) c (empty string)
- Oracle displays empty strings as nulls.
Now let's consider the results. The pattern 'b?' is supposed to match zero or one occurrence of 'b', which is equivalent to saying it matches a single empty string or the character 'b'. When Oracle examines the target 'abc' for a match, it scans the equivalent string
(empty string) a (empty string) b (empty string) c (empty string)
and finds that the first empty string is a match for the pattern. It therefore reports a successful match at location 1 and returns the empty string as the result (displayed as '(null)').
The same thing happens with targets 'aaa', 'abbc', and 'acc'. Oracle never finds the 'b' at position 2 of 'abbc' because it stops looking when it finds the empty string at the beginning of the target.
For the target 'bbb', Oracle could either match the first empty string in the target or the 'b' at position 1. Since 'b?' is a greedy pattern and since 'b' is a larger result than an empty string, Oracle returns the letter 'b' as the match result.
For another perspective on empty strings, consider the following example where all empty strings in the target 'abc' are replaced with an 'X' to disclose their locations.
select 'abc' as target , regexp_replace( 'abc', 'd?', 'X' ) as replaced_target from dual ;
TARGET REPLACED_TARGET ---------- --------------- abc XaXbXcX
In the WHERE to Look tutorial in this section we will see how to use the '^' and '$' characters to match the empty strings at the beginning and end, respectively, of a target.
While trivial patterns such as 'b?' are of little use, non-trivial patterns such as 'a.?c' can be very useful and produce clearer results than 'b?'.
execute set_pattern( 'a.?c' ) select * from test_results ;
PATTERN TARGET MATCH MATCHED_VALUE POSITION --------------- ---------- ----- --------------- -------- a.?c aaa N (null) 0 abbc N (null) 0 abc Y abc 1 acc Y acc 1 bbb N (null) 0 bbbb N (null) 0
Non-greedy matches of zero or more occurrences are specified with the '??' quantifier.
execute set_pattern( 'a.??c' ) select * from test_results ;
PATTERN TARGET MATCH MATCHED_VALUE POSITION --------------- ---------- ----- --------------- -------- a.??c aaa N (null) 0 abbc N (null) 0 abc Y abc 1 acc Y ac 1 bbb N (null) 0 bbbb N (null) 0
Zero or More
A zero or more greedy match is specified with the '*' quantifier, its non-greedy counterpart is '*?'.
execute set_pattern( 'a.*c' ) ; select * from test_results ;
PATTERN TARGET MATCH MATCHED_VALUE POSITION --------------- ---------- ----- --------------- -------- a.*c aaa N (null) 0 abbc Y abbc 1 abc Y abc 1 acc Y acc 1 bbb N (null) 0 bbbb N (null) 0
execute set_pattern( 'a.*?c' ) ; select * from test_results ;
PATTERN TARGET MATCH MATCHED_VALUE POSITION --------------- ---------- ----- --------------- -------- a.*?c aaa N (null) 0 abbc Y abbc 1 abc Y abc 1 acc Y ac 1 bbb N (null) 0 bbbb N (null) 0
One or More
A one or more greedy match is specified with the '+' quantifier, its non-greedy counterpart is '+?'.
execute set_pattern( 'b+' ) ; select * from test_results ;
PATTERN TARGET MATCH MATCHED_VALUE POSITION --------------- ---------- ----- --------------- -------- b+ aaa N (null) 0 abbc Y bb 2 abc Y b 2 acc N (null) 0 bbb Y bbb 1 bbbb Y bbbb 1
execute set_pattern( 'b+?' ) ; select * from test_results ;
PATTERN TARGET MATCH MATCHED_VALUE POSITION --------------- ---------- ----- --------------- -------- b+? aaa N (null) 0 abbc Y b 2 abc Y b 2 acc N (null) 0 bbb Y b 1 bbbb Y b 1
Enumerated Occurrences
To specify an exact number or a numeric range of occurrences use the '{}' quantifiers.
execute set_pattern( 'b{2}' ) execute add_pattern( 'b{2}?' ) execute add_pattern( 'b{2,3}' ) execute add_pattern( 'b{2,3}?' ) execute add_pattern( 'b{2,}' ) execute add_pattern( 'b{2,}?' ) set pagesize 39 select * from test_results ;
PATTERN TARGET MATCH MATCHED_VALUE POSITION --------------- ---------- ----- --------------- -------- b{2} aaa N (null) 0 abbc Y bb 2 abc N (null) 0 acc N (null) 0 bbb Y bb 1 bbbb Y bb 1 b{2}? aaa N (null) 0 abbc Y bb 2 abc N (null) 0 acc N (null) 0 bbb Y bb 1 bbbb Y bb 1 b{2,3} aaa N (null) 0 abbc Y bb 2 abc N (null) 0 acc N (null) 0 bbb Y bbb 1 bbbb Y bbb 1 b{2,3}? aaa N (null) 0 abbc Y bb 2 abc N (null) 0 acc N (null) 0 bbb Y bb 1 bbbb Y bb 1 b{2,} aaa N (null) 0 abbc Y bb 2 abc N (null) 0 acc N (null) 0 bbb Y bbb 1 bbbb Y bbbb 1 PATTERN TARGET MATCH MATCHED_VALUE POSITION --------------- ---------- ----- --------------- -------- b{2,}? aaa N (null) 0 abbc Y bb 2 abc N (null) 0 acc N (null) 0 bbb Y bb 1 bbbb Y bb 1
Note that the non-greedy patterns 'b{2}?', 'b{2,3}?', and 'b{2,}?' are of little practical use since they all yield the same result set as 'b{2}'.
Repetitions of Strings with Two or More Characters
In the previous examples we saw how to apply quantifiers to a single character. To apply them to a string of characters simply enclose the string with '(' and ')' and apply the quantifier after the ')'.
execute set_pattern( '(ab)*' ) execute add_pattern( '(ab)+' ) execute add_pattern( '(ab){2}' ) execute set_target( 'a' ) execute add_target( 'ab' ) execute add_target( 'abab' ) execute add_target( 'ababab' ) select * from test_results ;
PATTERN TARGET MATCH MATCHED_VALUE POSITION --------------- ---------- ----- --------------- -------- (ab)* a Y (null) 1 ab Y ab 1 abab Y abab 1 ababab Y ababab 1 (ab)+ a N (null) 0 ab Y ab 1 abab Y abab 1 ababab Y ababab 1 (ab){2} a N (null) 0 ab N (null) 0 abab Y abab 1 ababab Y abab 1