In this tutorial we continue examining ways to specify WHAT text to match. Patterns for strings of two or more characters is explored.
Matching Strings with Two or More Characters
To matching a string with two or more characters simply use the string itself as the regular expression.
execute set_pattern( 'bc' ); execute set_target( 'aaa' ); execute add_target( 'ab' ); execute add_target( 'abc' ); execute add_target( 'ac' ); execute add_target( 'bcd' ); execute add_target( 'cba' ); select * from test_results ;
PATTERN TARGET MATCH MATCHED_VALUE POSITION --------------- ---------- ----- --------------- -------- bc aaa N (null) 0 ab N (null) 0 abc Y bc 2 ac N (null) 0 bcd Y bc 1 cba N (null) 0
OR Conditions and Grouping
Use the "|" character to specify an OR condition between strings. For example, the pattern "ab|ba" matches either an "ab" OR a "ba".
execute set_pattern( 'ab|ba' ); select * from test_results ;
PATTERN TARGET MATCH MATCHED_VALUE POSITION --------------- ---------- ----- --------------- -------- ab|ba aaa N (null) 0 ab Y ab 1 abc Y ab 1 ac N (null) 0 bcd N (null) 0 cba Y ba 2
To avoid ambiguity, it helps to use "(" and ")" to group the terms of OR expressions. For example, does the pattern "a|bc" mean
("a" OR "b") followed by "c"
or does it mean
"a" OR ("bc")?
Using parentheses removes any ambiguity, as we see in the next three examples.
execute set_pattern( 'a|bc' ); select * from test_results ;
PATTERN TARGET MATCH MATCHED_VALUE POSITION --------------- ---------- ----- --------------- -------- a|bc aaa Y a 1 ab Y a 1 abc Y a 1 ac Y a 1 bcd Y bc 1 cba Y a 3
execute set_pattern( 'a|(bc)' ); select * from test_results ;
PATTERN TARGET MATCH MATCHED_VALUE POSITION --------------- ---------- ----- --------------- -------- a|(bc) aaa Y a 1 ab Y a 1 abc Y a 1 ac Y a 1 bcd Y bc 1 cba Y a 3
execute set_pattern( '(a|b)c' ); select * from test_results ;
PATTERN TARGET MATCH MATCHED_VALUE POSITION --------------- ---------- ----- --------------- -------- (a|b)c aaa N (null) 0 ab N (null) 0 abc Y bc 2 ac Y ac 1 bcd Y bc 1 cba N (null) 0
Once you have grouped a subexpression you can reference the string it matched by using a backreference in other sections of the pattern. The backreference "\1" refers to the result of the first bracketed subexpression in the pattern, "\2" refers to the second, and so on up to "\9".
In this example, the backreference "\1" matches the string "abc".
execute set_pattern( '(abc),\1' ); execute set_target( 'abc,' ); execute add_target( 'abc,abc' ); execute add_target( 'abc,def' ); execute add_target( 'def,def' ); select * from test_results ;
PATTERN TARGET MATCH MATCHED_VALUE POSITION --------------- ---------- ----- --------------- -------- (abc),\1 abc, N (null) 0 abc,abc Y abc,abc 1 abc,def N (null) 0 def,def N (null) 0
Here, the backreference "\1" matches the string "abc" in one case and "def" in another.
execute set_pattern( '(...),\1' ); select * from test_results ;
PATTERN TARGET MATCH MATCHED_VALUE POSITION --------------- ---------- ----- --------------- -------- (...),\1 abc, N (null) 0 abc,abc Y abc,abc 1 abc,def N (null) 0 def,def Y def,def 1
Ignoring White Space
To ignore white space in a regular expression, use the pattern matching modifier "x" in REGEXP conditions and functions.
execute set_target( 'abc' ) select 'abc' as pattern , null as modifier , target , regexp_substr( target, 'abc' ) as matched_value from targets union select 'abc' as pattern , 'x' as modifier , target , regexp_substr( target, 'abc', 1, 1, 'x' ) as matched_value from targets union select 'a b c' as pattern, null as modifier , target , regexp_substr( target, 'a b c' ) as matched_value from targets union select 'a b c' as pattern, 'x' as modifier , target , regexp_substr( target, 'a b c', 1, 1, 'x' ) as matched_value from targets order by 1 desc , 2 desc , 3 ;
PATTERN MODIFIER TARGET MATCHED_VALUE --------------- -------- ---------- --------------- abc (null) abc abc x abc abc a b c (null) abc (null) x abc abc
Note that the "x" modifier ignores white space only in the pattern, *not* the search target.
execute set_target( 'a b c' ) select 'abc' as pattern , null as modifier , target , regexp_substr( target, 'abc' ) as matched_value from targets union select 'abc' as pattern , 'x' as modifier , target , regexp_substr( target, 'abc', 1, 1, 'x' ) as matched_value from targets union select 'a b c' as pattern, null as modifier , target , regexp_substr( target, 'a b c' ) as matched_value from targets union select 'a b c' as pattern, 'x' as modifier , target , regexp_substr( target, 'a b c', 1, 1, 'x' ) as matched_value from targets order by 1 desc , 2 desc , 3 ;
PATTERN MODIFIER TARGET MATCHED_VALUE --------------- -------- ---------- --------------- abc (null) a b c (null) x a b c (null) a b c (null) a b c a b c x a b c (null)
Excluding Strings
Techniques for excluding a string from a pattern match vary depending on the requirements.
To find all targets that do not contain a string like "ab", use the "not" condition.
execute set_target( 'abb' ); execute add_target( 'abc' ); execute add_target( 'acc' ); select target from targets where not regexp_like( target, 'ab' );
TARGET ---------- acc
To match a pattern like 'a..' where the second and third characters after an "a" are anything other than "bc" use two WHERE conditions.
select target from targets where regexp_like( target, 'a..' ) and not regexp_like( target, 'abc' );
TARGET ---------- abb acc