Pattern Matching Over Rows: Measures

After I proposed the preceding solution at the comments section of the So, in your opinion ... blog post I got the following feedback. My responses follow each feedback point.

Point 1

My comments ...
Assuming I'm not missing something, then I think Oracle's efforts would be better spent developing a built in string aggregation function than adding another feature for doing pattern matching.

Alberto Dell'Era said....
But the paper enhancement would not only detect the pattern, it would also (and most importantly) enable you to compute meeasures (sic) on it (location, max/min values, etc), etcetera.

Good point. Calculating location, max, min, etc. values is, however, easily accomplished with my string aggregation approach. Here are two examples.

variable pattern varchar2(50)

execute :pattern := 'A*D[DA]*U[UA]*D[DA]*U[UA]*'

-- Measures for a fuzzy "W" Pattern

with spm as
  ( select
      stock_symbol ,
      regexp_instr( MOVEMENT_STRING_UDA, :pattern ) start_day ,
      length(regexp_substr( MOVEMENT_STRING_UDA, :pattern ) ) duration
    from   stock_price_movement
    where  day = 11
    and    regexp_like( MOVEMENT_STRING_UDA, :pattern )
  )
select
  spm.stock_symbol ,
  spm.start_day ,
  spm.duration ,
  min( spm_range.price ) min_price ,
  max( spm_range.price ) max_price
from
  spm inner join stock_price_movement spm_range on
    ( spm_range.stock_symbol = spm.stock_symbol and
      spm_range.day between
        spm.start_day and spm.start_day + spm.duration - 1
    )
group by
  spm.stock_symbol ,
  spm.start_day ,
  spm.duration
order by spm.stock_symbol
;

STOCK_SYMB  START_DAY   DURATION  MIN_PRICE  MAX_PRICE
---------- ---------- ---------- ---------- ----------
W1                  1         11          7         10
W2                  1         11          5          9

execute :pattern := '((DA)|(AD)|(DD))((UA)|(AU)|(UU))' ;

-- Measures for a 4 Day fuzzy "V" pattern
/

STOCK_SYMB  START_DAY   DURATION  MIN_PRICE  MAX_PRICE
---------- ---------- ---------- ---------- ----------
V1                  5          4          5          7
V2                  4          4          6          8
W1                  3          4          7          9
W2                  3          4          7          9

Point 2

Alberto Dell'Era said....
Also, I've noticed that they propose to emit the measures as soon as the pattern(s) is(are) discovered, instead of first aggregating (pivoting) all the rows and then search for the pattern(s) - which would be a dramatical performance improvement for big data sets.

Emitting measures as soon as a pattern is discovered may not necessarily yield dramatic performance improvements over the string aggregation approach. In many situations, e.g. stock patterns, fraud detection, the target pattern will only be found in a small percentage of cases. The majority of cases will involve full scans of the data series without emitting any measures at all. Even in the cases where the target pattern does exists there is a 50/50 chance the pattern will appear in the second half of the data so the savings will not be dramatic here either.

Consider also that, to emit measures as soon as a pattern is discovered, the proposed approach may need to maintain intermediate results for each measure after each data point is fetched just in case a match is found. When no match is found (which can be most of the time) the results of the intermediate calculations will simply be discarded and all that overhead will have been performed for nothing.

The string aggregation approach, on the other hand, only looks for measures *after* a pattern match is found.

One last point. The string aggregation approach uses REGEXP_LIKE to find a pattern match. I expect REGEXP_LIKE stops scanning when a match is found just as the new feature would.

SQL Snippets ™: Tutorials for Oracle Developers

Pattern Matching Over Rows

Measures

Point 1

Point 2

Linking to SQL Snippets ™