After I proposed the preceding solution at the comments section of the So, in your opinion ... blog post I got the following feedback. My responses follow each feedback point.
My comments ...
Assuming I'm not missing something, then I think Oracle's efforts would be better spent developing a built in string aggregation function than adding another feature for doing pattern matching.
Alberto Dell'Era said....
But the paper enhancement would not only detect the pattern, it would also (and most importantly) enable you to compute meeasures (sic) on it (location, max/min values, etc), etcetera.
Good point. Calculating location, max, min, etc. values is, however, easily accomplished with my string aggregation approach. Here are two examples.
variable pattern varchar2(50) execute :pattern := 'A*D[DA]*U[UA]*D[DA]*U[UA]*' -- Measures for a fuzzy "W" Pattern with spm as ( select stock_symbol , regexp_instr( MOVEMENT_STRING_UDA, :pattern ) start_day , length(regexp_substr( MOVEMENT_STRING_UDA, :pattern ) ) duration from stock_price_movement where day = 11 and regexp_like( MOVEMENT_STRING_UDA, :pattern ) ) select spm.stock_symbol , spm.start_day , spm.duration , min( spm_range.price ) min_price , max( spm_range.price ) max_price from spm inner join stock_price_movement spm_range on ( spm_range.stock_symbol = spm.stock_symbol and spm_range.day between spm.start_day and spm.start_day + spm.duration - 1 ) group by spm.stock_symbol , spm.start_day , spm.duration order by spm.stock_symbol ;
STOCK_SYMB START_DAY DURATION MIN_PRICE MAX_PRICE ---------- ---------- ---------- ---------- ---------- W1 1 11 7 10 W2 1 11 5 9
execute :pattern := '((DA)|(AD)|(DD))((UA)|(AU)|(UU))' ; -- Measures for a 4 Day fuzzy "V" pattern /
STOCK_SYMB START_DAY DURATION MIN_PRICE MAX_PRICE ---------- ---------- ---------- ---------- ---------- V1 5 4 5 7 V2 4 4 6 8 W1 3 4 7 9 W2 3 4 7 9
Alberto Dell'Era said....
Also, I've noticed that they propose to emit the measures as soon as the pattern(s) is(are) discovered, instead of first aggregating (pivoting) all the rows and then search for the pattern(s) - which would be a dramatical performance improvement for big data sets.
Emitting measures as soon as a pattern is discovered may not necessarily yield dramatic performance improvements over the string aggregation approach. In many situations, e.g. stock patterns, fraud detection, the target pattern will only be found in a small percentage of cases. The majority of cases will involve full scans of the data series without emitting any measures at all. Even in the cases where the target pattern does exists there is a 50/50 chance the pattern will appear in the second half of the data so the savings will not be dramatic here either.
Consider also that, to emit measures as soon as a pattern is discovered, the proposed approach may need to maintain intermediate results for each measure after each data point is fetched just in case a match is found. When no match is found (which can be most of the time) the results of the intermediate calculations will simply be discarded and all that overhead will have been performed for nothing.
The string aggregation approach, on the other hand, only looks for measures *after* a pattern match is found.
One last point. The string aggregation approach uses REGEXP_LIKE to find a pattern match. I expect REGEXP_LIKE stops scanning when a match is found just as the new feature would.