In this tutorial we continue exploring the RULES part of the MODEL clause with a focus on aggregate and analytic expressions. (See Model Expressions for full syntax.)
Aggregate and analytic functions can be used on the right hand side of a rule to process multiple cell values. They can not appear on the left hand side of rules. Both types of functions can take constants, bind variables, measure columns, or expressions involving them as arguments. Beyond that, aggregate functions can also specify a cell reference to restrict which cells they operate on. For example, in this snippet
select key , num_val , m_1 from t model dimension by ( key ) measures ( num_val, 0 m_1 ) rules ( m_1[3] = SUM( NUM_VAL )[ KEY IN (1,2,3) ] ) order by key ;
KEY NUM_VAL M_1 ------ ------- ---------- 1 100 0 2 200 0 3 300 600 4 0 5 300 0 6 100 0 7 100 0 8 0 9 200 0 10 800 0
the aggregate function SUM takes the measure column NUM_VAL as an argument.
The dimension reference
[ KEY IN (1,2,3) ]
restricts the function to three specific cell values.
Here is an analytic example. Note that analytic functions are not qualified with dimension references.
select key , m_1 from t model dimension by ( key ) measures ( 0 m_1 ) rules ( m_1[any] = ROW_NUMBER() OVER ( ORDER BY KEY DESC ) ) order by key ;
KEY M_1 ------ ---------- 1 10 2 9 3 8 4 7 5 6 6 5 7 4 8 3 9 2 10 1
The following two snippets perform similar logic.
One uses analytic functions and the other aggregate functions.
By using the
CV()
and
ANY
dimension references with the aggregate functions we
see that they can produce the same results analytic functions do.
Aggregate |
Analytic |
---|---|
break on group_1 skip 1 duplicates select group_1 , num_val , m_1 , m_2 from t model dimension by ( group_1, group_2 ) measures ( num_val, 0 m_1, 0 m_2 ) rules ( m_1[any,any] = MIN( NUM_VAL ) [ CV(), ANY ] , m_2[any,any] = MAX( NUM_VAL ) [ CV(), ANY ] ) order by group_1 , group_2 ; GROUP_1 NUM_VAL M_1 M_2 ---------- ------- ---------- ---------- A 100 100 300 A 200 100 300 A 300 100 300 B 100 300 B 300 100 300 B 100 100 300 C 100 100 100 C 100 100 200 200 800 800 200 800 |
break on group_1 skip 1 duplicates select group_1 , num_val , m_1 , m_2 from t model dimension by ( group_1, group_2 ) measures ( num_val, 0 m_1, 0 m_2 ) rules ( m_1[any,any] = MIN(NUM_VAL) OVER (PARTITION BY (GROUP_1)), m_2[any,any] = MAX(NUM_VAL) OVER (PARTITION BY (GROUP_1)) ) order by group_1 , group_2 ; GROUP_1 NUM_VAL M_1 M_2 ---------- ------- ---------- ---------- A 100 100 300 A 200 100 300 A 300 100 300 B 100 300 B 300 100 300 B 100 100 300 C 100 100 100 C 100 100 200 200 800 800 200 800 |
PARTITION BY and Aggregate Functions
To see how PARTITION BY affects aggregate functions, consider the following snippets. The one on the left does not use PARTITION BY. The one on the right does. As you can see, using PARTITION BY produces different values under M_1 than not using it.
Without PARTITION BY |
With PARTITION BY |
---|---|
break on group_1 skip 1 duplicates select group_1 , group_2 , num_val , m_1 from t model dimension by ( group_1, group_2 ) measures ( num_val, 0 m_1 ) rules ( m_1[any, any] = SUM(NUM_VAL)[ANY, ANY] ) order by group_1 , group_2 , num_val ; GROUP_1 GROUP_2 NUM_VAL M_1 ---------- ---------- ------- ---------- A a1 100 2100 A a2 200 2100 A a3 300 2100 B a1 2100 B a2 300 2100 B a3 100 2100 C a1 100 2100 C a2 2100 a1 200 2100 a2 800 2100 |
break on group_1 skip 1 duplicates select group_1 , group_2 , num_val , m_1 from t model PARTITION BY ( GROUP_1 ) dimension by ( group_2 ) measures ( num_val , 0 m_1 ) rules ( m_1[any] = SUM(NUM_VAL)[ANY] ) order by group_1 , group_2 , num_val ; GROUP_1 GROUP_2 NUM_VAL M_1 ---------- ---------- ------- ---------- A a1 100 600 A a2 200 600 A a3 300 600 B a1 400 B a2 300 400 B a3 100 400 C a1 100 100 C a2 100 a1 200 1000 a2 800 1000 |
PARTITION BY and Analytic Functions
To see how PARTITION BY affects analytic functions, consider the following snippets. The one on the left does not use PARTITION BY. The one on the right does. Again, using PARTITION BY produces different values under M_1 than not using it.
Without PARTITION BY |
With PARTITION BY |
---|---|
break on group_1 skip 1 duplicates select group_1 , group_2 , m_1 from t model -- dimension by ( group_1, group_2 ) measures ( 0 m_1 ) rules ( m_1[any, any] = ROW_NUMBER() OVER(ORDER BY GROUP_1,GROUP_2) ) order by group_1 , group_2 ; GROUP_1 GROUP_2 M_1 ---------- ---------- ---------- A a1 1 A a2 2 A a3 3 B a1 4 B a2 5 B a3 6 C a1 7 C a2 8 a1 9 a2 10 |
break on group_1 skip 1 duplicates select group_1 , group_2 , m_1 from t model PARTITION BY ( GROUP_1 ) dimension by ( group_2 ) measures ( 0 m_1 ) rules ( m_1[any] = ROW_NUMBER() OVER(ORDER BY GROUP_1, GROUP_2) ) order by group_1 , group_2 ; GROUP_1 GROUP_2 M_1 ---------- ---------- ---------- A a1 1 A a2 2 A a3 3 B a1 1 B a2 2 B a3 3 C a1 1 C a2 2 a1 1 a2 2 |