Nulls: Nulls and Aggregate Functions

This tutorial demonstrates how aggregate functions deal with null values. Techniques for generating results that ignore nulls and results that include nulls are highlighted.

Ignoring Nulls

According to the SQL Reference Manual section on Aggregate Functions:

All aggregate functions except COUNT(*) and GROUPING ignore nulls. You can use the NVL function in the argument to an aggregate function to substitute a value for a null. COUNT never returns null, but returns either a number or zero. For all the remaining aggregate functions, if the data set contains no rows, or contains only rows with nulls as arguments to the aggregate function, then the function returns null.

This means that, given a table with values like this

GROUP_KEY  VAL
---------- ----------
Group-1    (null)
Group-1    (null)

Group-2    a
Group-2    a
Group-2    z
Group-2    z
Group-2    (null)

Group-3    A
Group-3    A
Group-3    Z

aggregate functions like MAX , MIN , and COUNT will return values that for the most part ignore nulls, like these.

select
  group_key ,
  MAX( VAL )            max_val ,
  MIN( VAL )            min_val ,
  COUNT( * )            count_all_rows ,
  COUNT( VAL )          count_val ,
  COUNT( DISTINCT VAL ) count_distinct_val
from t1
group by group_key
order by group_key ;

GROUP_KEY  MAX_VAL    MIN_VAL    COUNT_ALL_ROWS  COUNT_VAL COUNT_DISTINCT_VAL
---------- ---------- ---------- -------------- ---------- ------------------
Group-1    (null)     (null)                  2          0                  0
Group-2    z          a                       5          4                  2
Group-3    Z          A                       3          3                  2

Note how MAX_VAL contains the same results for Group-2 and Group-3, even though Group-2 contains null VAL values and Group-3 does not. Note also that only COUNT_ALL_ROWS returned a count that included null values. The other two versions of COUNT() ignored null values.

Including Nulls

For mathematical aggregate functions like AVG, MEDIAN, and SUM including nulls in the calculation is of little practical use. For aggregate functions like MAX, MIN, and COUNT(DISTINCT ...) however, we sometimes need results that take nulls into account. For example, using the same test data as above

GROUP_KEY  VAL
---------- ----------
Group-1    (null)
Group-1    (null)

Group-2    a
Group-2    a
Group-2    z
Group-2    z
Group-2    (null)

Group-3    A
Group-3    A
Group-3    Z

we may wish to produce results like these.

GROUP_KEY  MAX_VAL    MIN_VAL    COUNT_DISTINCT_VAL_1
---------- ---------- ---------- --------------------
Group-1    (null)     (null)                        1
Group-2    (null)     a                             3
Group-3    Z          A                             2

For the MAX and MIN cases it helps to take the statement "all aggregate functions except COUNT(*) and GROUPING ignore nulls" with a grain of salt. Fortunately for us there are aggregate functions in addition to COUNT(*) and GROUPING which do not ignore nulls. Two of them are the FIRST and LAST functions, which we can use as follows to give us MAX and MIN results that include nulls.

select
  group_key,
  MAX( VAL ) KEEP ( DENSE_RANK LAST  ORDER BY VAL ) max_val ,
  MIN( VAL ) KEEP ( DENSE_RANK FIRST ORDER BY VAL ) min_val
from t1
group by group_key
order by group_key ;

GROUP_KEY  MAX_VAL    MIN_VAL
---------- ---------- ----------
Group-1    (null)     (null)
Group-2    (null)     a
Group-3    Z          A

For the COUNT( DISTINCT VAL ) case two possible approaches for including nulls are demonstrated below.

select
  group_key,
  COUNT( DISTINCT DUMP(VAL) )                  count_distinct_val_1 ,
  COUNT( DISTINCT VAL ) + MAX( NVL2(VAL,0,1) ) count_distinct_val_2
from t1
group by group_key
order by group_key ;

GROUP_KEY  COUNT_DISTINCT_VAL_1 COUNT_DISTINCT_VAL_2
---------- -------------------- --------------------
Group-1                       1                    1
Group-2                       3                    3
Group-3                       2                    2

Be careful with the DUMP approach though since DUMP's output is truncated at 4000 characters. If the VAL column contained values whose DUMP output is truncated then the results can be incorrect.

DENSE_RANK and RANK

Two more aggregate functions where including nulls in the calculation may be necessary are the DENSE_RANK and RANK functions. Fortunately, as with FIRST and LAST, DENSE_RANK and RANK include nulls by default. For example, given test data like this (analytic value rankings are included for clarity)

GROUP_KEY  VAL        VAL_DENSE_RANK   VAL_RANK
---------- ---------- -------------- ----------
Group-1    (null)                  1          1
Group-1    (null)                  1          1

Group-2    a                       1          1
Group-2    a                       1          1
Group-2    z                       2          3
Group-2    z                       2          3
Group-2    (null)                  3          5

Group-3    A                       1          1
Group-3    A                       1          1
Group-3    Z                       2          3

the following results show how the aggregate versions of DENSE_RANK and RANK do not ignore nulls.

select
  group_key ,
  DENSE_RANK( NULL ) WITHIN GROUP ( ORDER BY VAL ) null_dense_rank_within_group ,
  RANK( NULL )       WITHIN GROUP ( ORDER BY VAL ) null_rank_within_group
from t1
group by group_key
order by group_key ;

GROUP_KEY  NULL_DENSE_RANK_WITHIN_GROUP NULL_RANK_WITHIN_GROUP
---------- ---------------------------- ----------------------
Group-1                               1                      1
Group-2                               3                      5
Group-3                               3                      4

Gotchas

Some people reading these two sentences from the manual

All aggregate functions except COUNT(*) and GROUPING ignore nulls. You can use the NVL function in the argument to an aggregate function to substitute a value for a null.

may infer that aggregate functions can be made to treat null values the same way they treat non-null values by simply using NVL to substitute nulls with some non-null value. A simple application of this logic can lead to trouble however. For example, say we choose to substitute all null values with a 'z', like this.

select
  group_key ,
  max( NVL( VAL, 'z' ) )            max_val ,
  min( NVL( VAL, 'z' ) )            min_val ,
  count( distinct NVL( VAL, 'z' ) ) count_distinct_val
from t1
group by group_key
order by group_key ;

GROUP_KEY  MAX_VAL    MIN_VAL    COUNT_DISTINCT_VAL
---------- ---------- ---------- ------------------
Group-1    z          z                           1
Group-2    z          a                           2
Group-3    Z          A                           2

Note how none of the columns above contain the desired results which, as you will recall, are these.

GROUP_KEY  MAX_VAL    MIN_VAL    COUNT_DISTINCT_VAL_1
---------- ---------- ---------- --------------------
Group-1    (null)     (null)                        1
Group-2    (null)     a                             3
Group-3    Z          A                             2

A simple application of NVL clearly will not do then. Taking the NVL idea a little further programmers sometimes employ more complex solutions such as this one.

select
  group_key ,
  DECODE( MAX( NVL( VAL, '~' ) ), '~', NULL, MAX( VAL ) ) max_val ,
  DECODE( MIN( NVL( VAL, '~' ) ), '~', NULL, MIN( VAL ) ) min_val ,
  COUNT( distinct NVL( VAL, '~' ) )                      count_distinct_val
from t1
group by group_key
order by group_key ;

GROUP_KEY  MAX_VAL    MIN_VAL    COUNT_DISTINCT_VAL
---------- ---------- ---------- ------------------
Group-1    (null)     (null)                      1
Group-2    (null)     a                           3
Group-3    Z          A                           2

Without some mechanism to ensure '~' and strings that sort higher than '~' never appear in VAL however, these solutions will fail if such values are ever inserted into the table. For example, given this data

insert into t1 values ( 10, 'Group-4', null  );
insert into t1 values ( 10, 'Group-4', '~'   );

insert into t1 values ( 10, 'Group-5', null  );
insert into t1 values ( 10, 'Group-5', '~~~' );

the results should be

GROUP_KEY  MAX_VAL    MIN_VAL    COUNT_DISTINCT_VAL_1
---------- ---------- ---------- --------------------
Group-4    (null)     ~                             2
Group-5    (null)     ~~~                           2

but using the NVL approach gives us these, incorrect results.

select
  group_key ,
  decode( max( nvl(val, '~' ) ), '~', null, max( val ) ) max_val ,
  decode( min( nvl(val, '~' ) ), '~', null, min( val ) ) min_val ,
  count( distinct nvl( val, '~' ) )                      count_distinct_val
from t1
where group_key in ( 'Group-4', 'Group-5' )
group by group_key
order by group_key ;

GROUP_KEY  MAX_VAL    MIN_VAL    COUNT_DISTINCT_VAL
---------- ---------- ---------- ------------------
Group-4    (null)     (null)                      1
Group-5    ~~~        (null)                      2

To avoid these gotchas simply use the non-NVL alternatives presented under "Including Nulls" above.

SQL Snippets ™: Tutorials for Oracle Developers

Nulls

Nulls and Aggregate Functions

Ignoring Nulls

Including Nulls

DENSE_RANK and RANK

Gotchas

Linking to SQL Snippets ™

Revision Notes