Once more about NULL values

The meaning of the NULL value is an absence of information or inapplicability of the current attribute in the current tuple.

You can ask:”For what to have an attribute if its value is inapplicable?” The answer to this question lies in a field of modeling of the enterprise. Let’s consider, for example, the database schema “Computers”. It represents the relational model of the association “type-supertype”. The models of computer production are the entries of the enterprise here; under this every type of production (PC, laptop or printer) represents an individual table with relations “many-to-one” with the Product table.

Such a model provides a high degree of normalization (3NF). However this is not the only way. We would have stored all information in one table, which could contain both mutual attributes for all models (for example, price) and attributes, which have sense only for models of certain types (for example, color – for printer defining). The NULL value is fully justified for this schema just in the meaning of an inapplicability of characteristic, that is NULL in the column color tells that this characteristic doesn’t refer, let us say, to PC models.

Let’s get back to the second aspect of null values – absence of information. If we decide to refuse the using of null values, we will have to offer an alternative. The natural way is using of the default value that will be substituted in the appropriate column under the absence of information. We should note that such default values should be at least so many as different data types that are supported by DBMS (integer, character strings, datetime, … ).

Let’s consider, for example, the Laptop table and the price column. Let the enterprise to be so that at the moment of information input about the models of laptops their price is sometimes unknown. In choosing the default value we should be bounded by permissible values for the price column. Data type for the column (money) makes us to be bounded by numerical values, compatible with current type and domain constraints (constraint of the CHECK type), imposed on permissible values for this column. Any positive value as a default value will cause a confusion, because it is impossible to differ “true” value of price from the substitution of the absent price. That is why we should choose zero or any negative value. And now let’s speak about disadvantages of such a substitute.

For example, let’s consider the information about the models 1298 in the Laptop table. Let’s carry out the query to learn the data:

SELECT *
FROM Laptop
WHERE model = 1298;
🚫
[[ error ]]
[[ column ]]
NULL [[ value ]]

Here are the results of carrying out of this query:

codemodelspeedramhdpricescreen
1129835032470011
412986006410105015
61298450641095012

Let’s consider the task of getting the average price of the model 1298. While all prices are known the solution of this task doesn’t cause any doubts:

SELECT model, AVG(price) avg_price
FROM Laptop
WHERE model = 1298
GROUP BY model;
🚫
[[ error ]]
[[ column ]]
NULL [[ value ]]
modelavg_price
1298900

Let now the price of the model with code 4 is unknown. If, as it was decided earlier, we substitute an unknown value, say, with zero (UPDATE laptop SET price=0 WHERE code=4), then we will get knowingly the wrong average price value – 550.0.

If we use the NULL value, the result will be quite right – 825.0, because null values will be ignored under grouping, in the result of which the average value will be calculated only by the models with the known price (that is the average value by two models).

So, as I tried to show, the NULL value is the inherent peculiarity of the relational model, and instead of critics I offer to learn correctly working with such values.

Note

For the sake of objectivity we recommend you to read the reasonable criticism by C.J. Date concerning use of NULL values [1]. E.F. Codd [6] has offered two different types of the NULL values corresponding to just those two aspects of their applying about which there was a speech above.

Comparison of the rows containing NULLs

It is well known that a predicate is evaluated as UNKNOWN when using the comparison with NULL, i.e. neither TRUE nor FALSE. So It is not surprising that when comparing NULLs with each other, these are considered equal in some cases, and not equal in other ones. Let’s go to examples.

Begin with joining of two similar rows containing NULLs on equality of all the columns.

with A as (
select 'a' a, null b
)
, B as (
select 'a' a, null b
)
select * from A join B ON A.a=B.a and A.b=B.b;
🚫
[[ error ]]
[[ column ]]
NULL [[ value ]]

Only those rows will be concatenated which the predicate is evaluated as TRUE for. In our example the predicate is UNKNOWN, so we do not get any rows.

Nevertheless the intersection of the queries (as well as uniting and exception) consider these rows identical.

with A as (
select 'a' a, null b
)
, B as (
select 'a' a, null b
)
select * from A
INTERSECT
select * from B;
🚫
[[ error ]]
[[ column ]]
NULL [[ value ]]
ab
aNULL

It can be concluded that NULL-values are not considered equal (nor nonequal also) in horizontal operations, but equal in vertical operations. In particular, when grouping by column including NULLs, the latter ones form one group.

Finally let’s consider a few solutions to the problem of determining the number of printers with unknown prices. The PrinterN table differs from the Printer table in that the couple of prices is set in NULL.

(1) Substraction of number of rows with known price from total number of rows.

select count(*) - count(price) qty from printerN;
🚫
[[ error ]]
[[ column ]]
NULL [[ value ]]

(2) Using IS NULL predicate to calculate number of NULLs in price column.

select count(*) from printerN where price IS NULL;
🚫
[[ error ]]
[[ column ]]
NULL [[ value ]]

(3) Grouping by price with taking the group formed by unknown price.

select count(*) from printerN group by price having price IS NULL;
🚫
[[ error ]]
[[ column ]]
NULL [[ value ]]