These “sly” outer joins
Let it be necessary to define all ships with a known launch year (the training database “Ships”) for each class. When “for each class” is said, we already know that we must use outer join, for example, the left one:
Solution 8.6.1
SELECT Classes.class, name, launched
FROM Classes LEFT JOIN
Ships ON Classes.class = Ships.class AND
launched IS NOT NULL;
[[ column ]] |
---|
[[ value ]] |
That is, we join the Classes table with the Ships table by the ‘class’ column and choose the ships with a known launch year. And this is what we get in result, apart from anything else:
Class | Name | launched |
---|---|---|
Bismarck | NULL | NULL |
But how can that be? We’ve used ‘launched IS NOT NULL’ in the join predicate, haven’t we? And the answer to our question lies in the join predicate. Let’s get back to the definition of the left outer join:
All rows from the left table are combined with those from the right, for which the predicate value is true. If for any row from the left table there is not a single corresponding row from the right table then the values of the right table are NULLs.
In the Ships table there is no vessel of Bismarck class. That’s why we got this row, that is, the Bismarck class is in the Classes table. And what if such ship existed? Let’s add two ships of the Bismarck class into the Ships table – one with a known launch year, another with undefined:
SELECT *
FROM Ships
UNION ALL
SELECT 'B_1' AS name, 'Bismarck' AS class, 1941 AS launched
UNION ALL
SELECT 'B_2' AS name, 'Bismarck' AS class, NULL AS launched;
[[ column ]] |
---|
[[ value ]] |
Now we rewrite our solution 8.6.1, taking these new ships into account:
Solution 8.6.2
SELECT Classes.class, name, launched
FROM Classes LEFT JOIN
(SELECT *
FROM Ships
UNION ALL
SELECT 'B_1' AS name, 'Bismarck' AS class, 1941 AS launched
UNION ALL
SELECT 'B_2' AS name, 'Bismarck' AS class, NULL AS launched
) Ships ON Classes.class = Ships.class AND
launched IS NOT NULL;
[[ column ]] |
---|
[[ value ]] |
And we get the expected result: there will be only one ship of Bismarck class in the resulting set:
Class | Name | launched |
---|---|---|
Bismarck | B_1 | 1941 |
Conclusion. If you need to restrict the resulting set of the outer join use WHERE clause, which serves exactly this purpose:
Solution 8.6.3
SELECT Classes.class, name, launched
FROM Classes LEFT JOIN
Ships ON Classes.class = Ships.class
WHERE launched IS NOT NULL;
[[ column ]] |
---|
[[ value ]] |
And the join predicate only defines which rows from different tables will be concatenated in the resulting set.
Finally, I’d note that this example is not quite revealing, as the INNER JOIN would fit into this solution as well, despite the “for each class” words. But the flexibility of SQL language allows finding different variants, and stereotype usage is fully justified.
As a rule equi join is used in examples, i.e. values are equal each other in concatenated rows. This is due to practical using the foreign key joining. Similar examples you can see on the previous page. However a join predicate may be any logical expression. To illustrate this, let’s consider the following problem.
The solution is
select Income_o.* from Outcome_o right join Income_o on Outcome_o.out >= Income_o.inc
where Outcome_o.out IS NULL;
[[ column ]] |
---|
[[ value ]] |
The above solution uses outer join on the unequality Outcome_o.out >= Income_o.inc which is satisfied for the rows in the Income_o table with income that does not exceed some outcome in Outcome_o table. Moreover, outer join (right in our case) will include the rows from Income_o table for which there are no rows in the Outcome_o table that estimate the predicate as true.
These last rows give the solution to the problem. To filter these, use the fact that absent column values from Outcome_o (left) table for those rows contain NULL values. This criterium is in WHERE clause.
Obviously this problem could be solved with other ways, for example:
max + subquery
select Income_o.* from Income_o
where Income_o.inc > (select max(Outcome_o.out) from Outcome_o);
[[ column ]] |
---|
[[ value ]] |
all + subquery
select Income_o.* from Income_o
where Income_o.inc > all(select Outcome_o.out from Outcome_o);
[[ column ]] |
---|
[[ value ]] |