Exercise #37

Find the classes including only one ship in the database (taking into account Outcomes table).

Solution 3.2.1

Here is one of the queries the checking system rejects:

SELECT class
FROM Ships
GROUP BY class
HAVING COUNT(name) = 1
UNION
SELECT class
FROM Classes c, Outcomes o
WHERE c.class = o.ship AND
NOT EXISTS (SELECT 'x'
FROM Ships s
WHERE o.ship = s.class
);
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

The first request in Union counts the ships of each class in the Ships table leaving only the classes that have one ship in the resulting set. The second request defines classes in which the leading ship is in the Outcomes table on condition that ships of that class are not in the Ships table.

Let’s examine the following data example for which this request will provide (give) the wrong result.

Everyone who has accomplished tasks based on this data scheme (“Ships”) knows what the Bismark is. It’s a leading ship which is not included in the Ships table. Now let’s imagine that another ship of Bismark’s class is included in the Ships table, for example, the Tirpitz.

Then the first request will return Bismark’s class as the Ships table contains only one ship of that class. The second request won’t return Bismark because the predicate:

NOT EXISTS (SELECT 'x'
FROM Ships s
WHERE o.ship = s.class
)

for the Bismark ship in the Outcomes table will be evaluated as FALSE. As a result of the conjunction of these queries, we’ll get the Bismark class in the imprint data of the whole query.

It’s clear to everybody who has followed the course of discussion attentively that there are TWO ships in the Bismark class in the database. That means, this class must not be included in the result set of the query.

To check it, add the following row into the basic database:

INSERT INTO Ships VALUES('Tirpitz', 'Bismark', 1940);

Tip

All the basic teaching databases you can download from sqlbooks.ru.

Solution 3.2.2

The next solution was built by one of the visitors of the site after receiving the above explanation. It also gives the correct result for main database.

SELECT class
FROM Ships sh
WHERE NOT EXISTS (SELECT ship
FROM Outcomes
WHERE ship = sh.class
)
GROUP BY class
HAVING COUNT(*) = 1
UNION
SELECT ship
FROM Outcomes s
WHERE EXISTS (SELECT class
FROM Classes
WHERE class = s.ship
) AND
NOT EXISTS (SELECT class
FROM Ships
WHERE class = s.ship
);
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

Here two queries were combined. The second query selects from the Outcomes table on condition that there exists no other ship of the given leading ship class in the Ships table.

The first query selects all ships from the Ships table except those whose leading ship is present in the Outcomes table. Next, the grouping by class was executed and ship classes with more than one ship are eliminated with the HAVING clause.

Thus, it is assumed that if the leading ship is in the Outcomes table, this class has at least two ships and, therefore, this class does not meet the conditions of the problem. That is a mistake, because nothing implies that the Ships table cannot contain a leading ship. So if a class has one ship in the database, and this ship is a head one and is present in both of the considered tables, then the solution 3.2.2 mistakenly ignores this class.

Solution 3.2.3

Look now, how we can more easily write a query that contains a similar mistake:

SELECT class
FROM (SELECT class
FROM Ships
UNION ALL
SELECT ship
FROM Outcomes o
WHERE o.ship IN(SELECT class
FROM Classes
)
) AS cl
GROUP BY class
HAVING COUNT(class) = 1;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

The idea: Ship classes from the Ships table are selected in the subquery and merged with the leading ships of the Outcomes table with preservation of duplicates (UNION ALL). This uses the fact that the name of the leading ship coincides with the class name (SELECT Ship) (!!!). The fact that the duplicates are stored is absolutely correct, because we get one row for a class for any number of ships in this class otherwise. Then grouping by class, and picking the classes that contain a single ship are being done. The solution looks much shorter and clearer than the solution 3.2.2. Correction it will be easier, but will have to be corrected, because the solution will give an incorrect result, if the leadIng ship is present both in the Ships table and in the Outcomes table, with the result that we count it twice.

Important

It is worth paying attention to the fruitful idea of this decision - first to combine all the ships, and only then carry out the grouping by classes.

Solution 3.2.4

Approach based on unions is not unique. The following solution uses joins.

SELECT Classes.class
FROM Outcomes RIGHT OUTER JOIN
Classes ON Outcomes.ship = Classes.class LEFT OUTER JOIN
Ships ON Classes.class = Ships.class
GROUP BY Classes.class
HAVING (COUNT(COALESCE (Outcomes.ship, Ships.name)) = 1);
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

Right join of Outcomes and Classes tables gives us the leading ships, with so doing the Outcomes.ship column will contain NULL value if the leading ship will not be in the Outcomes table. Then is executed left join of Classes and Ships tables on foreign key. The Ships.name column will contain a NULL value if the class has no ships in the Ships table. The resulting record set is grouped by class, and then the filtering predicate is used

COUNT(COALESCE (Outcomes.ship, Ships.name)) = 1

Let us stop on this elegant method.

Note

The author is not ironic when he says “beautiful”, “elegant”, etc., on the wrong approach to solving the exercise. Here, most of the queries are written by professionals, fluent in the language of SQL. The analysis such solutions is a good school for beginners. The mistakes in these solutions are usually associated with neglect of a particular subject area, and are often easily corrected by purely cosmetic means.

So to make things completely clear, we give examples of four possible rows (taken from the available database except the last case), which result from the join. Here they are:

Shipclassname
BismarckBismarckNULL
TennesseeTennesseeCalifornia
TennesseeTennesseeTennessee
NULLYamatoMusashi
NULLYamatoYamato
NULLClass_1NULL

NULL in the column name for the Bismarck class means that the lead ship is only available in the Outcomes table. Ships “California” and “Tennessee” of Tennessee class are in Ships table, in this case, the leading ship is also in the Outcomes table. In the third case, the two ships of class Yamato present in the Ships table, the Outcomes table does not have leading ship of the class. For the fourth case of Class_1, the class has no ships in the database.

We return to the predicate. Function COALESCE (Outcomes.ship, Ships.name) returns the first non-NULL value of its arguments, or NULL, if both arguments are NULL values. More information about the COALESCE function can be found in paragraph 5.10.

The COUNT aggregate function with an argument counts non-NULL values in a group. The result for the Bismark class is 1; the result for the Tennessee and Yamato classes is 2 and for the Class_1 is 0. Only the Bismark of the four classes is present in the resulting set because the predicate is true for this class only.

This solution is free from the mistake of the previous solution 3.2.3: a leading ship contained both in the Ships table and the Outcomes table will be counted just once, because the only row will be present in the resulting set.

Important

A question. Is the last assertion always true for our database? Provide an example of data when it is not true. Incidentally this adds another error to the solution.

Furthermore the same error as in the example 3.2.1  rises in this case. If the row mentioned in the example is inserted into the database, just one row will be produced for two Bismark class ships as a result of the union:

ShipClassname
BismarckBismarckTirpitz

If you have not yet discovered the mistake mentioned in the inset, look up the T&S.

T&S

To solve the problem on SQL-EX.RU