Exercise #70

Point out the battles in which at least three ships from the same country took part.

Solution 3.7.1

SELECT AA.name AS bat
FROM (SELECT O.battle AS name, C.country, COUNT(O.ship) AS cnt
FROM Outcomes O, Ships S, Classes C
WHERE O.ship = S.name AND
C.class = S.class
GROUP BY O.battle, C.country
) AA
WHERE AA.cnt >= 3;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

This query may be called a “first approximation” to the solution. All required tables are joined by WHERE clause. After that the battle and the country (from the Classes table) are determined for ships from the Outcomes table. Then the result have got grouped by battle with following selection of ships by count.

The mistake covers in fact that we do not take into account ships which is absent in the Ships table. This mistake occurs beacuse we use inner joins. For now its should be clear for reader that we do not take into account leading ships. The class of leading ship can be founded not only from Ships table, but also directly from Classes table. Thereupon the owner country can be founded. Now lets consider solutions containing attempts to take into account this peculiarity.

Solution 3.7.2

SELECT bat
FROM (SELECT DISTINCT d.battle AS bat, a.country, count(d.ship) AS s
      FROM Outcomes d, Ships b, Classes a
      WHERE d.ship = b.name AND
            b.class=a.class
      GROUP BY d.battle, a.country
      UNION
      SELECT DISTINCT d.battle as bat, a.country, count(d.ship) AS s
      FROM Outcomes d, Classes a
      WHERE d.ship = a.class AND
            d.ship NOT IN (SELECT name
                           FROM Ships
                           )
      GROUP BY d.battle, a.country
      ) AS t1
WHERE s > 2;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

The typical mistake of many beginners is in the order of operations under the resultant set. At first, they implement grouping and then they unioning. In this case the mistake of solution 3.7.1 is absent, but this solution do not returns country, which have three ships taking part in one battle, two of its` are in the table Ships and one (leading ship) is in the Outcomes table.

There was time the checking system accepted the wrong solution:

Solution 3.7.3

SELECT battle
FROM Classes c LEFT JOIN
Ships s ON c.class = s.class INNER JOIN
Outcomes o ON o.ship = s.name OR
c.class = o.ship
GROUP BY battle, country
HAVING COUNT(ship) > 3;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

Look at HAVING COUNT(ship) > 3. Using the right predicate with >=3 clause was making it incorrect (and that’s what it is, in fact). Adjusting the solution allowed to find the hole in testing which was removed.

So, the query combines classes with ships from the Ships table to define the country of the vessel. The left conjunction (LEFT JOIN) is purposed to avoid loosing a class if its ships are not in the Ships table. This class (and not only this) will be necessary to take into account the leading ships from the Outcomes table, and that is done in the next (inner) combination. Its predicate

ON o.ship = s.name OR c.class = o.ship

will form a string, in which the ship column will contain the name of a ship that took part in battles, if its name coincides with the name of a ship in a known class from the Ships table, OR if its name coincides with the class name (a leading ship). If a vessel didn’t take part in a battle, the ship column will contain NULL. Then grouping by a couple of attributes {battle, country} with HAVING COUNT(ship) >= 3 clause is performed. This allows to choose only those countries which used more than two vessels for a fight. Notice that COUNT function will work with NULL values in the ships column correctly.

Important

You can read about the usage of COUNT(*) and COUNT(<column name>) in the item 5.5 on the site.

That “OR” of the predicate (1) has the basic mistake of this query. If the same leading ship is in the Outcomes and Ships tables it will be counted twice for one battle. It can be seen from the following query:

SELECT battle, country, ship, COUNT(*) qty
FROM Classes c LEFT JOIN
Ships s ON c.class = s.class INNER JOIN
Outcomes o ON o.ship = s.name OR
c.class = o.ship
GROUP BY battle, country, ship;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

I post here only one incorrect string of the result:

Battlecountryshipqty
Surigao StraitUSATennessee2

It’s an obvious error because the same ship can’t be mentioned twice for one battle (Surigao Strait) and that’s restricted by the primary key on the Outcomes table.

Let’s note that the query we examine, as well as the solutions 3.7.1 and 3.7.2, has another mistake which even was described in site’s FAQ. As the grouping is executed by a pair of attributes { battle, country}, the battle will be printed out more than once if 3 or more ships from each country participated in it.

There’s only one question left. Why was this query admitted by the system while having 3 remarked mistakes (>3 instead of >=3, incorrect combination and the possibility of getting duplicates)?

Let’s try to clarify this. There were no battles in the database for which the task condition would be fulfilled. In the correct solution an empty set was printed out. That’s why the wrong ships’ number increase did not work with the right criteria (>=3), as the query passed out Surigao Strait battle, though in fact 2 ships from the USA took part in it. And the >3 clause gave an empty set again.

In the checking base there were two sets of 3 ships from different countries for blocking the solutions with non-excluded duplicates for one battle. At the same time in one set the leading ship was included in both tables (Outcomes and Ships). For this set the query we examine was giving a wrong number of 4 ships, and a right one for the other - 3. That’s why the condition in the HAVING predicate - “> 3” - was printing out only one battle, solving the problem with duplicates in the most unusual way.

The SQL world is full of surprises; the more mistakes you make, the greater the possibility of coincidental results :).

An empty set for the solution of this task on the main database had repeatedly caused disapproval. That’s why I added data to the main database blocking the considered solution.

Solution 3.7.4

SELECT DISTINCT battle
FROM (SELECT battle, country
FROM (SELECT battle, country
FROM Outcomes INNER JOIN
Classes ON ship = class
UNION
SELECT battle, country
FROM Outcomes o INNER JOIN
Ships s ON o.ship = s.name INNER JOIN
Classes c ON s.class = c.class
) x
GROUP BY battle, country
HAVING COUNT(*) > 2
) y;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

In the internal sub-query the two queries are connected. The first of these:

SELECT battle, country
FROM Outcomes INNER JOIN
Classes ON ship = class;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]
determines the country and battles which leading ships are take part. The second one:

SELECT battle, country
FROM Outcomes o INNER JOIN
Ships s ON o.ship = s.name INNER JOIN
Classes c ON s.class = c.class;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]
determines the country and battles for ships in the Ships table. Joining with Classes table is necessary for recognition of the owner country.

Using of the UNION clause leads to the removal of the duplicates. From one side, it seemes rightly, because the leading ship might be present in the both - the Outcomes and the Ships table. But from the other side if we remove the duplicates the resultant set will have only unique pairs {battle, country}. This means that for any count of the ships from one country for one battle will be only the one row. As result the following grouping will be superfluous, like the HAVING clause.

The first idea is in the using of UNION ALL instead of UNION, which means we take into account duplicates too. But, as we know from previous consideration, for one ship which is present in the both tables, we get two rows.

What should we do? Author offers two methods. The first one: we keep UNION, but count not the countries but the ships. Then duplicates would be eliminated rightly. In the second method UNION ALL clause is using, but in this case we need to check that the ship is present in the one table would not be present in the other, and count it only once.

Which of two methods is more preferable not only depends on our predilection, but on the execution plan`s cost. We offer you to appraise it by your own, in process of solving this task by both of the methods.

T&S

To solve the problem on SQL-EX.RU