Exercise #51

Find the names of the ships having the largest amount of guns among all the ships with the same displacement (taking into account Outcomes table).

Solution 3.8.1

It is not quite optimal, and what is more, it contains an error.

SELECT name
FROM (SELECT O.ship AS name, numGuns, displacement
    FROM Outcomes O 
        INNER JOIN Classes C ON O.ship = C.class 
                            AND O.ship NOT IN (SELECT name
                                                    FROM Ships
                                                    )
    UNION
    SELECT S.name AS name, numGuns, displacement
    FROM Ships S 
        INNER JOIN Classes C ON S.class = C.class
    ) OS 
    INNER JOIN
    (SELECT MAX(numGuns) AS MaxNumGuns, displacement
    FROM Outcomes O 
        INNER JOIN Classes C ON O.ship = C.class 
            AND O.ship NOT IN (SELECT name
                                FROM Ships
                                )
    GROUP BY displacement
    UNION
    SELECT MAX(numGuns) AS MaxNumGuns, displacement
    FROM Ships S 
        INNER JOIN Classes C ON S.class = C.class
    GROUP BY displacement
    ) GD ON OS.numGuns = GD.MaxNumGuns 
        AND OS.displacement = GD.displacement;
🚫
[[ error ]]
[[ column ]]
NULL [[ value ]]

In the FROM clause of the above solution two subqueries are joined. The first one determines names, number of guns and displacement of all the ships from the database. These ships are collected from two tables - Ships and Outcomes (leading ships). In so doing, an incorrect and redundant check for duplicates is performed:

O.ship NOT IN (SELECT name
               FROM Ships
              )

Why incorrect? Because it retains duplicates all the same, tallying up a lead ship as many times as it takes part in battles. And redundant is it, because the UNION clause will remove duplicates anyway. It proved quite useful in this case, the query, though being not optimal, yielding the result as expected against the algorythm.

The second subquery in the join statement determines the maximun number of guns for each displacement value of the available ships, and here, like we did before, these values are calculated separately for the ships from Ships and for the lead ships from Outcomes, followed by unioning.

Joining is performed across matching numbers of guns and displacements in the rows of the subqueries.

Logic of building up the solution is quite correct, but the implementation is not. To prove it, one usually resorts to a counterinstance. Putting it in other words, we will present an instance of data for which the query yields a wrong result. Well, let there be ships with 40 000 tons displacement and maximum number of guns 16 only in the Ships table, and the lead ship with 40 000 tons displacement and maximum number of guns 17 only in the Outcomes. Then the second join subquery will produce two rows:

1640000
1740000

since they are not duplicates, these two rows will be both present in the resulting set. Upon joining, we’ll get not only the ships with maximum number of guns for a given displacement - 17, but the ships rigged with 16 guns. Can you recognize the mistake? We’ve come across it before: firstly join operation should be done, and then grouping.

T&S

To solve the problem on SQL-EX.RU