Exercise #(-2)

It is necessary to determine year when maximum quantity of ships had been launched for each country. If there were several diffirent years with maximum value, the minimal one must be returned. On the output: country, quantity of ships, year.

Solution 3.6.1

Here is a typical beginner’s solution:

SELECT country, MAX(x), MIN(launched)
FROM (SELECT country, COUNT(*) x , launched
    FROM Ships b, Classes a
    WHERE a.class = b.class
    GROUP BY country, launched
    ) s
WHERE launched = ANY(SELECT MIN(launched)
                    FROM Ships bb, Classes aa
                    WHERE bb.class = aa.class
                    GROUP BY country, launched
                    )
GROUP BY country;
🚫
[[ error ]]
[[ column ]]
NULL [[ value ]]

The subquery in FROM clause determines quantity of rows for each unique pair {country, year of launch}. In the terms of the data domain this means that it is determined quantity of ships that had been launched by every single country in every year. Let the resultant set of the subquery “s” will be the following:

countryxlaunched
Gt.Britain61916
Gt.Britain11917
Japan11913
Japan21914
Japan21915
Japan11916
Japan11941
Japan11942
USA11920
USA11921
USA31941
USA21943
USA21944

Next, in the WHERE clause it will be selected rows with year of launch that matches any year from this subquery:

SELECT MIN(launched)
FROM Ships bb, Classes aa
WHERE bb.class = aa.class
GROUP BY country, launched;
🚫
[[ error ]]
[[ column ]]
NULL [[ value ]]

What this subquery gives us? It gives us all years from the table above, because the data is grouped by country and year. As result this subquery doesn’t take any effect to the data set, therefore it is unnecessary. It seemes that author want to express the minimal year condition in this expression. But the minimal year must be considered as minumum of those years in which maximum ships’ quantity for this country had been launched.

Removing the “launched” column from grouping isn’t enough to correct this mistake.

SELECT MIN(launched)
FROM Ships aa, Classes bb
WHERE bb.class = aa.class
GROUP BY country;
🚫
[[ error ]]
[[ column ]]
NULL [[ value ]]

as a result we will get minimal years by every country:

1916
1913
1920

Then rows that satisfies predicate would look like this:

countryxlaunched
Gt.Britain61916
Japan11913
Japan11916
USA11920

Thereby we have already lost correct rows for Japan and USA. We should take notice of this row for Japan:

Japan11916

only by the reason that 1916th year concurred with minimal year for the USA. The following code doesn’t have any sense. But it has mistake though. In the main query:

SELECT country, MAX(x), MIN(launched)

GROUP BY country

we have grouping by country with determining of two aggregated factors — maximum of ships’ quatities and minimum of years. Using of grouping in this case leads to mistake. Indeed, rows that we need could already be found in table which in the FROM clause (for instance {Gt.Britain, 6, 1916}). We do not need any grouping here, we only need to filter out wrong rows. As a result of grouping only one row will be remained. It have statistic rates for whole group. In this case the maximum and the minimum gain in different rows of the group. This is clearly visible at USA’s ships table. In this one minimal year does not correspond maximum value (x=1). The maximum value (x=3) is reached at 1941th year. That’s why such grouping may return right result only if all “x” values for country are equal.

Everything in this solution is reversed. Nevertheless, let’s try to correct it by finding out the reasons of mistakes and delusions.

Correlated subquery is needed for binding of year and country. It can be used in WHERE clause (AND aa.country = s.country):

WHERE launched = ANY(SELECT MIN(launched)
FROM Ships bb, Classes aa
WHERE aa.class = bb.class AND
aa.country = s.country
GROUP BY country
)

That is correct, but don’t take any effect for now, except maybe it excludes wrong row:

Japan11916

For advance we need to calculate minimal year among ones with maximum quantity of ships for each country. Maximum quantity of ships is primary here. If we filtered only by minimal year we can lose right rows. That’s why we need to count quantity of ships, not a minimal year:

WHERE x >= ALL(SELECT count(launched)
FROM Ships bb, Classes aa
WHERE bb.class = aa.class AND
s.country=aa.country
GROUP BY country, launched
)

Take a note at predicate >= ALL which gives us maximal value of “x”. Let’s rewrite whole query with taking into account all we have said about grouping in the main query.

Solution 3.6.2

SELECT country, x, launched
FROM (SELECT country, COUNT(*) x , launched
        FROM Ships b, Classes a
        WHERE a.class = b.class
        GROUP BY country, launched
      ) s
WHERE x >= ALL(SELECT COUNT(launched)
                FROM Ships bb, Classes aa
                WHERE bb.class = aa.class 
                    AND s.country=aa.country
                GROUP BY country, launched
                );
🚫
[[ error ]]
[[ column ]]
NULL [[ value ]]

Is this all? Not at all. If maximum value would be gained in several different years the row for every such year will occur in the result. But we need to output the row with minimal year by the data. As it was noted above, this is the certain case when grouping is valid: all “x” values for country are equal and maximal.

Solution 3.6.3

Using join instead of correlated subquery.

SELECT a.country, a.numShips AS Qty, MIN(launched) AS Year
FROM (SELECT country, COUNT(*) AS numShips, launched
        FROM Classes INNER JOIN
        Ships ON Classes.class = Ships.class
        GROUP BY country, launched
        ) AS a 
    INNER JOIN
        (SELECT a.country, MAX(a.numShips) AS Qty
         FROM (SELECT country, COUNT(*) AS numShips
                FROM Classes INNER JOIN
                Ships ON Classes.class = Ships.class
                GROUP BY country, launched
                ) AS a
         GROUP BY country) AS b
    ON a.country = b.country 
        AND a.numShips = b.Qty
GROUP BY a.country, a.numShips;
🚫
[[ error ]]
[[ column ]]
NULL [[ value ]]

Inner join by country and quantity of ships from two subqueries is in the FROM clause. The first subquery determines country and quantity of ships that has been launched in each year for it. The second one contains similar query in the FROM clause but filters only pair {country, quantity of ships} that has maximum quantity of ships that has been launched for one year.

As a result of this join the tuple {country, maximum quantity of ships} is appended by year of launch of that quantity. Finally, data get grouped to determine minimal year if maximum occured more then one for single country similar to as it done in solution 3.6.2.

Solution 3.6.4

Using HAVING clause.

SELECT country, QTY, MIN(launched)
FROM (SELECT country, launched, COUNT(name) QTY
    FROM Classes c JOIN
    Ships s ON c.class = s.class
    GROUP BY country,launched
    HAVING COUNT(name) = (SELECT MAX(qty)
                        FROM (SELECT country,launched,COUNT(name) qty
                        FROM Classes c1 
                            JOIN Ships s1 ON c1.class = s1.class
                        WHERE country = c.country
                        GROUP BY country,launched
                            )
    )T
GROUP BY t.qty, t.country;
🚫
[[ error ]]
[[ column ]]
NULL [[ value ]]

If the subquery of the FROM clause rows {country, launch year, quantity of ships} are determined first. Then HAVING clause filters only rows with quantity of ships that is equal to maximum quantity of ships for this country. Take note that subquery in this predicate is correlated.

WHERE country = c.country

That is why MAX(qty) relates to certain country from the main query and is not a global maximum. At last, it determines minimal year for each combination {country, maximum quantity of ships}.

Can we move to the next task? No we can’t. All of considered variants of solutions contain one type of mistake. We offer you to find it by your own.

If you can’t make it you are free to look at T&S.

To solve a problem on SQL-EX.RU