Exercise #(-2)

It is necessary to determine year when maximum quantity of ships had been launched for each country. If there were several diffirent years with maximum value, the minimal one must be returned. On the output: country, quantity of ships, year.

Solution 3.6.1. Here is a typical beginner’s solution:

SELECT country, MAX(x), MIN(launched)
FROM (SELECT country, COUNT(*) x , launched
FROM Ships b, Classes a
WHERE a.class = b.class
GROUP BY country, launched
) s
WHERE launched = ANY(SELECT MIN(launched)
FROM Ships bb, Classes aa
WHERE bb.class = aa.class
GROUP BY country, launched
)
GROUP BY country;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

The subquery in FROM clause determines quantity of rows for each unique pair {country, year of launch}. In the terms of the data domain this means that it is determined quantity of ships that had been launched by every single country in every year. Let the resultant set of the subquery “s” will be the following:

countryxlaunched
Gt.Britain61916
Gt.Britain11917
Japan11913
Japan21914
Japan21915
Japan11916
Japan11941
Japan11942
USA11920
USA11921
USA31941
USA21943
USA21944

Next, in the WHERE clause it will be selected rows with year of launch that matches any year from this subquery:

SELECT MIN(launched)
FROM Ships bb, Classes aa
WHERE bb.class = aa.class
GROUP BY country, launched;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

What this subquery gives us? It gives us all years from the table above, because the data is grouped by country and year. As result this subquery doesn’t take any effect to the data set, therefore it is unnecessary. It seemes that author want to express the minimal year condition in this expression. But the minimal year must be considered as minumum of those years in which maximum ships’ quantity for this country had been launched.

Removing the “launched” column from grouping isn’t enough to correct this mistake.

SELECT MIN(launched)
FROM Ships aa, Classes bb
WHERE bb.class = aa.class
GROUP BY country;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]
as a result we will get minimal years by every country:

1916
1913
1920

Then rows that satisfies predicate would look like this:

countryxlaunched
Gt.Britain61916
Japan11913
Japan11916
USA11920

Thereby we have already lost correct rows for Japan and USA. We should take notice of this row for Japan:

Japan11916
SELECT country, MAX(x), MIN(launched)

GROUP BY country

we have grouping by country with determining of two aggregated factors — maximum of ships’ quatities and minimum of years. Using of grouping in this case leads to mistake. Indeed, rows that we need could already be found in table which in the FROM clause (for instance {Gt.Britain, 6, 1916}). We do not need any grouping here, we only need to filter out wrong rows. As a result of grouping only one row will be remained. It have statistic rates for whole group. In this case the maximum and the minimum gain in different rows of the group. This is clearly visible at USA’s ships table. In this one minimal year does not correspond maximum value (x=1). The maximum value (x=3) is reached at 1941th year. That’s why such grouping may return right result only if all “x” values for country are equal.

Everything in this solution is reversed. Nevertheless let’s try to correct it by finding out the reasons of mistakes and delusions.

Correlated subquery is needed for binding of year and country. It can be used in WHERE clause (AND aa.country = s.country):

WHERE launched = ANY(SELECT MIN(launched)
FROM Ships bb, Classes aa
WHERE aa.class = bb.class AND
aa.country = s.country
GROUP BY country
)

That is correct, but don’t take any effect for now, except maybe it excludes wrong row:

Japan11916

For advance we need to calculate minimal year among ones with maximum quantity of ships for each contry. Maximum quantity of ships is primary here. If we would filter only by minimal year we can lose right rows. That’s why we need to count quantity of ships, not a minimal year:

WHERE x >= ALL(SELECT count(launched)
FROM Ships bb, Classes aa
WHERE bb.class = aa.class AND
s.country=aa.country
GROUP BY country, launched
)

Take a note at predicate >= ALL which gives us maximal value of “x”. Let’s rewrite whole query with taking into account all we have said about grouping in the main query.

Solution 3.6.2

SELECT country, x, launched
FROM (SELECT country, COUNT(*) x , launched
FROM Ships b, Classes a
WHERE a.class = b.class
GROUP BY country, launched
) s
WHERE x >= ALL(SELECT COUNT(launched)
FROM Ships bb, Classes aa
WHERE bb.class = aa.class AND
s.country=aa.country
GROUP BY country, launched
);
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

Is this all? Not at all. If maximum value would be gained in several different years the row for every such year will occur in the result. But we need to output the row with minimal year by the data. As it was noted above, this is the certain case when grouping is valid: all “x” values for country are equal and maximal.

Solution 3.6.3. Using join instead of correlated subquery.

SELECT a.country, a.numShips AS Qty, MIN(launched) AS Year
FROM (SELECT country, COUNT(*) AS numShips, launched
FROM Classes INNER JOIN
Ships ON Classes.class = Ships.class
GROUP BY country, launched
) AS a INNER JOIN
(SELECT a.country, MAX(a.numShips) AS Qty
FROM (SELECT country, COUNT(*) AS numShips
FROM Classes INNER JOIN
Ships ON Classes.class = Ships.class
GROUP BY country, launched
) AS a
GROUP BY country) AS b
ON a.country = b.country AND a.numShips = b.Qty
GROUP BY a.country, a.numShips;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

Inner join by country and quantity of ships from two subqueries is in the FROM clause. The first subquery determines country and quantity of ships that has been launched in each year for it. The second one contains similar query in the FROM clause but filters only pair {country, quantity of ships} that has maximum quantity of ships that has been launched for one year.

As a result of this join the tuple {country, maximum quantity of ships} is appended by year of launch of that quantity. Finally, data get grouped to determine minimal year if maximum occured more then one for single country similar to as it done in solution 3.6.2.

Solution 3.6.4.

Using HAVING clause.

SELECT country, QTY, MIN(launched)
FROM (SELECT country, launched, COUNT(name) QTY
FROM Classes c JOIN
Ships s ON c.class = s.class
GROUP BY country,launched
HAVING COUNT(name) = (SELECT MAX(qty)
FROM (SELECT country,launched,COUNT(name) qty
FROM Classes c1 JOIN
Ships s1 ON c1.class = s1.class
WHERE country = c.country
GROUP BY country,launched
)e
)
)T
GROUP BY t.qty, t.country;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

If the subquery of the FROM clause rows {country, launch year, quantity of ships} are determined first. Then HAVING clause filters only rows with quantity of ships that is equal to maximum quantity of ships for this country. Take note that subquery in this predicate is correlated.

WHERE country = c.country

That is why MAX(qty) relates to certain country from the main query and is not a global maximum. At last, it determines minimal year for each combination {country, maximum quantity of ships}.

Can we move to the next task? No we can’t. All of considered variants of solutions contain one type of mistake. We offer you to find it by your own.

If you can’t make it you are free to look at Q&A.

To solve a problem on SQL-EX.RU