Aggregate function to aggregate function

Let us consider this exercise:

Find the maximum value among the average prices of PCs, counted separately for each manufacturer.

Calculation of average cost for makers is not difficult:

SELECT AVG(price) avg_price  
FROM Product P JOIN PC ON P.model = PC.model   
GROUP BY maker;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

However, the standard prohibits the use of a subquery as an argument to aggregate function, i.e. we can not solve the exercise by this way:

SELECT MAX(  
SELECT AVG(price) avg_price  
FROM Product P JOIN PC ON P.model = PC.model   
GROUP BY maker  
);
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

In such cases, use a subquery in the FROM clause:

SELECT MAX(avg_price)   
FROM (SELECT AVG(price) avg_price  
      FROM Product P JOIN PC ON P.model = PC.model   
      GROUP BY maker  
     ) X;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

By means of new features of language - window functions - this problem can be solved without a subquery:

SELECT DISTINCT MAX(AVG(price)) OVER () max_avg_price  
      FROM Product P JOIN PC ON P.model = PC.model   
     GROUP BY maker;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

Note that window functions admit use of aggregate function as argument. DISTINCT keyword is necessary here because the maximal value, which has been counted up over all set of average values, will be “attributed” to each manufacturer.

We can’t use aggregate function as a parameter of another aggregate function. This implies impossibility of the solution to the above problem as follows.

SELECT MAX(AVG(price)) max_avg_price  
    FROM Product P JOIN PC ON P.model = PC.model   
    GROUP BY maker;

But there are no rules without exceptions. As it seems to be unreal, but this query is working in Oracle and returns the required result:

MAX_AVG_PRICE
850

You can assure yourself that it is true by visiting the learn-stage exercises page at sql-ex.ru, selecting Oracle in the DBMS list, and running the query (without checking solution for example).

Besides, the solution which uses window function will work in Oracle also. I can suggest that the solution without window function actually uses it implying the OVER() clause implicitly.

Sure, you’ll meet solving such exercises based on sorting with limit on the number of rows result set. However, such solutions are not legitimate from the standpoint of the standard language and, consequently, have different syntax in different implementations. As an example, I will give the solution of our exercise in the syntax of SQL Server and MySQL.

SQL Server

SELECT TOP 1 AVG(price) avg_price  
FROM Product P JOIN PC ON P.model = PC.model   
GROUP BY maker  
ORDER BY avg_price DESC;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

MySQL

SELECT AVG(price) avg_price  
FROM Product P JOIN PC ON P.model = PC.model  
GROUP BY maker  
ORDER BY avg_price DESC  
LIMIT 1;

Both of these solutions take only the first row of the sorted descending a set of average prices.

Do beginners learn SQL is often the problem the definition of maker, for whom is achieved the desired maximum / minimum. In other words, you want to find the maximum average price and the maker, the average price of a PC which is equal to the maximum average price.

Unconventional means to solve this exercise is actually discussed above query:

SELECT TOP 1 maker, AVG(price) avg_price  
FROM Product P JOIN PC ON P.model = PC.model   
GROUP BY maker  
ORDER BY avg_price DESC;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

Using the maker in the column list of SELECT clause is quite acceptable, because by that column runs grouping. However, there is a “trap”. It stems from the fact that the maximum can be achieved for several makers and in the formulation of the exercise they need to show everyone, while we limit the sample only one (first) string. In this case dialect T-SQL has additional argument WITH TIES. The logically correct solution will look like:

SELECT TOP 1 WITH TIES maker, AVG(price) avg_price  
FROM Product P JOIN PC ON P.model = PC.model   
GROUP BY maker  
ORDER BY avg_price DESC;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

However, for portability of code the standardized solution is to be preferred.

Note

Implementation would require a minimum of efforts, if the test solution used for testing, would work on all alleged DBMS without changing its code. Therefore, compliance with standard may be one of requirements specification for the project.

We give below some standard solutions of the exercise.

  1. Use predicate ALL in  WHERE clause в предложении WHERE
SELECT maker, avg_price   
FROM (SELECT maker, AVG(price) avg_price  
      FROM Product P JOIN PC ON P.model=PC.model   
      GROUP BY maker  
     ) X  
WHERE avg_price >= ALL(SELECT AVG(price) avg_price  
                       FROM Product P JOIN PC ON P.model=PC.model   
                       GROUP BY maker  
                      );
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

In natural language, this query sound like follows: “Find makers, for which the average price of PCs no less than the average prices for each of the makers of PCs”.

2.  Use Inner joins

SELECT maker, avg_price   
FROM (SELECT maker, AVG(price) avg_price  
      FROM Product P JOIN PC on P.model=PC.model   
      GROUP BY maker  
      ) X JOIN   
      (SELECT MAX(avg_price) max_price   
       FROM (SELECT maker, AVG(price) avg_price  
             FROM Product P JOIN PC on P.model=PC.model   
             GROUP BY maker  
             ) X  
       ) Y ON avg_price = max_price;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

Here we join a subquery that defines the makers and the average prices on their PC, with a subquery, which defines the maximum average price. The join is performed by the condition of equality of the average price of the first subquery with a maximum price of the second.

  1. Use predicate ALL in HAVING clause
SELECT maker, AVG(price) avg_price  
FROM Product P JOIN PC on P.model=PC.model   
GROUP BY maker  
HAVING AVG(price) >= ALL(SELECT AVG(price) avg_price   
                         FROM Product P JOIN PC ON P.model=PC.model   
                         GROUP BY maker  
                        );
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

This solution differs from the first version of the lack of “extra” request, which had to write only then to be able to use the alias avg_price in WHERE clause (see  clause order for SELECT); on the other hand, using aggregate function in WHERE clause are also prohibited by the rules of language.

All of the standard solutions seem heavy, though, and will work in almost all databases. This cumbersome due to repetition in the code is actually one and the same query. However, common table expressions – CTE, which were introduced in recent versions of the standard, allow repeatedly use a one formulated query. For example, solutions 1 and 3 using CTE can be written in the form:

WITH cte(maker, avg_price)  
AS (  
SELECT maker, AVG(price) avg_price  
FROM Product P JOIN PC on P.model=PC.model   
GROUP BY maker  
   )  
SELECT *  
FROM cte   
WHERE  avg_price>= ALL(SELECT avg_price   
                       FROM cte  
                       );
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

Note that the maintenance of the common table expressions first appeared in SQL Server 2005 and PostgreSQL 8.4.