RANK() and DENSE_RANK() functions

RANK() and DENSE_RANK() functions

These functions also enumerate rows as ROW_NUMBER() function, but a somewhat different way. Difference is that the rows, that have the same values in column on which you are ordering, receive the same number (rank). For example, the values (ordered in ascending)

1
5
6
6
6
11
52
63
63
63

The question arises as from what numbers will continue numbering if, say, in a sequence of numbers will appear on 7 and so on? There are two variants:

  1. number 4, because it is the following number in sequence;

  2. number 6, because next line will be the sixth.

This “ambiguity” has led to the appearance of two functions instead of one - RANK and DENSE_RANK, the first of which will continue the numbering from 6 and the second (dense) - with 4.

Let’s consider some examples. We begin from showing the differences in the behavior of functions RANK and ROW_NUMBER:

SELECT *, ROW_NUMBER() OVER(ORDER BY type) num,
RANK() OVER(ORDER BY type) rnk
FROM Printer;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

Here the values of compared functions are outputs in two last columns with row ordering in column ‘type’:

codemodelcolortypepricenumrnk
21433yJet27011
31434yJet29021
11276nLaser40033
61288nLaser40043
41401nMatrix15055
51408nMatrix27065

As expected ROW_NUMBER enumerates entire set of rows returned by the query. RANK function work on the second of the above variant, i.e. the line number will be the following number after rows with an identical rank.

Now compare the “dense” and “nondense” rank:

SELECT *,
RANK() OVER(ORDER BY type) rnk,
DENSE_RANK() OVER(ORDER BY type) rnk_dense
FROM Printer;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]
codemodelcolortypepricernkrnk_dense
21433yJet27011
31434yJet29011
11276nLaser40032
61288nLaser40032
41401nMatrix15053
51408nMatrix27053

It should also draw attention to the order in which output rows of the result set. Because SELECT statement in our example does not have ORDER BY clause and same ordering by column ‘type’ is used to rank calculating, then the result is displayed in the same manner. In order to optimize if you don’t need any other ordering the result set; use this fact in order not to perform additional sorting, which decrease the performance of the query.

As well as for ROW_NUMBER function, PARTITION BY can be used in OVER clause, it divides the entire set of rows returned by the query to groups to which then the appropriate function is applied.

The query

SELECT *,
RANK() OVER(PARTITION BY type ORDER BY price) rnk
FROM Printer;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]
allows the ranking of models by prices in ascending order in each group defined by printer type:

codemodelcolortypepricernk
21433yJet2701
31434yJet2902
11276nLaser4001
61288nLaser4001
41401nMatrix1501
51408nMatrix2702

And here so it is possible to select the cheapest models from each category:

SELECT model, color, type, price
FROM (
SELECT *,
RANK() OVER(PARTITION BY type ORDER BY price) rnk
FROM Printer
) Ranked_models
WHERE rnk =
1;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]
modelcolortypeprice
1433yJet270
1276nLaser400
1288nLaser400
1401nMatrix150

The query could be shorter, if the RANK function could be used in a WHERE clause, since own value of the rank we do not need. However, it is forbidden (as for other ranking functions), at least in SQL Server.

Finally, consider another example.

Example. Find makers who produce more than 2 models of PC.

This task has a solution through the traditional aggregate functions

SELECT maker
FROM Product
WHERE type =
'PC'
GROUP BY maker
HAVING COUNT(*)
> 2;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

  However, this task can be solved by using the RANK function. The idea is as follows: to rank the models of each maker’s on a unique key and to select only those manufacturers, whose products reach the rank 3:

SELECT maker
FROM (
SELECT maker,
RANK() OVER(PARTITION BY maker ORDER BY model) rnk
FROM
Product
WHERE type =
'PC'
) Ranked_makers
WHERE rnk = 3;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

Both in one, and in another case, of course, we get the same result:

maker
E

Once again: in last case, ordering must be performed on unique column combination, because, otherwise, it can be exists over 3 models but rank under 3 (for example 1, 2, 2, …). In our case, this condition is satisfied, because ordering is performed by the column ‘model’, which is the primary key in table Product.

By the way, the execution plans of these queries show the same cost of most costly operations - a table scan and sort (which in first case, there is an implicit, and called by the grouping operation).

Suggested exercises: 105, 116

Example of usage of DENSE_RANK

An often encountered problem is a finding of Nth value in magnitude among a set of values of some column of a table, for example:

Find out the second-largest value of the price in PC table.

Let’s deduce the sorted list of values of the price in PC table with additional columns of ranking functions for the control:

SELECT  price, DENSE_RANK() OVER(ORDER BY price DESC) dense_rnk,   
RANK() OVER(ORDER BY price DESC) rnk,  
ROW_NUMBER() OVER(ORDER BY price DESC) rn  
FROM PC ORDER BY price DESC;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]
pricedense_rnkrnkrn
980,00111
970,00222
950,00333
850,00444
850,00445
850,00446
600,00577
600,00578
400,00699
350,0071010
350,0071011
350,0071012

In the frame of SQL-92 Standard, we can solve this task as follows:

SELECT MAX(price) "2nd_price"  FROM PC
WHERE price < (SELECT MAX(price) FROM PC);
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]
2nd_price
970,00

I.e. we find the maximum among the values which less than maximal value. But what shall we do, if it is needed to find the third-largest value of a  price? It is possible to act in the similar manner:

SELECT MAX(price) "3rd_price" FROM PC WHERE price <
(
SELECT MAX(price) FROM PC
WHERE price < (SELECT MAX(price) FROM PC)
);
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]
3rd_price
950,00

To find out N-largest price value, following the same logic, we could add new “Nested dolls” to already existing ones up to N-th. This decision in any way you will not name universal.

For solving such problems, use of DENSE_RANK window function is a good approach. For example, the initial task can be solved as follows:

SELECT DISTINCT price FROM(
SELECT DENSE_RANK() OVER(ORDER BY price DESC) rnk, price FROM PC
) X WHERE rnk=2;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

To deduce any other price in a sequence (third one, for example), it is enough to replace a constant in the search condition:

SELECT DISTINCT price FROM(
SELECT DENSE_RANK() OVER(ORDER BY price DESC) rnk, price FROM PC
) X WHERE rnk=3;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

It should be noted that use of  DENSE_RANK instead of RANK is caused by the case of duplicate price values, which force the RANK function to return the values with gaps (consider the problem of finding out the 5th-largest value). If the problem is stated as finding out the Nth value, which is not obligatory unique, we can use ROW_NUMBER function instead. But if the values are unique in a table, the solution with aid of any of these functions  gives the same result.