loading..
Русский    English
22:16

RANK() and DENSE_RANK() functions page 1

These functions also enumerate rows as ROW_NUMBER() function, but a somewhat different way. Difference is that the rows, that have the same values in column on which you are ordering, receive the same number (rank). For example, the values (ordered in ascending)

1
5
6
6
6
will have such numbers:

1 1
5 2
6 3
6 3
6 3

The question arises as from what numbers will continue numbering if, say, in a sequence of numbers will appear on 7 and so on? There are two variants:

1) number 4, because it is the following number in sequence;

2) number 6, because next line will be the sixth.

This "ambiguity" has led to the appearance of two functions instead of one - RANK and DENSE_RANK, the first of which will continue the numbering from 6 and the second (dense) - with 4.

Let's consider some examples. We begin from showing the differences in the behavior of functions RANK and ROW_NUMBER:

Console
Execute
  1. SELECT *, ROW_NUMBER() OVER(ORDER BY type) num,
  2. RANK() OVER(ORDER BY type) rnk
  3. FROM Printer;

Here the values of compared functions are outputs in two last columns with row ordering in column ‘type’:

code model color type price num rnk
2 1433 y Jet 270.00 1 1
3 1434 y Jet 290.00 2 1
1 1276 n Laser 400.00 3 3
6 1288 n Laser 400.00 4 3
4 1401 n Matrix 150.00 5 5
5 1408 n Matrix 270.00 6 5

As expected ROW_NUMBER enumerates entire set of rows returned by the query. RANK function work on the second of the above variant, i.e. the line number will be the following number after rows with an identical rank. 

Now compare the "dense" and "nondense" rank:

Console
Execute
  1. SELECT *, RANK() OVER(ORDER BY type) rnk,
  2. DENSE_RANK() OVER(ORDER BY type) rnk_dense
  3. FROM Printer;

code model color type price rnk rnk_dense
2 1433 y Jet 270.00 1 1
3 1434 y Jet 290.00 1 1
1 1276 n Laser 400.00 3 2
6 1288 n Laser 400.00 3 2
4 1401 n Matrix 150.00 5 3
5 1408 n Matrix 270.00 5 3

It should also draw attention to the order in which output rows of the result set. Because SELECT statement in our example does not have ORDER BY clause and same ordering by column ‘type’ is used to rank calculating, then the result is displayed in the same manner. In order to optimize if you don’t need any other ordering the result set; use this fact in order not to perform additional sorting, which decrease the performance of the query.

As well as for ROW_NUMBER function, PARTITION BY can be used in OVER clause, it divides the entire set of rows returned by the query to groups to which then the appropriate function is applied.

The query

Console
Execute
  1. SELECT *, RANK() OVER(PARTITION BY type ORDER BY price) rnk
  2.  
  3. FROM Printer;
allows the ranking of models by prices in ascending order in each group defined by printer type:

code model color type price rnk
2 1433 y Jet 270.00 1
3 1434 y Jet 290.00 2
1 1276 n Laser 400.00 1
6 1288 n Laser 400.00 1
4 1401 n Matrix 150.00 1
5 1408 n Matrix 270.00 2

And here so it is possible to select the cheapest models from each category:

Console
Execute
  1. SELECT model, color, type, price
  2. FROM (
  3. SELECT *, RANK() OVER(PARTITION BY type ORDER BY price) rnk
  4. FROM Printer
  5. ) Ranked_models
  6. WHERE rnk = 1;

model color type price
1433 y Jet 270.00
1276 n Laser 400.00
1288 n Laser 400.00
1401 n Matrix 150.00

The query could be shorter, if the RANK function could be used in a WHERE clause, since own value of the rank we do not need. However, it is forbidden (as for other ranking functions), at least in SQL Server.

Finally, consider another example.

Example. Find makers who produce more than 2 models of PC.

This task has a solution through the traditional aggregate functions

Console
Execute
  1. SELECT maker FROM Product
  2. WHERE type = 'PC'
  3. GROUP BY maker
  4. HAVING COUNT(*) > 2;

  However, this task can be solved by using the RANK function. The idea is as follows: to rank the models of each maker's on a unique key and to select only those manufacturers, whose products reach the rank 3:

Console
Execute
  1. SELECT maker
  2. FROM (
  3. SELECT maker, RANK() OVER(PARTITION BY maker ORDER BY model) rnk
  4. FROM Product
  5. WHERE type = 'PC'
  6. ) Ranked_makers
  7. WHERE rnk = 3;

Both in one, and in another case, of course, we get the same result:

maker
E

Once again: in last case, ordering must be performed on unique column combination, because, otherwise, it can be exists over 3 models but rank under 3 (for example 1, 2, 2, …). In our case, this condition is satisfied, because ordering is performed by the column ‘model’, which is the primary key in table Product.

By the way, the execution plans of these queries show the same cost of most costly operations - a table scan and sort (which in first case, there is an implicit, and called by the grouping operation).

Suggested exercises: 105, 116

Bookmark and Share
Pages 1 2
Tags
aggregate functions Airport ALL AND AS keyword ASCII AVG Battles Bezhaev Bismarck C.J.Date calculated columns Cartesian product CASE cast CHAR CHARINDEX Chebykin check constraint classes COALESCE common table expressions comparison predicates Computer firm CONSTRAINT CONVERT correlated subqueries COUNT CROSS APPLY CTE data type conversion data types database schema DATEADD DATEDIFF DATENAME DATEPART DATETIME date_time functions DDL DEFAULT DEFAULT VALUES DELETE DISTINCT DML duplicates edge equi-join EXCEPT exercise (-2) More tags
The book was updated
month ago
©SQL-EX,2008 [Evolution] [Feedback] [About] [Links] [Team]
All right reserved.