Numbering rows when there are duplicates in the results column

According to the relational theory a table can’t have identical rows. And though realizations allow creating tables without a primary key and as a result allow identical rows this sort of situation shows a mistake in planning. Also, a table without a primary key or unique index is not renewable. The last conclusion is quite reasonable because the system has no information as to which of the duplicates to choose.

So, speaking of duplicates, we mean duplicates in the resulting set whose appearance can be caused by the complete or partial absence (in the case of a compound key) of the primary key in the resulting set.

To illustrate, let’s examine the following query

SELECT id_psg
FROM pass_in_trip;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

which will return the numbers of passengers who took a flight. As one passenger can take several flights we get duplicates here. But this passenger can’t take the same flight on the same day more than once and that is regulated by the appropriate primary key - {trip_no, date, id_psg}.

So we need to enumerate the passengers who may duplicate. Let’s first define the order in which it should be done. Let this order be based on three columns – flight date, passenger identifier and the flight number (in ascending order).

To reduce this task to the one we’ve discussed previously (this is possible because three mentioned columns form the primary key) let’s construct a column that would unite information from the listed columns. As the columns have different data types we’ll convert them to a symbolical representation and perform concatenation.

For this we need to define the number of characters. There is no time mentioned in the flight date so 11 symbols are enough. The flight number is a 4-character number everywhere. Only the passenger identifier remains. Based on the database we have, we’ll use 2 symbols – this does not in any way detriment from the general approach. However, for the sorting to be performed correctly, we need to add a zero before the actual number for the single-digit passengers, like 01, 09, etc. Otherwise, passenger number 2 will be placed after passenger number 10, for example. Let’s perform the appropriate conversions.

Transformations 8.2.2

Date: CAST(date AS CHAR(11))
Trip: CAST(trip_no AS CHAR(4))
Passenger ID: RIGHT('00'+CAST(id_psg AS VARCHAR(2)),2).

In the last conversion (2) I used the non-standard RIGHT function (SQL Server), which substracts from a string a defined number of symbols starting from the right. Of course the SUBSTRING function could be used but this one is shorter and, besides, other commercial products should have analogous “improvements” on the standard. Combining these expressions in the indicated order we get a unique column which will be used to numerate the passengers according to increment (decrement) of values in this column. Using this column we can utilize solution 8.2.1. Our final result will be

SELECT COUNT(*) num, P2.id_psg
FROM (SELECT *, CAST(date AS CHAR(11)) +
                RIGHT('00' + CAST(id_psg AS VARCHAR(2)), 2)+
                CAST(trip_no AS CHAR(4)) dit
       FROM Pass_in_trip
      ) P1 JOIN
      (SELECT *, CAST(date AS CHAR(11)) +
                 RIGHT('00' + CAST(id_psg AS VARCHAR(2)), 2)+
                 CAST(trip_no AS CHAR(4)) dit
       FROM pass_in_trip
       ) P2 ON P1.dit <= P2.dit
GROUP BY P2.dit, P2.id_psg
ORDER BY 1;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

For numbering in a different order just concatenate the transformations 8.2.2 in a different sequence. For example, to number passengers in the sequence of their identification numbers the first item should be

RIGHT('00' + CAST(id_psg AS VARCHAR(2)), 2).

In this example even more evidently, than that for the solution 8.2.1, the advantages of ranking functions are demonstrated. It is hard to imagine more simple solution than that with using ROW_NUMBER function:

SELECT ROW_NUMBER() OVER(order by date, id_psg, trip_no) num, id_psg
FROM Pass_in_trip
ORDER BY num;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

Suggested exercises: 147

#Ranking Functions #ROW_NUMBER #Numbering

Numbering of rows in accordance with the order of values of a primary key