Numbering rows when there are duplicates in the results column

According to the relational theory a table can't have identical rows. And though realizations allow creating tables without a primary key and as a result allow identical rows this sort of situation shows a mistake in planning. Also, a table without a primary key or unique index is not renewable. The last conclusion is quite reasonable because the system has no information as to which of the duplicates to choose.

So, speaking of duplicates, we mean duplicates in the resulting set whose appearance can be caused by the complete or partial absence (in the case of a compound key) of the primary key in the resulting set.

To illustrate, let's examine the following query

  1. SELECT id_psg
  2. FROM pass_in_trip;
which will return the numbers of passengers who took a flight. As one passenger can take several flights we get duplicates here. But this passenger can't take the same flight on the same day more than once and that is regulated by the appropriate primary key - {trip_no, date, id_psg}.

So we need to enumerate the passengers who may duplicate. Let's first define the order in which it should be done. Let this order be based on three columns flight date, passenger identifier and the flight number (in ascending order).

To reduce this task to the one we've discussed previously (this is possible because three mentioned columns form the primary key) let's construct a column that would unite information from the listed columns. As the columns have different data types we'll convert them to a symbolical representation and perform concatenation.

For this we need to define the number of characters. There is no time mentioned in the flight date so 11 symbols are enough. The flight number is a 4-character number everywhere. Only the passenger identifier remains. Based on the database we have, we'll use 2 symbols -- this does not in any way detriment from the general approach. However, for the sorting to be performed correctly, we need to add a zero before the actual number for the single-digit passengers, like 01, 09, etc. Otherwise, passenger number 2 will be placed after passenger number 10, for example. Let's perform the appropriate conversions.

Transformations 8.2.2

  1. Date: CAST(date AS CHAR(11))
  2. Trip: CAST(trip_no AS CHAR(4))
  3. Passenger ID: RIGHT('00'+CAST(id_psg AS VARCHAR(2)),2).

In the last conversion (2) I used the non-standard RIGHT function (SQL Server), which substracts from a string a defined number of symbols starting from the right. Of course the SUBSTRING function could be used but this one is shorter and, besides, other commercial products should have analogous improvements on the standard. Combining these expressions in the indicated order we get a unique column which will be used to numerate the passengers according to increment (decrement) of values in this column. Using this column we can utilize solution 8.2.1. Our final result will be

  1. SELECT COUNT(*) num, P2.id_psg
  2. FROM (SELECT *, CAST(date AS CHAR(11)) +
  3. RIGHT('00' + CAST(id_psg AS VARCHAR(2)), 2)+
  4. CAST(trip_no AS CHAR(4)) dit
  5. FROM Pass_in_trip
  6. ) P1 JOIN
  7. (SELECT *, CAST(date AS CHAR(11)) +
  8. RIGHT('00' + CAST(id_psg AS VARCHAR(2)), 2)+
  9. CAST(trip_no AS CHAR(4)) dit
  10. FROM pass_in_trip
  11. ) P2 ON P1.dit <= P2.dit
  12. GROUP BY P2.dit, P2.id_psg
  13. ORDER BY 1;

For numbering in a different order just concatenate the transformations 8.2.2 in a different sequence. For example, to number passengers in the sequence of their identification numbers the first item should be

  1. RIGHT('00' + CAST(id_psg AS VARCHAR(2)), 2).

In this example even more evidently, than that for the solution 8.2.1, the advantages of ranking functions are demonstrated. It is hard to imagine more simple solution than that with using ROW_NUMBER function:

  1. SELECT ROW_NUMBER() OVER(ORDER BY date, id_psg, trip_no) num, id_psg
  2. FROM Pass_in_trip
  3. ORDER BY num;

Suggested exercises: 147

Bookmark and Share
aggregate functions Airport ALL AND AS keyword ASCII AVG Battles Bezhaev Bismarck C.J.Date calculated columns Cartesian product CASE cast CHAR CHARINDEX Chebykin check constraint classes COALESCE common table expressions comparison predicates Computer firm CONSTRAINT CONVERT correlated subqueries COUNT CROSS APPLY CTE data type conversion data types database schema DATEADD DATEDIFF DATENAME DATEPART DATETIME date_time functions DDL DEFAULT DEFAULT VALUES DELETE DISTINCT DML duplicates edge equi-join EXCEPT exercise (-2) More tags
The book was updated
month ago
©SQL-EX,2008 [Evolution] [Feedback] [About] [Links] [Team]
All right reserved.