Correlated subqueries

Correlated subqueries sometimes let to create the query very briefly, which can look more cumbersome when you use other means. We shall remind that correlated subquery contains reference to the query which it contains (we shall name it the main query) with the result that subquery executes for each string of the main query.

Example 8.9.1

It is required to determine the date and the flights of every passenger, committed by them at its last flight day

To say the other way, you have to find the maximum dateof the flight for each passenger and to find all of his flights. With determination of the maximum date there is no problems :

SELECT id_psg, MAX(date)
FROM pass_in_trip
GROUP BY id_psg;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

However, there is no flight. If we’ll try to include flight in the list of output:

SELECT id_psg, trip_no, MAX(date)
FROM pass_in_trip
GROUP BY id_psg;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]
we receive an error message, because number of flight is not used in aggregate function and also does not enter into the list of columns of the grouping. If we’ll do the last one:

SELECT id_psg, trip_no, MAX(date)
FROM pass_in_trip
GROUP BY id_psg, trip_no;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]
we receive last flight of the passenger by each flight which he flied. It is completely not that problem which we try to solve. Use of a сorrelated subquery

SELECT id_psg, trip_no, [date]
FROM pass_in_trip pt_1
WHERE [date] = (SELECT MAX([date])
FROM pass_in_trip pt_2
WHERE pt_1.id_psg = pt_2.id_psg
);
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]
gives that is necessary for us:

id_psgtrip_nodate
1011872003-04-14 00:00:00.000
911822003-04-13 00:00:00.000
811872003-04-14 00:00:00.000
611232003-04-08 00:00:00.000
511452003-04-25 00:00:00.000
311452003-04-05 00:00:00.000
311232003-04-05 00:00:00.000
211242003-04-02 00:00:00.000
111002003-04-29 00:00:00.000

Here for each flight it is checked, whether it is accomplished last flight day of the given passenger. Thus if such flights there was a several, we receive all of them.

Seeming lack of the resulted solution is just also that the subquery should be calculated for every string of main query. To avoid it, it is possible to offer the alternative solution, which is using joining of the Pass_in_trip table with a subquery resulted at the beginning which calculates the maximal dates for each passenger:

SELECT pt_1.id_psg, trip_no, [date]
FROM pass_in_trip pt_1 JOIN
(SELECT id_psg, MAX([date]) md
FROM pass_in_trip
GROUP BY id_psg
) pt_2 ON pt_1.id_psg = pt_2.id_psg AND
[date] = md;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

However, as it will seem strangely, optimizer of the SQL Server 2000 will choose for both queries the identical  execution plan at presence of a natural index (cluster) on a primary key. Accordingly, and time of execution of queries should not differ.

Important

It should be reminded that examples given here can be run directly on the website by selecting the check box “Without checking” on the page with exercises.

Suggested exercises: 58