FIRST_VALUE and LAST_VALUE functions

For each company, display one flight selected randomly (airport database).

Using correlated subqueries

In the subquery, the data for each company is sorted randomly using the newid() function, and then one (the first) row of this sorted set is selected:

SELECT id_comp,
       (SELECT TOP 1 trip_no 
        FROM trip t 
        WHERE c.id_comp = t.id_comp 
        ORDER BY NEWID()
       ) trip_no
FROM company c
ORDER BY id_comp;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

id_comp	trip_no
1	1188
2	1146
3	1124
4	1101
5	7771

Of course, you will most likely get a different result, but since there is few data in the table, sooner or later you will be able to get this one. :-)

Using the FIRST_VALUE function

This window function returns the first value from an ordered set of values. Now we can do this without subqueries by selecting the set of flights for the company from the current row in the window using the PARTITION BY clause and ordering it randomly in the ORDER BY clause like in the previous example:

SELECT DISTINCT id_comp,
    FIRST_VALUE(trip_no) OVER(PARTITION BY id_comp ORDER BY NEWID()) trip_no
FROM trip
ORDER BY id_comp;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

id_comp	trip_no
1	1195
2	1145
3	1124
4	1100
5	8882

The DISTINCT keyword is needed here to avoid repeating the same company for every flight it’s done.

Ambush with LAST_VALUE

It would seem what difference does it make to take the first or last value from a randomly ordered set? But let’s see what we get if to replace FIRST_VALUE with LAST_VALUE in the previous query:

SELECT DISTINCT id_comp,
    LAST_VALUE(trip_no) OVER(PARTITION BY id_comp ORDER BY NEWID()) trip_no
FROM trip
ORDER BY id_comp;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

I will only show the results for id_comp = 1. You can run the query yourself to make sure that absolutely all flights from the Trip table will be displayed.

id_comp	trip_no
1	1181
1	1182
1	1187
1	1188
1	1195
1	1196

What do we do in such cases? Of course, we turn to the documentation, and there we read… No, wait, first the full syntax:

LAST_VALUE | FIRST_VALUE ( [scalar_expression] )  [IGNORE NULLS | RESPECT NULLS]
  OVER ( [partition_by_clause] order_by_clause [rows_range_clause] )

Here IGNORE NULLS or RESPECT NULLS determine whether NULL values will be taken into account; rows_range_clause specifies the window options.

Now let’s read:

Important

The default range is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

That is, the window is the range from the current row and infinitely higher. Since we select the last row of the range, the current row will always be displayed, no matter how the rows are sorted. That is, no matter how many rows will be above during random sorting.

That’s why DISTINCT doesn’t help, because all output rows turn out to be unique.

So we just need to explicitly (and correctly!) set the window parameters, namely, from the current row and indefinitely below, since we choose the last value:

SELECT DISTINCT id_comp,
    LAST_VALUE(trip_no) OVER(PARTITION BY id_comp ORDER BY NEWID()
                    RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) trip_no
FROM trip
ORDER BY id_comp;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

SELECT DISTINCT id_comp,
    LAST_VALUE(trip_no) OVER(PARTITION BY id_comp ORDER BY NEWID()
                    RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) trip_no
FROM trip
ORDER BY id_comp;

🚫

[[ error ]]

[[ column ]]
NULL [[ value ]]

id_comp	trip_no
1	1188
2	1145
3	1123
4	1101
5	7773

The last question is remain. If we did not set the window options, why did the query with FIRST_VALUE work correctly?

The answer is obvious - because the default value worked for us here, although I didn’t find in the documentation what it should be for FIRST_VALUE .

I can assume that it is the same as for LAST_VALUE.

#Window Functions #NEWID #FIRST_VALUE #LAST_VALUE #UNBOUNDED

LAG and LEAD functions COUNT DISTINCT and window functions