Common table expressions (CTE)

Common table expressions (CTE)

Let’s start with an example to clarify the appointment of common table expressions.

Find the maximum amount of income/outcome among all 4 tables in the database “Recycled materials company”, as well as the type of operation, date and point of reception, when and where it was recorded.

The task can be solved, the following way:

SELECT inc as max_sum, type, date, point
FROM (
  
  SELECT inc, 'inc' type, date, point FROM Income
  
  UNION ALL  
  SELECT inc, 'inc' type, date, point FROM Income_o
  
  UNION ALL
  
  SELECT out, 'out' type, date, point FROM Outcome_o
  
  UNION ALL
  
  SELECT out, 'out' type, date, point FROM Outcome
) X
  
    WHERE inc >= ALL(
SELECT inc FROM Income
  
       UNION ALL
SELECT inc FROM Income_o
  
       UNION ALL
SELECT out FROM Outcome_o
  
       UNION ALL
SELECT out FROM Outcome
);
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

First, we combine all available information, and then select only those rows for which the sum not less than each of the sums of the same query of 4 tables. In fact, we have twice written the code of union of four tables. How do I avoid this? You can create a view, and then address a query to him:

CREATE VIEW Inc_Out
AS
  
  SELECT inc, 'inc' type, date, point FROM Income
  
  UNION ALL
  
  SELECT inc, 'inc' type, date, point FROM Income_o
  
  UNION ALL
SELECT out, 'out' type, date, point   
     FROM Outcome_o
  
  UNION ALL
SELECT out, 'out' type,date, point FROM Outcome;
  
GO
  
SELECT inc as max_sum, type, date, point
  
FROM Inc_Out
WHERE inc >= ALL(
SELECT inc
FROM Inc_Out);

So, CTE plays the role of view, which is created within a single query and not stored as an object of the schema. The previous solution can be rewritten with the CTE as follows:

 

WITH Inc_Out
AS (
  
  SELECT inc, 'inc' type, date, point FROM Income
  
  UNION ALL
SELECT inc, 'inc' type, date, point FROM Income_o
  
  UNION ALL
SELECT out, 'out' type, date, point FROM Outcome_o
  
  UNION ALL
SELECT out, 'out' type,date, point FROM Outcome
)
  
SELECT inc as max_sum, type, date, point
FROM Inc_Out
  
WHERE inc >= ALL(
SELECT inc
FROM Inc_Out);
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

As you can see, all similar to view except for the using of obligatory parentheses to restriction of query, formally, we need only replace CREATE VIEW with WITH. As for view, the column list can be indicated in parentheses after the name of CTE, if we need to include them not all from query and/or rename.

For example, (I add additional minimal sum to previous query)

 

WITH Inc_Out(m_sum, type, date, point)
AS (
  
  SELECT inc, 'inc' type, date, point FROM Income
  
  UNION ALL
SELECT inc, 'inc' type, date, point FROM Income_o
  
  UNION ALL
SELECT out, 'out' type, date, point FROM Outcome_o
  
  UNION ALL
SELECT out, 'out' type,date, point FROM Outcome
)
  
SELECT 'max' min_max,* FROM Inc_Out
WHERE m_sum >= ALL(
  
  SELECT m_sum FROM Inc_Out)
  
  UNION ALL
SELECT 'min', * FROM Inc_Out
WHERE m_sum <= ALL(  
         SELECT m_sum FROM Inc_Out);
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

Common table expressions can significantly reduce the amount of code, if repeatedly had to turn to the same derived tables. Note that the CTE can be used not only with the operator SELECT, but also with other operators of the DML language. Let’s solve the following task: Need to send the passengers of flight 7772 on November 11, 2005 by another closest flight that departing later in the same day in the same destination. Ie this task is to update the records in the table Pass_in_trip. I shall not provide a solution to this task, which does not use the CTE, but you can do it for themselves, to compare the amount of code of two solutions.

Suggested exercises: 24, 116

I propose step by step build a solution and present the results in the form of select queries, so you can control the results, executing these queries online. Since the operators of data modification so far are prohibited on the site, I will give a final decision at end.

Let’s start with the table you will need to update:

;WITH
Trip_for_replace AS(
  
  SELECT * FROM Pass_in_trip WHERE trip_no=7772 AND date='20051129'
)
  
    SELECT * FROM Trip_for_replace;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

Since the CTE play the role of views, they can in principle be used to update the data. The words «in principle» mean that the CTE is updated, if certain conditions are met, similar to the conditions of view updates. In particular, the definition should be used only one table without grouping and calculated columns. Note that the necessary conditions in this case are satisfied.

Now we need information on flight 7772 in order to find the closest to it a suitable flight. Let’s add one more CTE in the definition:

;WITH
Trip_for_replace AS(
  
  SELECT * FROM Pass_in_trip WHERE trip_no=7772 AND date='20051129'
),
  
  Trip_7772 AS (
SELECT * FROM Trip WHERE trip_no=7772
)
  
  SELECT * FROM Trip_7772;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

Note that in one query can define any number of common table expressions. And most importantly, CTE may include links to other CTE, than we actually now use (note the link Trip_7772 in definition of Trip_candidates)

;WITH
Trip_for_replace AS(
  
  SELECT * FROM Pass_in_trip   
  WHERE trip_no=7772 AND date='20051129'
),
Trip_7772 AS (
  
        SELECT * FROM Trip WHERE trip_no=7772
),
  
  Trip_candidates as(
  
  SELECT Trip.* FROM Trip, Trip_7772
  
  WHERE Trip.town_from+Trip.town_to = Trip_7772.town_from +   
    Trip_7772.town_to
and Trip.time_out > Trip_7772.time_out
)
  
SELECT * FROM Trip_candidates;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

Trip_candidates - a table expression, which determines candidates for replacement, namely, flights, departing later than 7772, and are committed between the same cities. I use string concatenation town_from + town_to, not to write separate criteria for the point of departure and destination.

Now find among the candidates rows closest to the time the flight:

;WITH
Trip_for_replace AS(
  
  SELECT * FROM Pass_in_trip WHERE trip_no=7772 AND date='20051129'
),
  
  Trip_7772 AS (
SELECT * FROM Trip WHERE trip_no=7772
),
  
  Trip_candidates AS(
SELECT Trip.* FROM Trip, Trip_7772
  
   WHERE Trip.town_from+Trip.town_to = Trip_7772.town_from +   
       Trip_7772.town_to
and Trip.time_out > Trip_7772.time_out
),
  
  Trip_replace AS(
SELECT * FROM Trip_candidates
  
           WHERE time_out <= ALL(SELECT time_out FROM Trip_candidates)
)
  
  SELECT * FROM Trip_replace;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

 Now we can replace SELECT statement by UPDATE, to solve the task:

WITH
Trip_for_replace AS(
  
  SELECT * FROM Pass_in_trip WHERE trip_no=7772 AND date='20051129'
),
  
  Trip_7772 AS (
SELECT * FROM Trip WHERE trip_no=7772
),
  
  Trip_candidates AS(
SELECT Trip.* FROM Trip, Trip_7772
  
    WHERE Trip.town_from+Trip.town_to = Trip_7772.town_from +   
         Trip_7772.town_to
and Trip.time_out > Trip_7772.time_out
),
  
    Trip_replace AS(
SELECT * FROM Trip_candidates
  
           WHERE time_out <= ALL(SELECT time_out FROM Trip_candidates)
)
  
  UPDATE Trip_for_replace SET trip_no = (SELECT trip_no FROM Trip_replace);

Here we assume that between the two cities are no flights that would go in one and the same time in one direction. Otherwise, it would need an additional criterion for the selection of a single flight, as our goal - update data, not presentation of all possible candidates for replacement.

Using the CTE with a DELETE, you can learn by the example of removing duplicate rows from a table.

The query we have used in SQL Server -

WITH CTE AS (
    SELECT name, ROW_NUMBER() OVER(PARTITION BY name ORDER BY name) rnk
    FROM T
     )
DELETE FROM CTE
WHERE rnk > 1;

will fail with error:

Relation "cte" does not exist.

This error means that we cannot delete rows from common table expressions (CTE), but only from base tables.
Nevertheless, it is possible to delete duplicates with a single query using CTE.

To do it, let’s try the following algorithm:

  1. Delete all rows from the base table, returning them into common table expression (first CTE).
  2. Using the result obtained in the 1st step, form unique rows which we need to retain in the table (second CTE).
  3. Insert the rows from the step 2 into the base table.

Let’s use the table from cited example for writing the query:

create table T (name varchar(10));
insert into T values
('John'),
('Smith'),
('John'),
('Smith'),
('Smith'),
('Tom');

Here the query is

with t_deleted as
(delete from T returning *), -- 1st step
t_inserted as
(SELECT name, ROW_NUMBER() OVER(PARTITION BY name ORDER BY name) rnk
    FROM t_deleted) -- 2nd step
insert into T select name from t_inserted
where rnk=1; -- 3rd step
--(here we move the search condition from the step 2 to reduce the code)

When we run the query

select * from T;

the desired result will be obtained:

name
John
Smith
Tom