Recursive CTE

A CTE has one more important. With it, you can write a recursive query, i.e. request, which, once written, will be repeated many times until some condition is true. A recursive CTE is as follows:

WITH < name >[(< column list >)]
AS(
< SELECT... > -- Anchor part.
UNION ALL -- Recursive part.
< SELECT...FROM < name > >
WHERE < condition for the continuation of iterations >
)

A recursive CTE-query differs from ordinary CTE in recursive part only that introduced by UNION ALL statement. Note that a there is reference to CTE name in recursive part, i.e., CTE refers to itself within itself. This is, in fact, there is recursion. Naturally, anchor and recursive parts must have identical column set.

We turn to the example. Consider the task of getting the alphabet, i.e. table of alphabetic characters - uppercase Latin letters. To be a base for comparison, we solve this problem, first by generating a numerical sequence, which was seen in section 8.1.

SELECT CHAR(ASCII('A')+5*(a-1) + b-1) AS num
FROM (SELECT 1 a UNION ALL SELECT 2 UNION ALL SELECT 3
 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
 ) x CROSS JOIN
 (SELECT 1 b UNION ALL SELECT 2 UNION ALL SELECT 3
 UNION ALL SELECT 4 UNION ALL SELECT 5
 ) y
WHERE 5*(a-1) + b <= 26
ORDER BY 1;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

The decision via the recursive CTE:

WITH Letters AS(
SELECT ASCII('A') code, CHAR(ASCII('A')) letter
UNION ALL
SELECT code+1, CHAR(code+1) FROM Letters
WHERE code+1 <= ASCII('Z')
)
SELECT letter FROM Letters;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

In anchor part of query we define ASCII-code of the first letters of the alphabet and the corresponding symbol. In recursive part of query we simply increase the ASCII-code by one, referring to the CTE in the FROM clause. As a result, the lines with next letters will be added sequentially (UNION ALL) after line with first letter, in the order of their ASCII-code. Iteration will continue as long as the condition code +1 <= ascii (‘Z’) is true, i.e. until a letter Z will not be added.

The statement

SELECT letter FROM Letters
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]
actually intended to refer to the CTE, start recursion and output results. Everything else can be regarded as a definition. Note that by default is allowed 100 iterations. This means that if the termination condition is not fulfilled earlier iterations, then the recursion is stopped after 100 iterations. Maximum number of iterations can be changed using “hint”

OPTION(MAXRECURSION N)

where N - the maximum number of iterations. A value of 0 does not limit the number of iterations. We must carefully use this value as is fraught with cycling. If the query was not completed within a specified number of iterations, an error (the rows received to this time are returned):

The statement terminated. The maximum recursion N has been exhausted before statement completion.

Let’s solve another challenge as the answer to the question that I have held numerous meetings on the Internet landscape.

Convert text in a table column in such a way that each word begins with an uppercase letter.

Here’s an example of data and the desired result:

namerep
deltaDelta
KSI PSIKsi Psi
alfa beta gamma    zetaAlfa Beta Gamma    Zeta
ALfa and omegAAlfa And Omega

For a few exceptions (among which may be mentioned abbreviations and initials) can be assumed that the word inside the text is preceded by a space. This can be used as a search criterion we need the elements of the text. I propose to implement a following trivial algorithm:

  1. The first letter of the text makes a capital, and the rest - lower.

  2. Then each “space + letter” combination translates to upper case.

From the first paragraph of the algorithm’s simple:

SELECT name, UPPER(LEFT(name, 1)) +
LOWER(SUBSTRING(name, 2, LEN(name) - 1)) rep
FROM
(SELECT 'ALfa and omegA' AS name
UNION ALL SELECT 'alfa beta gamma    zeta'
UNION ALL SELECT 'KSI PSI'
UNION ALL SELECT 'delta'
) X;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]
namerep
ALfa and omegAAlfa and omega
alfa beta gamma    zetaAlfa beta gamma    zeta
KSI PSIKsi psi
deltaDelta

To implement the second paragraph, there are some ways. Since the Latin alphabet is not so much (26), we can just make 26 substitutions. I was not lazy and will bring the full query so you can experiment.

So,

SELECT name, REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(rep, ' a', ' A'), ' b', ' B'), ' c', ' C'), ' d', ' D'),
' e', ' E'), ' f', ' F'), ' g', ' G'), ' h', ' H'), ' i', ' I'), ' j', ' J'), ' k', ' K'),
' l', ' L'), ' m', ' M'), ' n',' N'), ' o', ' O'), ' p', ' P'), ' q', ' Q'), ' r', ' R'),
' s', ' S'), ' t', ' T'), ' u', ' U'), ' v', ' V'), ' w', ' W'),
' x', ' X'), ' y', ' Y'), ' z', ' Z')
FROM(
SELECT name, UPPER(LEFT(name,1)) + LOWER(SUBSTRING(name, 2, LEN(name)-1)) rep
FROM
(SELECT 'ALfa and omegA' AS name
UNION ALL SELECT 'alfa beta gamma    zeta'
UNION ALL SELECT 'KSI PSI'
UNION ALL SELECT 'delta'
) X
) Y;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

It is easy to guess that the next query will use a recursive CTE.

First, write the two prime CTE, which form our test case and determine the ASCII-code of the first letters of the alphabet (A) (constant - not our way :-)).

Then followed anchor part, which performs the previously described operation of bringing the whole text to lowercase with uppercase first letter. Here, the replacement of character with the code code and preceded by blank space on same character. Do not be confused by such a seemingly useless replacement. The fact is that for case-insensitive databases characters ‘a’ and ‘A’ did not differ. Let’s stop and see the result.

WITH NM(name) AS
(SELECT 'ALfa and omegA' AS name
UNION ALL SELECT 'alfa beta gamma    zeta'
UNION ALL SELECT 'KSI PSI'
UNION ALL SELECT 'delta'
),
Ascii_code AS(
SELECT  ASCII('A') AS code
),
Repl(name, code, rep) AS
(SELECT name, code, REPLACE(UPPER(LEFT(name, 1)) +
LOWER(SUBSTRING(name, 2, LEN(name) - 1)),' '+CHAR(code),' '+CHAR(code)) rep
FROM Ascii_code, NM
)
SELECT name, rep FROM Repl;
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

 | name | rep | |————————-|————————-| | ALfa and omegA | Alfa And omega | | alfa beta gamma    zeta | Alfa beta gamma    zeta | | KSI PSI | Ksi psi | | delta | Delta |

We add, finally, the recursive part, which we will replace the chars with the code code +1. Recursion will continue as long as there is no violation of the condition code <ASCII (‘Z’), i.e. until we will go all the characters. What do we get in output? The same rows will be added (UNION ALL) with the replacement of the next char, to rows, which were obtained as a result of anchor part, on each iteration. Note large amount of results using this method; in our case 4х26 = 104 rows. From this set of rows we are interested in only those that result from the last iteration, i.e. when they were made all substitutions. This latest iteration corresponds to the condition code = ASCII (‘Z’), which is used in the final query:

WITH NM(name) AS
(SELECT 'ALfa and omegA' AS name
UNION ALL SELECT 'alfa beta gamma    zeta'
UNION ALL SELECT 'KSI PSI'
UNION ALL SELECT 'delta'
),
Ascii_code AS(
SELECT  ASCII('A') AS code
),
Repl(name, code, rep) AS
(SELECT name, code, REPLACE(UPPER(LEFT(name, 1)) +
LOWER(SUBSTRING(name, 2, LEN(name) - 1)),' '+CHAR(code),' '+CHAR(code)) rep
FROM Ascii_code, NM
UNION ALL
SELECT name, code+1 code,
REPLACE(rep,' ' + CHAR(code+1), ' ' + char(code + 1)) rep FROM Repl
WHERE code < ASCII('Z')
)
SELECT name, rep FROM Repl WHERE code=ASCII('Z');
mssql
🚫
[[ error ]]
[[ column ]]
[[ value ]]

I would like to warn you against excessive using of recursive CTE, as they often lose in productivity “traditional” methods. I’m not going to go far for examples and compare the method presented here. By increasing the number of processed rows to 10000 I got a following CPU utilization time:

the method based on REPLACE: 842 ms

recursive method: 6615 ms

Certainly there are tasks that can not be solved nonprocedural within the framework SQL-92 standard. In these cases, the use of recursive CTE is rightly so. In other cases, I would recommend to carry out performance tests to alternative solutions.

By the way, PostgreSQL and Oracle have a built-in function - INITCAP - which solves the problem we consider:

SELECT INITCAP(name)   
  FROM  
   (SELECT 'ALfa and omegA' AS name  
     UNION ALL SELECT 'alfa beta gamma zeta'  
     UNION ALL SELECT 'KSI PSI'  
     UNION ALL SELECT 'delta'
 ) X;

You can use console to verify this.

Suggested exercises: 106