Equivalents for analytic functions
Variables can also be used to emulate analytic functions. A few examples follow. For simplicity, let’s assume all data fields NOT NULL, and sorting and partitioning (PARTITION BY) use one column only. Including NULLs and multi-column sorting/partitioning would make the examples more cumbersome and difficult to read without changing their basic idea.
Let’s create a table named TestTable for our examples:
CREATE TABLE TestTable(
group_id INT NOT NULL,
order_id INT UNIQUE NOT NULL,
value INT NOT NULL
);
Here, group_id is the group identifier (the equivalent of an analytic function window), order_id contains unique values used for sorting, and value is just a numeric value.
Let’s fill our table with test data:
INSERT TestTable(order_id, group_id, value)
SELECT *
FROM(
SELECT 1 order_id, 1 group_id, 1 value
UNION ALL SELECT 2, 1, 2
UNION ALL SELECT 3, 1, 2
UNION ALL SELECT 4, 2, 1
UNION ALL SELECT 5, 2, 2
UNION ALL SELECT 6, 2, 3
UNION ALL SELECT 7, 3, 1
UNION ALL SELECT 8, 3, 2
UNION ALL SELECT 9, 4, 1
UNION ALL SELECT 11, 3, 2
)T;
Some examples of analytic function substitutes.
1) ROW_NUMBER() OVER(ORDER BY order_id)
SELECT T.*, @I:=@I+1 RowNum
FROM TestTable T,(SELECT @I:=0)I
ORDER BY order_id;
group_id | order_id | value | RowNum |
---|---|---|---|
1 | 1 | 1 | 1 |
1 | 2 | 2 | 2 |
1 | 3 | 2 | 3 |
2 | 4 | 1 | 4 |
2 | 5 | 2 | 5 |
2 | 6 | 3 | 6 |
3 | 7 | 1 | 7 |
3 | 8 | 2 | 8 |
4 | 9 | 1 | 9 |
3 | 11 | 2 | 10 |
2) ROW_NUMBER() OVER(PARTITION BY group_id ORDER BY order_id)
SELECT group_id, order_id, value, RowNum
FROM(
SELECT T.*,
IF(@last_group_id = group_id, @I:=@I+1, @I:=1) RowNum,
@last_group_id := group_id
FROM TestTable T,(SELECT @last_group_id:=NULL, @I:=0)I
ORDER BY group_id, order_id
)T;
group_id | order_id | value | RowNum |
---|---|---|---|
1 | 1 | 1 | 1 |
1 | 2 | 2 | 2 |
1 | 3 | 2 | 3 |
2 | 4 | 1 | 1 |
2 | 5 | 2 | 2 |
2 | 6 | 3 | 3 |
3 | 7 | 1 | 1 |
3 | 8 | 2 | 2 |
3 | 11 | 2 | 3 |
4 | 9 | 1 | 1 |
3) SUM(value) OVER(PARTITION BY group_id ORDER BY order_id)
SELECT group_id, order_id, value, RunningTotal
FROM(
SELECT T.*,
IF(@last_group_id = group_id, @I:=@I+value, @I:=value) RunningTotal,
@last_group_id := group_id
FROM TestTable T, (SELECT @last_group_id:=NULL, @I:=0)I
ORDER BY group_id, order_id
)T;
group_id | order_id | value | RunningTotal |
---|---|---|---|
1 | 1 | 1 | 1 |
1 | 2 | 2 | 3 |
1 | 3 | 2 | 5 |
2 | 4 | 1 | 1 |
2 | 5 | 2 | 3 |
2 | 6 | 3 | 6 |
3 | 7 | 1 | 1 |
3 | 8 | 2 | 3 |
3 | 11 | 2 | 5 |
4 | 9 | 1 | 1 |
4) LAG(value) OVER(PARTITION BY group_id ORDER BY order_id)
SELECT group_id, order_id, value, LAG
FROM(
SELECT T.*,
IF(@last_group_id = group_id, @last_value, NULL) LAG,
@last_group_id := group_id,
@last_value := value
FROM TestTable T,(SELECT @last_value:=NULL, @last_group_id:=NULL)I
ORDER BY group_id, order_id
)T;
group_id | order_id | value | LAG |
---|---|---|---|
1 | 1 | 1 | NULL |
1 | 2 | 2 | 1 |
1 | 3 | 2 | 2 |
2 | 4 | 1 | NULL |
2 | 5 | 2 | 1 |
2 | 6 | 3 | 2 |
3 | 7 | 1 | NULL |
3 | 8 | 2 | 1 |
3 | 11 | 2 | 2 |
4 | 9 | 1 | NULL |
LEAD can be implemented in the same manner; just the sorting clause has to be changed to ORDER BY group_id, order_id DESC.
It’s somewhat more complicated for the functions COUNT, MIN, and MAX, since we can’t find out the function value before analyzing all rows in the group (window). For instance, MS SQL Server spools the window (temporary writes its rows to a hidden buffer table for later reference) for this purpose; MySQL has no such functionality. However, we can calculate the function value for the last row (according to the actual sorting order) of the respective group (that is, after processing the whole window), then sort the rows in reverse order and finally replicate the calculated value throughout the whole window.
Thus, we need to do the sorting twice. To ensure the same order as in the examples above, let’s first sort the data by group_id ASC, order_id DESC, then by group_id ASC, order_id ASC.
5) COUNT(*) OVER(PARTITION BY group_id)
Within the first sorting, we just enumerate the rows. Within the second one, we copy the maximum value, which represents the number of rows in the respective group, to all rows in the window.
SELECT group_id, order_id, value, Cnt
FROM(
SELECT group_id, order_id, value,
IF(@last_group_id = group_id, @MaxRowNum, @MaxRowNum := RowNumDesc) Cnt,
@last_group_id := group_id
FROM(
SELECT T.*,
IF(@last_group_id = group_id, @I:=@I+1, @I:=1) RowNumDesc,
@last_group_id := group_id
FROM TestTable T,(SELECT @last_group_id:=NULL, @I:=0)I
ORDER BY group_id, order_id DESC /*первая сортировка*/
)T,(SELECT @last_group_id:=NULL, @MaxRowNum:=NULL)I
ORDER BY group_id, order_id /*вторая сортировка*/
)T;
group_id | order_id | value | Cnt |
---|---|---|---|
1 | 1 | 1 | 3 |
1 | 2 | 2 | 3 |
1 | 3 | 2 | 3 |
2 | 4 | 1 | 3 |
2 | 5 | 2 | 3 |
2 | 6 | 3 | 3 |
3 | 7 | 1 | 3 |
3 | 8 | 2 | 3 |
3 | 11 | 2 | 3 |
4 | 9 | 1 | 1 |
MAX and MIN are calculated similarly. I confine myself to an example for MAX only:
6) MAX(value) OVER(PARTITION BY group_id)
SELECT group_id, order_id, value, MaxVal
FROM(
SELECT group_id, order_id, value,
IF(@last_group_id = group_id, @MaxVal, @MaxVal := MaxVal) MaxVal,
@last_group_id := group_id
FROM(
SELECT T.*,
IF(@last_group_id = group_id, GREATEST(@MaxVal, value), @MaxVal:=value) MaxVal,
@last_group_id := group_id
FROM TestTable T,(SELECT @last_group_id:=NULL, @MaxVal:=NULL)I
ORDER BY group_id, order_id DESC
)T,(SELECT @last_group_id:=NULL, @MaxVal:=NULL)I
ORDER BY group_id, order_id
)T;
group_id | order_id | value | MaxVal |
---|---|---|---|
1 | 1 | 1 | 2 |
1 | 2 | 2 | 2 |
1 | 3 | 2 | 2 |
2 | 4 | 1 | 3 |
2 | 5 | 2 | 3 |
2 | 6 | 3 | 3 |
3 | 7 | 1 | 2 |
3 | 8 | 2 | 2 |
3 | 11 | 2 | 2 |
4 | 9 | 1 | 1 |
7) COUNT(DISTINCT value) OVER(PARTITION BY group_id)
This is an interesting function presently not implemented in MS SQL Server; however, it can be calculated using a subquery by taking MAX of RANK. Let’s do the same here. Within the first sorting RANK() OVER(PARTITION BY group_id ORDER BY value DESC) is calculated; then, within the second sorting, its maximum value is copied to all rows of each respective window:
SELECT group_id, order_id, value, Cnt
FROM(
SELECT group_id, order_id, value,
IF(@last_group_id = group_id, @Rank, @Rank := Rank) Cnt,
@last_group_id := group_id
FROM(
SELECT T.*,
IF(@last_group_id = group_id,
IF(@last_value = value, @Rank, @Rank:=@Rank+1)
, @Rank:=1) Rank,
@last_group_id := group_id,
@last_value := value
FROM TestTable T,(SELECT @last_value:=NULL, @last_group_id:=NULL, @Rank:=0)I
ORDER BY group_id, value DESC, order_id DESC
)T,(SELECT @last_group_id:=NULL, @Rank:=NULL)I
ORDER BY group_id, value, order_id
)T;
group_id | order_id | value | Cnt |
---|---|---|---|
1 | 1 | 1 | 2 |
1 | 2 | 2 | 2 |
1 | 3 | 2 | 2 |
2 | 4 | 1 | 3 |
2 | 5 | 2 | 3 |
2 | 6 | 3 | 3 |
3 | 7 | 1 | 2 |
3 | 8 | 2 | 2 |
3 | 11 | 2 | 2 |
4 | 9 | 1 | 1 |