Equivalents for analytic functions

Equivalents for analytic functions

Variables can also be used to emulate analytic functions. A few examples follow. For simplicity, let’s assume all data fields NOT NULL, and sorting and partitioning (PARTITION BY) use one column only. Including NULLs and multi-column sorting/partitioning would make the examples more cumbersome and difficult to read without changing their basic idea.

Let’s create a table named TestTable for our examples:

CREATE TABLE TestTable(
  group_id INT NOT NULL,
  order_id INT UNIQUE NOT NULL,
  value INT NOT NULL
);

Here, group_id is the group identifier (the equivalent of an analytic function window), order_id contains unique values used for sorting, and value is just a numeric value.

Let’s fill our table with test data:

INSERT TestTable(order_id, group_id, value)
SELECT *
FROM(
  SELECT 1 order_id, 1 group_id, 1 value
  UNION ALL SELECT 2, 1, 2
  UNION ALL SELECT 3, 1, 2
  UNION ALL SELECT 4, 2, 1
  UNION ALL SELECT 5, 2, 2
  UNION ALL SELECT 6, 2, 3
  UNION ALL SELECT 7, 3, 1
  UNION ALL SELECT 8, 3, 2
  UNION ALL SELECT 9, 4, 1
  UNION ALL SELECT 11, 3, 2
)T;

Some examples of analytic function substitutes.

1) ROW_NUMBER() OVER(ORDER BY order_id)

SELECT T.*, @I:=@I+1 RowNum
FROM TestTable T,(SELECT @I:=0)I
ORDER BY order_id;
group_idorder_idvalueRowNum
1111
1222
1323
2414
2525
2636
3717
3828
4919
311210

2) ROW_NUMBER() OVER(PARTITION BY group_id ORDER BY order_id)

SELECT group_id, order_id, value, RowNum
FROM(
  SELECT T.*,
    IF(@last_group_id = group_id, @I:=@I+1, @I:=1) RowNum,
    @last_group_id := group_id
  FROM TestTable T,(SELECT @last_group_id:=NULL, @I:=0)I
  ORDER BY group_id, order_id
)T;
group_idorder_idvalueRowNum
1111
1222
1323
2411
2522
2633
3711
3822
31123
4911

3) SUM(value) OVER(PARTITION BY group_id ORDER BY order_id)

SELECT group_id, order_id, value, RunningTotal
FROM(
  SELECT T.*,
    IF(@last_group_id = group_id, @I:=@I+value, @I:=value) RunningTotal,
    @last_group_id := group_id
  FROM TestTable T, (SELECT @last_group_id:=NULL, @I:=0)I
  ORDER BY group_id, order_id
)T;
group_idorder_idvalueRunningTotal
1111
1223
1325
2411
2523
2636
3711
3823
31125
4911

4) LAG(value) OVER(PARTITION BY group_id ORDER BY order_id)

SELECT group_id, order_id, value, LAG
FROM(
  SELECT T.*,
    IF(@last_group_id = group_id, @last_value, NULL) LAG,
    @last_group_id := group_id,
    @last_value := value
  FROM TestTable T,(SELECT @last_value:=NULL, @last_group_id:=NULL)I
  ORDER BY group_id, order_id
)T;
group_idorder_idvalueLAG
111NULL
1221
1322
241NULL
2521
2632
371NULL
3821
31122
491NULL

LEAD can be implemented in the same manner; just the sorting clause has to be changed to ORDER BY group_id, order_id DESC.

It’s somewhat more complicated for the functions COUNT, MIN, and MAX, since we can’t find out the function value before analyzing all rows in the group (window). For instance, MS SQL Server spools the window (temporary writes its rows to a hidden buffer table for later reference) for this purpose; MySQL has no such functionality. However, we can calculate the function value for the last row (according to the actual sorting order) of the respective group (that is, after processing the whole window), then sort the rows in reverse order and finally replicate the calculated value throughout the whole window.

Thus, we need to do the sorting twice. To ensure the same order as in the examples above, let’s first sort the data by group_id ASC, order_id DESC, then by group_id ASC, order_id ASC.

5) COUNT(*) OVER(PARTITION BY group_id)

Within the first sorting, we just enumerate the rows. Within the second one, we copy the maximum value, which represents the number of rows in the respective group, to all rows in the window.

SELECT group_id, order_id, value, Cnt
FROM(
  SELECT group_id, order_id, value,
    IF(@last_group_id = group_id, @MaxRowNum, @MaxRowNum := RowNumDesc) Cnt,
    @last_group_id := group_id
  FROM(
    SELECT T.*,
      IF(@last_group_id = group_id, @I:=@I+1, @I:=1) RowNumDesc,
      @last_group_id := group_id
    FROM TestTable T,(SELECT @last_group_id:=NULL, @I:=0)I
    ORDER BY group_id, order_id DESC /*первая сортировка*/
  )T,(SELECT @last_group_id:=NULL, @MaxRowNum:=NULL)I
  ORDER BY group_id, order_id /*вторая сортировка*/
)T;
group_idorder_idvalueCnt
1113
1223
1323
2413
2523
2633
3713
3823
31123
4911

MAX and MIN are calculated similarly. I confine myself to an example for MAX only:

6) MAX(value) OVER(PARTITION BY group_id)

SELECT group_id, order_id, value, MaxVal
FROM(
  SELECT group_id, order_id, value,
    IF(@last_group_id = group_id, @MaxVal, @MaxVal := MaxVal) MaxVal,
    @last_group_id := group_id
  FROM(
    SELECT T.*,
      IF(@last_group_id = group_id, GREATEST(@MaxVal, value), @MaxVal:=value) MaxVal,
      @last_group_id := group_id
    FROM TestTable T,(SELECT @last_group_id:=NULL, @MaxVal:=NULL)I
    ORDER BY group_id, order_id DESC
  )T,(SELECT @last_group_id:=NULL, @MaxVal:=NULL)I
  ORDER BY group_id, order_id
)T;
group_idorder_idvalueMaxVal
1112
1222
1322
2413
2523
2633
3712
3822
31122
4911

7) COUNT(DISTINCT value) OVER(PARTITION BY group_id)

This is an interesting function presently not implemented in MS SQL Server; however, it can be calculated using a subquery by taking MAX of RANK. Let’s do the same here. Within the first sorting RANK() OVER(PARTITION BY group_id ORDER BY value DESC) is calculated; then, within the second sorting, its maximum value is copied to all rows of each respective window:

SELECT group_id, order_id, value, Cnt
FROM(
  SELECT group_id, order_id, value,
    IF(@last_group_id = group_id, @Rank, @Rank := Rank) Cnt,
    @last_group_id := group_id
  FROM(
    SELECT T.*,
      IF(@last_group_id = group_id,
        IF(@last_value = value, @Rank, @Rank:=@Rank+1)
      , @Rank:=1) Rank,
      @last_group_id := group_id,
      @last_value := value
    FROM TestTable T,(SELECT @last_value:=NULL, @last_group_id:=NULL, @Rank:=0)I
    ORDER BY group_id, value DESC, order_id DESC
  )T,(SELECT @last_group_id:=NULL, @Rank:=NULL)I
  ORDER BY group_id, value, order_id
)T;
group_idorder_idvalueCnt
1112
1222
1322
2413
2523
2633
3712
3822
31122
4911