First of all, lets compare the performance of row enumeration queries using a self join and variables:

1) The classic method with a self join:

  1. SELECT COUNT(*)N, T1.*
  2. FROM TestTable T1
  3. JOIN TestTable T2 ON T1.order_id >= T2.order_id
  4. GROUP BY T1.order_id;
For 10000 records in the TestTable, this yields the following results:

Duration / Fetch

16.084 sec / 0.016 sec

2) Using variables:

  1. SELECT @N:=@N+1 N, T1.*
  2. FROM TestTable T1, (SELECT @N := 0)M
  3. ORDER BY T1.order_id;

Duration / Fetch

0.016 sec / 0.015 sec

The result speaks for itself. However, it should be understood that value calculation using variables is suboptimal for filtering. Sorting and calculation will be done for ALL rows, even if we need just a small portion of them in the end.

Lets examine this more closely using the following problem as an example:

Get the first 2 rows from TestTable for each group_id, with records in a group being sorted by order_id.

In a DBMS supporting analytic functions, the solution would look like this:

  1. SELECT group_id, order_id, value
  2. FROM(
  3. SELECT *, ROW_NUMBER()OVER(PARTITION BY group_id ORDER BY order_id) RowNum
  4. FROM TestTable
  5. )T
  6. WHERE RowNum <= 2;

Since the DBMS knows about ROW_NUMBER and how it works, the query optimizer doesnt need to enumerate ALL rows to get the first two. Thus, everything works fast (if indexes exist for group_id, order_id, of course).

For MySQL, the solution using a similar algorithm will look as follows:

  1. SELECT group_id, order_id, value
  2. FROM(
  3. SELECT T.*,
  4. IF(@last_group_id = group_id, @I:=@I+1, @I:=1) RowNum,
  5. @last_group_id := group_id
  6. FROM TestTable T,(SELECT @last_group_id:=NULL, @I:=0)I
  7. ORDER BY group_id, order_id
  8. )T
  9. WHERE RowNum <= 2;

However, the MySQL optimizer doesnt know anything about the rules we use to calculate the RowNum field. Thus, it will have to enumerate ALL rows, and only then pick out the ones needed.

Now suppose we have one million records and 20 unique values for group_id. That is, to get 40 rows, MySQL will calculate the value of RowNum a million times!There is no neat single-query solution for this problem in MySQL. Yet its possible to get the list of unique group_id values e. g., as shown below:

  1. SELECT DISTINCT group_id FROM TestTable;

and then, using any other programming language, generate a query like this:

  1. SELECT * FROM TestTable WHERE group_id=1 ORDER BY order_id LIMIT 2
  3. SELECT * FROM TestTable WHERE group_id=2 ORDER BY order_id LIMIT 2
  5. SELECT * FROM TestTable WHERE group_id=20 ORDER BY order_id LIMIT 2;

20 lightweight queries will be executed much faster than the calculation of RowNum for a million rows.

Bookmark and Share
aggregate functions Airport ALL AND AS keyword ASCII AVG Battles Bezhaev Bismarck C.J.Date calculated columns Cartesian product CASE cast CHAR CHARINDEX Chebykin check constraint classes COALESCE common table expressions comparison predicates Computer firm CONVERT correlated subqueries COUNT CROSS APPLY CTE data type conversion data types database schema date/time functions DATEADD DATEDIFF DATENAME DATEPART DATETIME DDL DEFAULT DEFAULT VALUES DELETE DISTINCT DML duplicates EXCEPT exercise (-2) exercise 19 exercise 23 exercise 32 More tags
The book was updated
several days ago
©SQL-EX,2008 [Evolution] [Feedback] [About] [Links] [Team]
All right reserved.
Rambler's Top100