loading..
Русский    English
08:31

Random sampling of rows from a table in SQL Server page 2

Now, let us employ the RAND function for our problem, namely, for producing two randomly selected models:

Console
Execute
  1. SELECT TOP 2 model
  2. FROM Product
  3. ORDER BY RAND(model);

We obtain two models ascendingly ordered by their numbers. That's where pseudorandomness shows up. Let's have a look at what kind of random numbers we've got:

Console
Execute
  1. SELECT model, RAND(model) rnd
  2. FROM Product
  3. ORDER BY rnd;

For brevity sake, I drop part of them:

Model rnd
1121 0.73446092102210758
1232 0.73652918083176777
1233 0.73654781380302592
1260 0.73705090402699736
1276 0.7373490315671285
1288 0.73757262722222694
1298 0.73775895693480897

I don't know which algorythm is used for random number computing, nevertheless I can state that the RAND function behaves monotonously within this range of model numbers. That is why we failed.

 Электронная документация SQL Server Books OnlineBOL gives an example of generating a sequence of random numbers using the system time in order to dynamically change initial value. Here it is:

Console
Execute
  1. SELECT RAND( (DATEPART(mm, GETDATE()) * 100000 )
  2. + (DATEPART(ss, GETDATE()) * 1000 )
  3. + DATEPART(ms, GETDATE())
  4. );

However, in this form, the query can be run only in slow packages, so that its repeated execution be less frequent than the time unit used in the query (milliseconds). The use of this function will obviously not meet the condition. But multiplying the argument by some unique identifier will score a success (this approach was suggested by V.I. Gershovich):

Console
Execute
  1. SELECT model, RAND(model*(DATEPART(mm, GETDATE()) * 100000 )
  2. + (DATEPART(ss, GETDATE()) * 1000 )
  3. + DATEPART(ms, GETDATE())
  4. )
  5. FROM Product
  6. ORDER BY model;

But here is another problem, related to the argument of the RAND function being integer. If we exceed the maximum allowable value for the integer (for  A database management system (DBMS) by Microsoft Corporation. SQL(Structured Query Language) is a database computer language designed for the retrieval and management of data in relational database management systems (RDBMS), database schema creation and modification, and database object access control management.SQL Server it makes 2^31-1 or 2 147 483 647), we'll get the following error message:

Arithmetic overflow error converting expression to data type int.

You can check it by running the above query on the website. The error appears for model numbers exceeding 2000. In such cases a normalizing factor is needed as well, for instance,

  1. CASE
  2. WHEN model < 2000
  3. THEN model
  4. ELSE model/10+model % 10
  5. END

Here, model % 10 (remainder from division by 10) is incorporated, lest significants digits should be lost; otherwise, for models which numbers differ by less then ten equal values might be produced.

The net solution looks like this (naturally, sorting should be performed by rnd rather then by model which we left for clearness of result).

Solution 8.8.3

Console
Execute
  1. SELECT model,
  2. RAND(CASE
  3. WHEN model < 2000
  4. THEN model
  5. ELSE model/10+model % 10
  6. END *
  7. (DATEPART(mm, GETDATE()) * 100000 )
  8. + (DATEPART(ss, GETDATE()) * 1000 )
  9. + DATEPART(ms, GETDATE())
  10. ) rnd
  11. FROM Product
  12. ORDER BY model;

And now compare results:

model Rnd
1121 0.40138073102287292
1232 0.48719939613580043
1233 0.98346802618590112
1260 0.38272122312416984
1276 0.3230194099746666
1288 0.27824305011253919
1298 0.24092941689409972

Conclusion. The simpler and more reliable way of solving the problem in question is to use the NEWID() function which guarantees the uniqueness of values. However, these values are not numeric. Therefore, when it is a number that is needed, one should pay attention to the RAND() function.

Pages 1 2
Tags
aggregate functions Airport ALL AND AS keyword ASCII AVG Battles Bezhaev Bismarck C.J.Date calculated columns Cartesian product CASE cast CHAR CHARINDEX Chebykin check constraint classes COALESCE common table expressions comparison predicates Computer firm CONSTRAINT CONVERT correlated subqueries COUNT CROSS APPLY CTE data type conversion data types database schema DATEADD DATEDIFF DATENAME DATEPART DATETIME date_time functions DDL DEFAULT DEFAULT VALUES DELETE DISTINCT DML duplicates edge equi-join EXCEPT exercise (-2) More tags
The book was updated
several days ago
©SQL-EX,2008 [Evolution] [Feedback] [About] [Links] [Team]
All right reserved.