Select random sampling from sqlserver quickly Select random sampling from sqlserver quickly database database

Select random sampling from sqlserver quickly


If you can use a pseudo-random sampling and you're on SQL Server 2005/2008, then take a look at TABLESAMPLE. For instance, an example from SQL Server 2008 / AdventureWorks 2008 which works based on rows:

USE AdventureWorks2008; GO SELECT FirstName, LastNameFROM Person.Person TABLESAMPLE (100 ROWS)WHERE EmailPromotion = 2;

The catch is that TABLESAMPLE isn't exactly random as it generates a given number of rows from each physical page. You may not get back exactly 5000 rows unless you limit with TOP as well. If you're on SQL Server 2000, you're going to have to either generate a temporary table which match the primary key or you're going to have to do it using a method using NEWID().


Have you looked into using the TABLESAMPLE clause?

For example:

select *from HumanResources.Department tablesample (5 percent)


SQL Server 2000 Solution, regarding to Microsoft (instead of slow NEWID() on larger Tables):

SELECT * FROM Table1WHERE (ABS(CAST( (BINARY_CHECKSUM(*) *  RAND()) as int)) % 100) < 10

The SQL Server team at Microsoft realized that not being able to take random samples of rows easily was a common problem in SQL Server 2000; so, the team addressed the problem in SQL Server 2005 by introducing the TABLESAMPLE clause. This clause selects a subset of rows by choosing random data pages and returning all of the rows on those pages. However, for those of us who still have products that run on SQL Server 2000 and need backward-compatibility, or who need truly row-level randomness, the BINARY_CHECKSUM query is a very effective workaround.

Explanation can be found here:http://msdn.microsoft.com/en-us/library/cc441928.aspx