Random number analysis Random number analysis r r

Random number analysis


The DieHarder test battery by Robert G. Brown --- which reimplements and extends the old DIEHARD by Marsaglia et al -- has been wrapped into the R package RDieHarder which you could start with.

Note that RDieHarder versions need their particular matching DieHarder releases -- and we're not there yet for the most recent development version of the latter.

Edit Also, for the subset of cryptographioic tests, the NIST suite (which is included in DieHarder) should be appropriate as that is what it was designed for.


First you need to decide what kind of randomness you're testing for. Do you have in mind a uniform distribution inside some range? That's usually what people have in mind, though you may have some other flavor of randomness such as a normal distribution.

Once you have a candidate distribution, you can test the goodness of fit to that distribution. The Kolmogorov-Smirnov test is a good general-purpose test. I believe it's called ks.test in R. But I also believe it assumes distinct values, so that could be a problem if you're sampling from such a small range of values that the same value appears more than once.

S. Lott mentioned Knuth's Seminumerical Algorithms in the comments. That book has a good introduction to the chi-squared test and the Kolmogorov-Smirnov tests for goodness of fit.

If you do suspect you have uniform random values, the DIEHARD test that Dirk Eddelbuettel mentioned is a standard test.


According to Wikipedia (Randomness):

The central idea is that a string of bits is random if and only if it is shorter than any computer program that can produce that string (Kolmogorov randomness) — this means that random strings are those that cannot be compressed.

Therefore, given the random stream of numbers, save it to a file, and compress it using your favorite tool (zip, rar, ...). The compression ratio can be interpreted as measure of randomness... Even better, I would use it as a relative score to compare the randomness of two data series.