Unit Testing Randomness

I was surprised, reading this, that Knuth is only mentioned once. In Seminumerical Algorithms, Knuth provides more than a few StatisticalTestsForRandomness? (and really, the discussion of those on this page should probably be moved over there), and of course one should use all of them (and more) when testing a PseudoRandomNumberGenerator. But I think the crux of the problem presented here is not how to test a PRNG, but how to test a program that takes a random sequence as part of its input. That is not as simple as feeding it a predictable sequence, because even small changes to the code may consume the sequence in a different order and break dozens of tests, which is not helpful, since the code those tests exercise did not actually break. I don't want to go change 50 tests because I reordered two lines. My suggestion: somewhere, your code is doing something with the numbers. Test whether it does the right things with known numbers, and separately test whether you are feeding it suitably random numbers in the first place.
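As a minimal sketch of that separation (the ScriptedRandom and Picker classes here are hypothetical, just to show the shape of the test):

  import java.util.Random;

  // Hypothetical stub: replays a scripted sequence so the test controls
  // exactly which "random" values the code under test sees.
  class ScriptedRandom extends Random {
      private final int[] script;
      private int i = 0;
      ScriptedRandom(int... script) { this.script = script; }
      @Override public int nextInt(int bound) { return script[i++] % bound; }
  }

  // Hypothetical consumer: all it knows is that it was handed a Random.
  class Picker {
      private final Random random;
      Picker(Random random) { this.random = random; }
      String pick(String[] options) { return options[random.nextInt(options.length)]; }
  }

  class PickerTest {
      public static void main(String[] args) {
          // Assert on behaviour given known inputs; reordering calls inside
          // Picker only breaks tests whose expectations genuinely changed.
          Picker picker = new Picker(new ScriptedRandom(2, 0, 1));
          if (!"c".equals(picker.pick(new String[] {"a", "b", "c"})))
              throw new AssertionError("picked the wrong element");
      }
  }

The assertion is about what the code does with the numbers; whether the production source is random enough gets checked elsewhere.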


How do you unit test for randomness? Other than asking for n random numbers and verifying that you get n of them back, what else can you do?

In a language like Java, it makes no sense to test that the random values are of the correct type - if you call random.nextInt() you'll get an int, no question. You can test that the numbers fall within the range you specify; e.g., if you ask for a random number between 1 and 10 and you get 11 or -23904, you know something is wrong.
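A range check is about the only cheap assertion available; something like this sketch (the bounds and iteration count are arbitrary):

  import java.util.Random;

  class RangeTest {
      public static void main(String[] args) {
          Random random = new Random();
          for (int i = 0; i < 100_000; i++) {
              // nextInt(10) + 1 is the usual idiom for "1 to 10 inclusive"
              int n = random.nextInt(10) + 1;
              if (n < 1 || n > 10)
                  throw new AssertionError("out of range: " + n);
          }
      }
  }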

In some applications, such as cryptography, it's not enough that the random number generator produces an apparently random sequence; it has to be really random. The simple-minded check (generate a whole bunch of random numbers between 1 and 10, count the occurrences of each, and see whether they form a roughly equal distribution) isn't sufficient. (Example: http://www.ibiblio.org/javafaq/books/jdr/examples/9/9.1.html)
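For concreteness, the simple-minded check amounts to something like this sketch (the bucket count and the 5% tolerance are arbitrary choices, not a real statistical bound):

  import java.util.Random;

  class NaiveDistributionTest {
      public static void main(String[] args) {
          Random random = new Random();
          int trials = 1_000_000;
          int[] counts = new int[10];            // buckets for 1..10
          for (int i = 0; i < trials; i++)
              counts[random.nextInt(10)]++;      // index 0..9 stands for 1..10
          double expected = trials / 10.0;
          for (int c : counts)                   // crude check: within 5% of expected
              if (Math.abs(c - expected) > expected * 0.05)
                  throw new AssertionError("suspiciously uneven bucket: " + c);
      }
  }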

The reason this isn't sufficient is that a random number generator could be completely predictable and still generate an equal distribution, but the predictability makes it cryptographically non-random.

Another approach uses runs, as explained at: http://www.uwm.edu/~ericskey/361F98/L29/node2.html

Generate a bunch of random numbers and check that the results don't look like 11111888888666663333366666... or 0123456789012345678901234567890123456789...

Note that while either of these results would pass the simple distribution test, they are probably not random.
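A rough version of that runs check, applied to the generator's output bits, might look like this sketch (the Wald-Wolfowitz expectation and variance formulas are standard; the 3-sigma cutoff is an arbitrary choice):

  import java.util.Random;

  class RunsTest {
      public static void main(String[] args) {
          Random random = new Random();
          int n = 100_000;
          int ones = 0, runs = 1, prev = random.nextInt(2);
          if (prev == 1) ones++;
          for (int i = 1; i < n; i++) {
              int bit = random.nextInt(2);
              if (bit == 1) ones++;
              if (bit != prev) runs++;           // a new run starts on every change
              prev = bit;
          }
          int zeros = n - ones;
          // Wald-Wolfowitz expectation and variance for the number of runs
          double mu = 2.0 * ones * zeros / n + 1;
          double var = 2.0 * ones * zeros * (2.0 * ones * zeros - n)
                       / ((double) n * n * (n - 1));
          double z = (runs - mu) / Math.sqrt(var);
          if (Math.abs(z) > 3)                   // crude 3-sigma cutoff
              throw new AssertionError("run count looks non-random, z = " + z);
      }
  }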

What does it mean to be random? Bruce Schneier discusses this in his ''Applied Cryptography''. There are a number of statistical properties of randomness:

Random sequences should have about the same number of ones and zeros; about half the runs (sequences of the same bit) should be of length one, one quarter of length two, one eighth of length three, and so on. They should not be compressible. The distribution of run lengths for zeros and ones should be the same. These properties can be empirically measured and then compared to statistical expectations using a chi-square test. (p. 45)
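The run-length property above can be turned into code along these lines (a sketch only; the bucket cap and the chi-square threshold are arbitrary choices, not taken from Schneier):

  import java.util.Random;

  class RunLengthTest {
      public static void main(String[] args) {
          Random random = new Random();
          int n = 1_000_000;
          int maxLen = 10;                        // lump longer runs into the last bucket
          long[] observed = new long[maxLen + 1]; // observed[k] = number of runs of length k
          int prev = random.nextInt(2), len = 1;
          long totalRuns = 0;
          for (int i = 1; i < n; i++) {
              int bit = random.nextInt(2);
              if (bit == prev) {
                  len++;
              } else {
                  observed[Math.min(len, maxLen)]++;
                  totalRuns++;
                  len = 1;
              }
              prev = bit;
          }
          observed[Math.min(len, maxLen)]++;
          totalRuns++;
          // Expected: half the runs have length 1, a quarter length 2, and so on.
          double chiSquare = 0;
          for (int k = 1; k <= maxLen; k++) {
              double p = (k < maxLen) ? Math.pow(0.5, k) : Math.pow(0.5, maxLen - 1);
              double expected = totalRuns * p;
              chiSquare += (observed[k] - expected) * (observed[k] - expected) / expected;
          }
          // 10 buckets -> 9 degrees of freedom; 21.7 is roughly the 99% point.
          if (chiSquare > 21.7)
              throw new AssertionError("run lengths look skewed, chi^2 = " + chiSquare);
      }
  }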

See also Knuth, D ''The Art of Computer Programming: Volume 2, Seminumerical Algorithms''

However, Schneier adds two more properties that blow the idea of a unit test out of the water: first, a sequence must be unpredictable ("computationally infeasible to predict what the next random bit will be"), and second, "It cannot be reliably reproduced".

What does that leave? A unit test that is sophisticated enough to check the statistical properties of a run is likely to be complex enough that it will itself need unit tests to determine its correctness. A unit test that is simple doesn't really determine randomness.

This question concretely applies to a Java implementation of a random number generator that replaces the PseudoRandomNumberGenerator in java.util.Random with code that uses the /dev/random device on Linux (and other modern Unix-like operating systems), for use in a cryptographic application. It's possible to write a test which verifies that the class adheres to the same interface as the built-in library generator, and that it generates the number of random values requested, within the specified range; but writing a unit test that tests for true randomness is another matter.
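For context, such a wrapper is roughly of the following shape. This is a hypothetical sketch (the class name DevRandom and its constructor are inventions for illustration), not the actual implementation under discussion; it relies on the fact that all of java.util.Random's public methods funnel through the protected next(bits) method.

  import java.io.DataInputStream;
  import java.io.FileInputStream;
  import java.io.IOException;
  import java.util.Random;

  // Hypothetical /dev/random-backed generator, roughly the shape of the
  // class under discussion.
  class DevRandom extends Random {
      private final DataInputStream in;

      DevRandom(String devicePath) throws IOException {
          this.in = new DataInputStream(new FileInputStream(devicePath));
      }

      @Override protected int next(int bits) {
          try {
              return in.readInt() >>> (32 - bits);   // keep only the requested bits
          } catch (IOException e) {
              throw new RuntimeException("entropy source failed", e);
          }
      }

      @Override public synchronized void setSeed(long seed) {
          // deliberately ignored: the device cannot be reseeded
      }
  }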

From UnitTestExamplesAndGuidelines we get the following formula:

example:

  one test case per class, named ClassNameTestCase
  one test method per public method, named testMethodName
  one test method per private algorithm method, named testMethodName
  in each test method, test good results and failures
  sometimes split test method into two:
      testMethodName
      testMethodNameFailures
  sometimes, in addition to one test case per class,
      add test cases for special "inter-class" logic

The only test here that really matters is the third one, testing the private algorithm, and that is, as explained above, "hard". What this example really serves to illustrate is that a simple-minded approach to unit testing is insufficient for a significant subset of applications. It is a plea to be thoughtful, especially in non-trivial domains, about what a good testing plan is.
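Applying that formula to the hypothetical DevRandom wrapper sketched earlier gives a skeleton along these lines (JUnit 3 style; method bodies elided):

  import junit.framework.TestCase;

  // Hypothetical test case for the DevRandom wrapper sketched earlier.
  public class DevRandomTestCase extends TestCase {
      public void testNextInt()         { /* right count of values, all in range */ }
      public void testNextIntFailures() { /* bad bounds, missing device, ... */ }
      public void testNext()            { /* the protected algorithm method: the hard part */ }
  }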

Perhaps verifying the randomness of a random number generator falls under the heading of acceptance testing for some domains, but most users will not care about that level of functionality. It is important that it be right anyway, because users do care about things like security. Focusing on the customer's intent (security through correct, strong cryptography) requires asking questions about the properties of the random generator. Yes, in many cases a pseudo-random sequence is fine: randomly selecting the thought-of-the-day from a file of aphorisms, for example. But here we have a case where the user story talks about privacy, security, and non-repudiation, while the actual implementation of strong cryptography, as practiced today, must address mathematical and algorithmic issues that are not necessarily in the user's sight.

Technically, I understand the ideas behind testing for "true" randomness, and I have some familiarity with a number of ways to write code that will at least verify that a random number generator is random according to a given test. My interest is not precisely how, but "consider the following": the code techniques for testing randomness are complex enough that an attentive programmer should be concerned that the tests themselves are correct. Alternatively, the unit tests may be simple enough to be correct (or at least not so subtle as to be in question), but then they won't truly be testing randomness properties, only that the generator is returning a number in the expected range.

-- StevenNewton

Well, you could try plugging it into a nice cup of hot tea and turning it on...

Or you could see what other people have done: http://stat.fsu.edu/~geo/diehard.html (Dead Link) http://world.std.com/~franl/crypto/random-numbers.html and apply their tests. Or implement what you've written above: build a predictive process (e.g. a MarkovModel?, or a CompressionAlgorithm?) and ascertain that it is incapable of predicting the generator's output (e.g. all transitions have uniformly distributed probability).

I've seen WinZip used as a tool to measure the randomness of a file of values (obviously, the smaller it can make the file, the less random the contents are).
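The same trick can be done in-process with java.util.zip; a crude sketch (the data size and the 0.99 cutoff are arbitrary, and this is a sanity check, not a statistical test):

  import java.io.ByteArrayOutputStream;
  import java.io.IOException;
  import java.util.Random;
  import java.util.zip.Deflater;
  import java.util.zip.DeflaterOutputStream;

  class CompressibilityTest {
      public static void main(String[] args) throws IOException {
          byte[] data = new byte[1_000_000];
          new Random().nextBytes(data);           // the sequence under test

          ByteArrayOutputStream compressed = new ByteArrayOutputStream();
          DeflaterOutputStream out =
              new DeflaterOutputStream(compressed, new Deflater(Deflater.BEST_COMPRESSION));
          out.write(data);
          out.close();

          double ratio = compressed.size() / (double) data.length;
          // Random bytes should be essentially incompressible; the 0.99
          // cutoff is an arbitrary sanity threshold.
          if (ratio < 0.99)
              throw new AssertionError("sequence compressed to " + ratio + " of its size");
      }
  }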


Autocorrelation is useful for testing random numbers. Basically this compares the sequence to itself with various offsets.
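A sketch of that idea: compute the normalized autocorrelation of a sample at a few offsets and expect values near zero (the lags and the 0.01 threshold here are arbitrary choices):

  import java.util.Random;

  class AutocorrelationTest {
      public static void main(String[] args) {
          Random random = new Random();
          int n = 100_000;
          double[] x = new double[n];
          double mean = 0;
          for (int i = 0; i < n; i++) { x[i] = random.nextDouble(); mean += x[i]; }
          mean /= n;

          double variance = 0;
          for (int i = 0; i < n; i++) variance += (x[i] - mean) * (x[i] - mean);

          for (int lag : new int[] {1, 2, 5, 10, 100}) {
              double covariance = 0;
              for (int i = 0; i + lag < n; i++)
                  covariance += (x[i] - mean) * (x[i + lag] - mean);
              double r = covariance / variance;   // normalized autocorrelation at this lag
              // For independent samples r should hover around 0 (roughly 1/sqrt(n) noise).
              if (Math.abs(r) > 0.01)
                  throw new AssertionError("autocorrelation at lag " + lag + " is " + r);
          }
      }
  }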


I would suggest you step back a little bit and look at what your user requirements are. Almost by definition, no algorithm will produce truly random results. If your customer requires some level of pseudo-random results, he needs to specify what he means, either by algorithm or by resulting distribution (e.g. a normal distribution or some other). Focus less on whether some algorithm is generating "random" results and more on the customer's intent. A trivial, fixed sequence may suffice. -- WayneMack


''This question concretely applies to a Java implementation of a random number generator that replaces the PseudoRandomNumberGenerator in java.util.Random with code that uses the /dev/random device on Linux (and other modern Unix-like operating systems) for use in a cryptographic application.''

Perhaps you could simply ''assume'' that /dev/random is "random enough". Then the test merely needs to confirm that the sequence that comes from /dev/random is bit-for-bit identical to the sequence that your Java code sees.

I suppose you could write some other test to confirm that /dev/random is "random enough" ... if that test fails, are you prepared to fix the code underneath /dev/random? Testing /dev/random is very difficult for all the reasons you mentioned, but surely we only need to test this OnceAndOnlyOnce, rather than in every application that uses it.

-- DavidCary
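One way to write that passthrough test without consuming real entropy twice: point the wrapper at an ordinary file of known bytes standing in for /dev/random (this assumes the wrapper takes the device path as a constructor parameter, as in the hypothetical DevRandom sketch above), and check the output bit for bit.

  import java.io.FileOutputStream;
  import java.io.IOException;

  class PassthroughTest {
      public static void main(String[] args) throws IOException {
          // Known "entropy": 0x12345678 twice, written to a temp file that
          // stands in for /dev/random during the test.
          byte[] known = {0x12, 0x34, 0x56, 0x78, 0x12, 0x34, 0x56, 0x78};
          FileOutputStream f = new FileOutputStream("/tmp/fake-random");
          f.write(known);
          f.close();

          DevRandom random = new DevRandom("/tmp/fake-random");
          // nextInt() should hand back exactly the bytes the "device" contained.
          for (int expected : new int[] {0x12345678, 0x12345678})
              if (random.nextInt() != expected)
                  throw new AssertionError("wrapper altered the device's bytes");
      }
  }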

How about just comparing the first run of the algorithm with the second run? If they are different, the test passes. Another simple test: using the same seed value, run the random number generation routine twice; if the results differ, the test passes. If you need something beyond these, you will need a definition of randomness in the context of your application.
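As a sketch of the second suggestion, again against the hypothetical DevRandom wrapper from above (note that /dev/random may block while it gathers entropy, so keep the sample small):

  import java.io.IOException;

  class NonReproducibilityTest {
      public static void main(String[] args) throws IOException {
          // Two generators given the same seed should still disagree, because
          // the seed is ignored and the output comes from the device.
          DevRandom a = new DevRandom("/dev/random");
          DevRandom b = new DevRandom("/dev/random");
          a.setSeed(123L);
          b.setSeed(123L);
          boolean differ = false;
          for (int i = 0; i < 10; i++)
              if (a.nextInt() != b.nextInt()) differ = true;
          if (!differ)
              throw new AssertionError("same seed, same output: not reading the device?");
      }
  }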

A useful comparison is to put /dev/random sequences side by side with sequences generated by the Blum Blum Shub generator, which is slow but very easy to implement, and which is wonderfully pseudo-random in the senses that matter to cryptography: for instance, neither the previous nor the next output is predictable from the current output, not even in the sense of a slight bias.
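For reference, a toy Blum Blum Shub generator looks roughly like this (the tiny textbook primes are for illustration only; a real comparison would use large random primes congruent to 3 mod 4 and a seed coprime to their product):

  import java.math.BigInteger;

  // Toy Blum Blum Shub: x_{n+1} = x_n^2 mod M, where M = p*q with p and q
  // prime and congruent to 3 mod 4; emit the least significant bit of each state.
  class BlumBlumShub {
      private final BigInteger m;
      private BigInteger state;

      BlumBlumShub(BigInteger p, BigInteger q, BigInteger seed) {
          this.m = p.multiply(q);
          this.state = seed.mod(m);
      }

      int nextBit() {
          state = state.multiply(state).mod(m);
          return state.testBit(0) ? 1 : 0;
      }

      public static void main(String[] args) {
          // Textbook-sized parameters (11 and 23 are both 3 mod 4).
          BlumBlumShub bbs = new BlumBlumShub(
              BigInteger.valueOf(11), BigInteger.valueOf(23), BigInteger.valueOf(3));
          StringBuilder bits = new StringBuilder();
          for (int i = 0; i < 32; i++) bits.append(bbs.nextBit());
          System.out.println(bits);
      }
  }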

On the other hand, theoretically the non-deterministic randomness gathered by /dev/random should sometimes be surprising, e.g. by showing more bias at times than something like Blum Blum Shub. Any such surprises should be investigated with a battery of statistical tests (one such suite designed for this purpose is the Diehard battery linked above), rather than by immediately assuming e.g. that /dev/random is broken.

Physical sources of randomness turn out not to be as random as people expect. Radioactive decay, for example, is indeed random, but all measuring devices introduce bias, so it gets into some truly tricky issues.

Repeatability is probably a bigger problem with physical sources: you either have to store every random bit you generate, or accept that any test depending on the exact sequence of bits is unrepeatable.


So-called negative user tests (i.e., the generated run of numbers must not be predictable) are not really feasible. One needs a proven algorithm to deal with these. The trick is, once you have the algorithm, you can test the properties of the implementation. It's up to you to make sure that the properties those tests check add up to an implementation of that algorithm. Note that they don't have to add up to a proof of the algorithm.


Surely this should be Acceptance testing randomness. You want to test that the generated sequence meets your needs, not that it is objectively random.

