Random Testing

Random testing is a form of functional testing that is useful when the time needed to write and run directed tests is too long (or when the complexity of the problem makes it impossible to test every combination). Release criteria may include a statement about the amount of random testing that is required. For example, we have a requirement that there be no random failures for 2 weeks prior to release (that is, 2 weeks of continuous random testing on 50 workstations).

One of the big issues in random testing is knowing when a test has failed. As with all testing, an oracle is needed. You could rely on assertions in the code as your sole oracle (i.e. you throw random inputs at the code, possibly from multiple threads, and if no GPF happens in 2 weeks then you assume it's OK). In other situations, common in hardware development, you have two different implementations of the same specification (one is 'the golden model', the other is 'the implementation'). If they agree to a defined accuracy then the test passes.
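
To make the golden-model idea concrete, here is a minimal sketch (not from the original comment), assuming a hypothetical trusted reference and a fast implementation of the same square-root specification, compared to a defined accuracy over random inputs:

 import java.util.Random;
 public class GoldenModelCheck {
     // Hypothetical "golden model": the slow but trusted reference.
     static double goldenSqrt(double x) {
         return Math.sqrt(x);
     }
     // Hypothetical implementation under test: a Newton's-method approximation.
     static double fastSqrt(double x) {
         double guess = x;
         for (int i = 0; i < 30; i++) {
             guess = 0.5 * (guess + x / guess);
         }
         return guess;
     }
     public static void main(String[] args) {
         Random rng = new Random(42);   // fixed seed so a failure is reproducible
         double tolerance = 1e-9;       // the "defined accuracy"
         for (int i = 0; i < 1_000_000; i++) {
             double input = 1e-6 + rng.nextDouble() * 1e6;
             double expected = goldenSqrt(input);
             double actual = fastSqrt(input);
             if (Math.abs(expected - actual) > tolerance * expected) {
                 throw new AssertionError("Mismatch for input " + input
                     + ": " + expected + " vs " + actual);
             }
         }
         System.out.println("Golden model and implementation agreed on all inputs.");
     }
 }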

When doing random testing you must, of course, ensure that your tests are sufficiently random, and that they cover the spec. Repeating the same test for 2 weeks doesn't tell you anything.

It is often claimed, correctly, that random testing is less efficient than directed testing. But you must consider the time needed to write a random test generator vs. the time needed to write a set of directed tests (or generators). Once you have a random test generator, your computer(s) can work 24 hours a day generating new tests.

--DaveWhipp


Random or functional testing? What we need to know is how many defects each technique finds over a given period, which should be measurable.

Random testing is useful even if it doesn't find as many defects per time interval, since it can be performed without manual intervention. An hour of computer time can be much less expensive than an hour of human time.

Possibly the combination of the two would find more than either alone. But that probably also depends on the software in question.

--MatthewTuck

We call those tests DirectedRandomTests?. In fact, our tests can be classified in terms of their degree of randomness. Very few are 100% random because, in general, that doesn't lead to interesting test cases. When a random test contains a large number of random elements that are mutually constrained, it is handy to have a constraint solver to avoid thrashing (e.g. if the rule is that "A, B are integers such that A < B", then the solver might select a value of A first, and then select a value of B between A and MAX_INT. The alternative is to generate two random numbers and throw the test away if the constraint isn't met). Languages such as Vera and E have been developed for the purpose of defining test suites for directed-random testing. --Dave
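
As an illustrative sketch (Java, with invented names) of the two approaches to the "A < B" constraint mentioned above: the solver-style generator orders the choices so every draw is legal, while the naive generator throws failing tests away.

 import java.util.Random;
 public class ConstrainedPair {
     static final Random RNG = new Random();
     // Solver-style generation: pick A first, then pick B from the legal range (A, MAX_INT].
     // (Restricted to non-negative ints to keep the sketch short.)
     static int[] generateOrdered() {
         int a = RNG.nextInt(Integer.MAX_VALUE - 1);             // A in [0, MAX_INT - 2]
         int b = a + 1 + RNG.nextInt(Integer.MAX_VALUE - a);     // B in (A, MAX_INT]
         return new int[] { a, b };
     }
     // Naive generate-and-reject: wastes roughly half the draws on discarded tests.
     static int[] generateReject() {
         while (true) {
             int a = RNG.nextInt(Integer.MAX_VALUE);
             int b = RNG.nextInt(Integer.MAX_VALUE);
             if (a < b) {
                 return new int[] { a, b };
             }
         }
     }
 }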


Random testing has been applied to Unix utilities, finding many bugs:

http://citeseer.nj.nec.com/miller90empirical.html


There is another advantage of random testing: if you have a selection of random tests generated by reference to a model of the "operational profile" of common usage, you can make inferences about the reliability of the software in production use. Yes, the quality of this inference depends on how good your operational profile model is, but (as in my case, in telephony) it's just about the only reasonably well-supported way of estimating mean time to failure for software. -- Charlie Martin
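
A small sketch of what sampling from such an operational profile might look like; the operation names and weights below are purely illustrative, not taken from any real profile:

 import java.util.NavigableMap;
 import java.util.Random;
 import java.util.TreeMap;
 public class OperationalProfile {
     private final NavigableMap<Double, String> cumulative = new TreeMap<>();
     private final Random rng;
     private double total = 0;
     OperationalProfile(long seed) {
         this.rng = new Random(seed);
     }
     // Weight each operation by its observed frequency in production.
     void add(String operation, double weight) {
         total += weight;
         cumulative.put(total, operation);
     }
     // Draw the next operation with probability proportional to its weight.
     String next() {
         return cumulative.higherEntry(rng.nextDouble() * total).getValue();
     }
     public static void main(String[] args) {
         OperationalProfile profile = new OperationalProfile(1234);
         profile.add("localCall", 70);          // illustrative weights only
         profile.add("longDistanceCall", 25);
         profile.add("conferenceCall", 5);
         for (int i = 0; i < 10; i++) {
             System.out.println(profile.next());
         }
     }
 }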


Directed random testing has been applied to testing C compilers:

http://research.compaq.com/wrl/DECarchives/DTJ/DTJT08/DTJT08HM.HTM
Compilers are a good target for this kind of testing because they have very complicated internal control flow, yet their behavior is subject to automatically testable invariants.

A very similar technique has been applied to testing Common Lisp compilers. The tests at http://savannah.gnu.org/cgi-bin/viewcvs/gcl/gcl/ansi-tests/misc.lsp?rev=1.96&content-type=text/vnd.viewcvs-markup were produced by a random test generator and test pruner.

The 'differential testing' described in that link could also be useful for testing that refactoring has not changed the behavior of a module. The tester doesn't need to understand the module; it just needs to be able to compare the behavior of the old and new versions on the same set of random inputs. -- PaulDietz
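
As a hedged sketch of that differential use (Java, with hypothetical old and new versions of a module): drive both versions with the same seeded random inputs and fail on the first disagreement.

 import java.util.Random;
 import java.util.function.IntUnaryOperator;
 public class DifferentialTest {
     // The tester only observes that the two versions behave identically.
     static void compare(IntUnaryOperator oldVersion, IntUnaryOperator newVersion,
                         long seed, int iterations) {
         Random rng = new Random(seed);   // the same seed drives both versions
         for (int i = 0; i < iterations; i++) {
             int input = rng.nextInt();
             int expected = oldVersion.applyAsInt(input);
             int actual = newVersion.applyAsInt(input);
             if (expected != actual) {
                 throw new AssertionError("Behavior changed for input " + input
                     + ": was " + expected + ", now " + actual);
             }
         }
     }
     public static void main(String[] args) {
         // Hypothetical refactoring: multiplication rewritten as a shift.
         compare(x -> x * 2, x -> x << 1, 20141023L, 1_000_000);
         System.out.println("Old and new versions agreed on all random inputs.");
     }
 }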


If you're using random tests in a test suite, make sure that failures can be reproduced (perhaps by logging the random number seed). A non-reproducible bug doesn't help anybody! Furthermore, if random testing reveals bugs, it would be wise to create non-random unit tests from these cases, because tomorrow's random input won't necessarily protect you against regression. --MichaelPrescott
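
A minimal sketch of that seed-logging advice (Java; the invariant checked here is just a stand-in for a real system under test): pick a seed, print it, and accept an explicit seed on the command line so any failure can be replayed.

 import java.util.Random;
 public class ReproducibleRandomTest {
     public static void main(String[] args) {
         // Pick a fresh seed, but record it so any failure can be replayed
         // by passing the same seed back in as the first argument.
         long seed = args.length > 0 ? Long.parseLong(args[0])
                                     : System.currentTimeMillis();
         System.out.println("Random test seed: " + seed);
         Random rng = new Random(seed);
         for (int i = 0; i < 10_000; i++) {
             int input = rng.nextInt();
             // Hypothetical invariant standing in for the real oracle.
             if (Integer.reverse(Integer.reverse(input)) != input) {
                 throw new AssertionError("Failed for input " + input
                     + " (reproduce with seed " + seed + ")");
             }
         }
     }
 }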

I took your advice, and modified QuickCheck to allow TestDrivenDevelopment. Now each failing test case saves the seed immediately. This means I can check that my tests really do fail, that I can recheck any failures, and generally makes life easier. I haven't yet added code to turn failed cases into non-random unit tests, mostly because there are some values that can be generated, but cannot be saved! --ShaeErisson


There's a really nice RandomTesting tool written in Haskell called QuickCheck; see http://www.math.chalmers.se/~rjmh/QuickCheck/ for details. I've used it, and it's great. I can show demo tests if someone wants. -- ShaeErisson


Charlie Martin above advocates using "operational profile of common usage" as a basis for random testing.

I have this pattern; we call it a production load simulator. In reality it's not much more than a load test taken seriously (though few people seem to take load testing seriously).

What this means is a test suite that has more test assertions than a "conventional" load test would have, but fewer than a functional test. Random values are used wherever relevant, and are selected from predefined sets that cover different behaviors. This lets you write table-based oracles and smart assertions. E.g.

 if (event.isSms()) {
     assertEverythingAboutSms(event);
 }
or

 Money expectedCharge = expectedChargeValues.get(event.getType());
 assertEquals(expectedCharge, event.getCharge());
We found this approach to be both simple and powerful for hunting memory leaks, thread-safety bugs and other stuff that normal functional testing doesn't catch.

I also like to run the complete [automated] functional test suite while the system is under simulated load. This doesn't cost anything and is normally non-revealing, but it has yielded a couple of nasty and very obscure bugs over time.

As a side effect, a statement like "the system has successfully executed 48 hours of production simulation" gives an enormous boost to the comfort level of project stakeholders. This, any project manager will tell you, is a Good Thing (TM) in its own right.

-- AlexeyVerkhovsky


How come nobody mentioned random drug testing? Oh, yeah -- just did. Sorry.

