Test Speed

The UnitTest suite should run quickly. If it takes too long to run, developers will resist unit testing and suffer longer feedback cycles.

Programmers will tolerate tests that take about 2% of their coding time (30 seconds is a reasonable maximum). Any more and you can expect short-cutting.

Here are some techniques to keep tests speedy.

Unit Tests, Not Targeted System Tests

Really do unit testing - at the level of an individual class or function.

Bad example: if you are writing a CPU simulator, testing the cache by running bits of assembly code is NOT unit testing. It is full-system testing via microbenchmarks targeted at a particular unit. Such tests are good to have, but because they exercise the full CPU, each one simulates much more than the cache, and hence runs slower.

Better example: a test jig that accepts just a cache, without the rest of the CPU.
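
For concreteness, here is a minimal sketch of such a jig. The Cache interface is hypothetical, and the map-backed implementation is only a stand-in so the example compiles; the point is that the test drives the cache directly, with no CPU model in sight.

  // A cache-only test jig (sketch). Cache and its interface are
  // hypothetical; the map is a stand-in so this compiles standalone.
  #include <cassert>
  #include <cstdint>
  #include <unordered_map>

  class Cache {
      std::unordered_map<uint32_t, uint32_t> lines;
  public:
      void write(uint32_t addr, uint32_t data) { lines[addr] = data; }
      bool read(uint32_t addr, uint32_t &data) {   // true on hit
          auto it = lines.find(addr);
          if (it == lines.end()) return false;     // miss
          data = it->second;
          return true;
      }
  };

  int main() {
      Cache cache;                       // just the cache, no CPU
      uint32_t data = 0;
      assert(!cache.read(0x2000, data)); // cold read must miss
      cache.write(0x1000, 42);
      assert(cache.read(0x1000, data) && data == 42); // hit after fill
      return 0;
  }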

Different Speed Levels of Test Suites

Many groups divide the test suite according to running time. E.g. there might be a simple set of tests that runs in 3 minutes, and a more comprehensive set that runs for 4-8 hours once every night.

(In big projects like microprocessors and OSes, there are still more comprehensive tests that take days to weeks to run. Not to mention stress tests.)

The standard XP separation into UnitTests and AcceptanceTests is something like that, but on big enough projects even the UnitTests may need to be layered.
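
One simple way to get such layering is to tag each test with a speed level and run only the tests at or below the level you have time for. A sketch, with hypothetical level names and a hand-written registry:

  #include <cstdio>

  enum SpeedLevel { FAST, NIGHTLY, EXHAUSTIVE };  // assumed levels

  struct Test {
      const char *name;
      SpeedLevel level;
      void (*run)();
  };

  void testParsing()        { /* quick checks */ }
  void testFullRegression() { /* hours of work */ }

  Test suite[] = {
      { "parsing",         FAST,    testParsing },
      { "full regression", NIGHTLY, testFullRegression },
  };

  // Run everything at or below the requested level.
  void runUpTo(SpeedLevel maxLevel) {
      for (const Test &t : suite)
          if (t.level <= maxLevel) {
              std::printf("running %s\n", t.name);
              t.run();
          }
  }

  int main() {
      runUpTo(FAST);  // the minutes-long loop; NIGHTLY for the long run
  }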

Selective Testing

Automatic Selective Testing

The test suite can be divided into sections that are somewhat independent.

For example, if you have changed package Bar - say, the sed text editor in the distribution of a UNIX-like OS - then ideally re-running the unit tests for package Bar (sed) will be sufficient.

Ideally, there should be no need to rerun the tests for packages that have nothing to do with package Bar (sed), such as kernel configuration.

However, there may be hidden dependencies. E.g. kernel configuration may use sed scripts that may break. This *SHOULD* *NOT* happen - but we all know that it does happen in the real world.

Nevertheless, running only a selective set of unit tests after changing package Bar may be desirable. Also selecting the tests for related units Foo that directly call Bar is a good idea.

Dependencies between units should be documented and/or computed, and could be analyzed to select which tests need to be rerun.
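
As a sketch of that analysis: given a table of what each unit depends on (hand-written here, though in practice it would be extracted from the build system), the tests to rerun after a change are those of the changed unit plus the transitive closure of its users:

  #include <cstdio>
  #include <map>
  #include <set>
  #include <string>
  #include <vector>

  using DepGraph = std::map<std::string, std::vector<std::string>>;

  // Collect the changed unit plus everything that transitively uses it.
  std::set<std::string> unitsToRetest(const DepGraph &dependsOn,
                                      const std::string &changed) {
      std::set<std::string> result;
      std::vector<std::string> work{changed};
      while (!work.empty()) {
          std::string unit = work.back();
          work.pop_back();
          if (!result.insert(unit).second) continue;  // already seen
          for (const auto &[user, deps] : dependsOn)  // who uses unit?
              for (const auto &d : deps)
                  if (d == unit) work.push_back(user);
      }
      return result;
  }

  int main() {
      DepGraph dependsOn = {
          { "kernel-config", { "sed" } },  // the hidden dependency above
          { "sed",           {} },
          { "ls",            {} },
      };
      for (const auto &unit : unitsToRetest(dependsOn, "sed"))
          std::printf("rerun tests for %s\n", unit.c_str());
  }

Note that the hidden kernel-configuration-uses-sed dependency is caught automatically once it is recorded in the table.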

However, in general it is unreasonable to rerun the tests for all users of a common library whenever the library changes. Instead, when a user of a library breaks and the broken behavior is decided to be part of the "contract", a test capturing that behavior should be added to the library's unit tests.

Rerunning the tests for users of a library should be done regularly, but not necessarily immediately.

Guided Selective Testing

Traditional CppUnit GUIs provide browsable lists where the programmer can select which tests to run, and can (supposedly) avoid rerunning slow tests that are not affected by a change.

If this is done, please record which tests were run with each checkin.

Parallelizing Tests

The tests for a module Bar may live in a file Bar-test.cc, accompanying the interface and source code for the module in Bar.hh and Bar.cc.

If Bar-test.cc has a main program, it can be run standalone. One can then use parallel make to build and run the unit tests in parallel on many machines, or in many processes on one machine, hopefully speeding things up.

Unfortunately, placing a main() in Bar-test.cc that is not guarded by #ifdef means that the tests in Bar-test.cc cannot necessarily be linked with other tests, so a non-parallel version of the test suite with neat GUI features is a bit harder to obtain.

Solve this by either:

  1. leaving just the tests in Bar-test.cc and placing the main() program in Bar-test-main.cc, automatically generated by a TestCollector (see the sketch below), or
  2. having the overall suite know how to build the separate programs and run them.
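
A sketch of option 1 follows. The Bar module and the contents of the generated driver are assumptions for illustration, not the output of any particular TestCollector:

  // Bar-test.cc: tests only, no main(), so it links into any harness.
  #include <cassert>
  #include "Bar.hh"  // hypothetical module under test, with set()/value()

  void testBarDefault() { assert(Bar().value() == 0); }
  void testBarSet()     { Bar b; b.set(7); assert(b.value() == 7); }

  // Bar-test-main.cc: generated by the TestCollector; used only in the
  // standalone build, so Bar-test.cc itself stays main()-free.
  void testBarDefault();
  void testBarSet();

  int main() {
      testBarDefault();  // the generated driver just calls each test
      testBarSet();
      return 0;
  }

Each such standalone program can then be built and run under parallel make, while a GUI harness links Bar-test.cc together with the other test files and supplies its own main().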

Run Likely-to-Break Tests First

The overall time to run the tests may not change, but the test/code cycle may speed up if the tests that are most likely to break are run first.

Some TestCollector scripts will sort tests according to the modification time of the files involved.
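
A sketch of that sort, assuming the list of test source files is already in hand (e.g. produced by a TestCollector) and the files exist:

  #include <algorithm>
  #include <cstdio>
  #include <filesystem>
  #include <string>
  #include <vector>

  namespace fs = std::filesystem;

  // Order tests so that recently edited modules run first.
  std::vector<std::string> orderByRecency(std::vector<std::string> files) {
      std::sort(files.begin(), files.end(),
                [](const std::string &a, const std::string &b) {
                    return fs::last_write_time(a) > fs::last_write_time(b);
                });
      return files;
  }

  int main() {
      std::vector<std::string> tests = { "Bar-test.cc", "Foo-test.cc" };
      for (const auto &f : orderByRecency(tests))
          std::printf("%s\n", f.c_str());  // newest first
  }

(If the same list doubles as a TestInventory, re-sort it by name before comparing, as cautioned below.)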

If you are creating your test lists by hand, place the tests you are actively working on highest in the list.

Beware: make sure that you sort the list before using it as a TestInventory, or else the innocuous differences created by TestOrdering may lead you to ignore the true differences caused by accidentally omitting a test.


The idea I keep in my head for speed levels is Feynman's description of the calculation lab at Los Alamos, where the different colors of cards were processed in loops to repair errors faster than they could propagate.

Interestingly, that is almost exactly the sort of approach that I and others have proposed to guarantee that replays in a modern processor eventually catch up to the bogus instructions executing with the incorrect data.


CategoryTesting

