Temporary Detailed Testing Supporting Refactoring

UnitTests and RegressionTests are good things. They provide confidence that program functionality is being maintained, allowing the programmer to RefactorMercilessly to improve the design.

I repeat: write the best unit tests you can. However, sometimes it is hard to write unit tests that provide full confidence. Or, sometimes such really precise unit tests can be written, but they need to be changed so often as interacting features are added to the code that they become a headache. (This often indicates that your features are too closely intertwined.) Or, sometimes "high level" tests that do not need to be constantly changed can be written, but they are either so abstract as not to inspire confidence during refactoring, or are computationally much slower to run.

TemporaryDetailedTestingSupportingRefactoring extends conventional unit testing as follows:

(1) Create really detailed tests, checking results that are not expected to endure during normal program development.

E.g. I may record the full, exact output of a simulator or unit under test as my "temporary gold" reference output at the start of a refactoring episode.

E.g. sometimes I keep some flavour of execution trace around as the "temporary gold" reference output.

(2) While you are refactoring, you are not supposed to be making any functional or semantic changes to your program. Thus, the results checked by the really detailed tests should not change: the "temporary gold" output should remain identical.

(3) When you have finished refactoring, and return to adding features, you no longer run the detailed tests, because the results that the detailed tests check may vary. (Or, you may not be able to automatically check the detailed test results, although you may be able to, occasionally, manually inspect them.) During this phase, you will, of course, be running ordinary unit tests and regression tests.

You can automate this process by creating a "detailed test scenario" whose output is recorded as "temporary gold" at the beginning of a refactoring episode and automatically compared against on each subsequent run during that episode.
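
A minimal sketch of such a harness in Java follows; the class name, file name, and fixed stand-in output are made up for illustration. In practice runDetailedScenario() would run your simulator or unit under test and dump its full, detailed output.

 import java.io.IOException;
 import java.nio.file.Files;
 import java.nio.file.Path;
 import java.nio.file.Paths;

 // Hypothetical harness: record the detailed output once at the start of a
 // refactoring episode, then compare every later run against that recording.
 public class TemporaryGoldTest {

     private static final Path GOLD = Paths.get("temporary-gold.txt");

     // Stand-in for whatever produces your detailed output
     // (full simulator dump, execution trace, cycle counts, ...).
     static String runDetailedScenario() {
         return "cycle=1 fetch\ncycle=2 decode\ncycle=3 execute\n";
     }

     public static void main(String[] args) throws IOException {
         String current = runDetailedScenario();
         if (!Files.exists(GOLD)) {
             // Beginning of the refactoring episode: record the temporary gold output.
             Files.writeString(GOLD, current);
             System.out.println("Recorded temporary gold output.");
         } else {
             // During refactoring: any difference means behaviour changed.
             if (!Files.readString(GOLD).equals(current)) {
                 throw new AssertionError("Detailed output differs from temporary gold");
             }
             System.out.println("Detailed output matches temporary gold.");
         }
         // When the refactoring episode ends, delete temporary-gold.txt and stop
         // running this check until the next episode (step 3 above).
     }
 }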


Certainly, *I* occasionally have problems writing sufficiently confidence-inspiring tests. Perhaps I am just a lousy programmer, or a lousy test writer. Or perhaps it is a characteristic of my problem domain. In any case, I have developed an extension to the test-writing methodology, TemporaryDetailedTestingSupportingRefactoring, described here, which provides increased confidence during MercilessRefactoring.

First, a brief description of my problem domain, as motivation: ProblemsTestingPerformanceSimulators. I suspect that the characteristics of performance simulation that make durable tests hard to write are also present in some other problem domains.


TemporaryDetailedTestingSupportingRefactoring is "temporary" only in the sense that the detailed comparison of results in the test is temporary: the "golden reference" is expected to change over time.

However, the automated test suite setup is durable.


Recording an execution trace as the temporary golden output is a useful technique, but it is vulnerable to changes that permute the order of invocation of routines, or the order of output. Such changes are not exactly refactorings, since they are not 100% equivalent, but they are often necessary to clean up code for further design improvements.

Comparing detailed results, like cycle counts in a performance simulator, may be less vulnerable to invocation order changes.
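
For example, a comparison that aggregates per-event counts, rather than checking the raw trace line by line, still passes when a cleanup merely reorders events but fails when the counts themselves change. A small illustrative sketch in Java, with made-up event names:

 import java.util.List;
 import java.util.Map;
 import java.util.TreeMap;

 public class OrderInsensitiveComparison {

     // Collapse a trace into event -> count; the ordering of the trace is
     // discarded, which is what makes the comparison robust to
     // invocation-order changes.
     static Map<String, Long> summarize(List<String> trace) {
         Map<String, Long> counts = new TreeMap<>();
         for (String event : trace) {
             counts.merge(event, 1L, Long::sum);
         }
         return counts;
     }

     public static void main(String[] args) {
         List<String> before = List.of("fetch", "decode", "execute", "fetch");
         List<String> after  = List.of("decode", "fetch", "fetch", "execute"); // reordered only

         if (!summarize(before).equals(summarize(after))) {
             throw new AssertionError("Aggregate counts differ: behaviour changed");
         }
         System.out.println("Aggregates match despite reordering.");
     }
 }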

Thus we have a hierarchy of code transformations and of tests: the more strictly behaviour-preserving the transformation, the more detailed the test that can still be used to check it.


I HaveThisPattern. Another slight variant of this pattern I use, which is even more transient, is to actually run the two different versions side by side and compare the results precisely. I find it most useful when I'm refactoring a method which uses a database or something like that.

Example:

 void testRefactoring() {
     List before = beforeMethod();
     List after = afterMethod();

     assertEquals(before, after); // being very careful about your implementations of equals()
 }
I run this method against as many variations of input as I can find, and if I'm really paranoid I can even put the test into the method itself and throw an exception when there is a difference (sketched below).

-- MatthewFarwell
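
A sketch of the more paranoid variant mentioned above, folding the comparison into the refactored method itself; lookup(), beforeMethod(), and afterMethod() are hypothetical names, the latter two carried over from the example above:

 List lookup() {
     List before = beforeMethod();   // old implementation, kept around temporarily
     List after = afterMethod();     // refactored implementation
     if (!before.equals(after)) {
         throw new IllegalStateException("refactored method diverged from the original");
     }
     return after;                   // callers get the new implementation's result
 }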

