Refactoring Test Code

Two key aspects of extreme programming (XP) are unit testing and merciless refactoring. Given that the ideal ratio of test code to production code approaches 1:1, it is not surprising that unit tests get refactored as well. We found that refactoring test code differs from refactoring production code in two ways: (1) there is a distinct set of bad smells involved, and (2) improving test code involves additional, test-specific refactorings.
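As a concrete, purely illustrative example of one such test smell and a test refactoring that removes it: duplicated fixture code in every test method can be pulled up into a setup method. The Order class below is a hypothetical stand-in for whatever is under test, written in Ruby's Test::Unit style.

 require 'test/unit'

 # Hypothetical code under test, just enough to make the example run.
 class Order
   def initialize; @prices = []; end
   def add(price); @prices << price; end
   def count; @prices.size; end
   def total; @prices.inject(0) { |sum, p| sum + p }; end
 end

 # Smell: every test method builds the same fixture by hand.
 class OrderTestBefore < Test::Unit::TestCase
   def test_total
     order = Order.new
     order.add(100)
     order.add(250)
     assert_equal(350, order.total)
   end

   def test_count
     order = Order.new
     order.add(100)
     order.add(250)
     assert_equal(2, order.count)
   end
 end

 # Refactored: the duplicated fixture has moved into setup.
 class OrderTestAfter < Test::Unit::TestCase
   def setup
     @order = Order.new
     @order.add(100)
     @order.add(250)
   end

   def test_total
     assert_equal(350, @order.total)
   end

   def test_count
     assert_equal(2, @order.count)
   end
 end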

In a paper written by ArieVanDeursen, LeonMoonen (CWI), AlexVanDenBergh and Gerard Kok (SoftwareImprovementGroup), presented at XpTwoThousandAndOne, we share our experiences with refactoring test code by discussing a set of bad smells that indicate trouble in test code, and by presenting a collection of test refactorings to remove these smells.

The paper is available online at http://www.cwi.nl/~leon/papers/xp2001/xp2001.pdf

We believe that the paper presents only a starting point for a larger catalog of test smells and test refactorings that should be based on a broader set of projects. Therefore, we invite readers interested in further discussion to add smells and refactorings to this page.

-- LeonMoonen

There is just such a collection now available online at http://xunitpatterns.com/TestSmells.html

-- GerardMeszaros


Avoiding False Positives

The benefits of RefactoringTestCode seem self-evident to me, but the risks also appear to be higher than in refactoring non-test code. Normal code only has one kind of correctness -- it has to pass the tests you've written -- but test code has two. It has to warn you when the code is broken, and it has to be quiet when the code works. That is, correct test code gives neither false negatives nor false positives.
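To illustrate (my own example, using a hypothetical Apple class in the same Test::Unit style as the snippet further down this page): a careless refactoring can quietly turn a real check into a test that can never fail, which is exactly the false positive case.

 require 'test/unit'

 # Hypothetical class under test.
 class Apple
   def taste; 'tasty'; end
 end

 class AppleTest < Test::Unit::TestCase
   # Before refactoring: a real check. It fails if Apple#taste breaks.
   def test_taste
     apple = Apple.new
     assert_equal('tasty', apple.taste)
   end

   # After a careless "refactoring": expected and actual now come from
   # the same place, so this assertion holds no matter what Apple does.
   # It can never warn us that the code is broken: a false positive.
   def test_taste_after_bad_refactoring
     apple = Apple.new
     expected = apple.taste
     assert_equal(expected, apple.taste)
   end
 end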

To take care of this while you're initially writing the test code, you write the test first and watch it fail before writing the code that makes it pass.

That works fine when you're initially writing the test. But if, months later, you're refactoring the test, you can only ensure that it still passes when run against your correct code. That is, you can ensure it doesn't give a false negative, but how do you check that it doesn't give a false positive?

The only way I can think of to get around this is to disable the underlying code to ensure proper test failure, but this seems slow and cumbersome. Is there some other tactic?

MutationTesting springs to mind. But I'm not sure it would be a complete solution. -- JeffGrigg

I've also used MutationTesting for exactly this sort of thing. -- PaulTevis
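For the record, the crude hand-rolled version of that idea looks something like this (a sketch only, with a hypothetical Apple class; real MutationTesting tools generate and run many such mutants automatically):

 # Hypothetical code under test and a hand-written check for it.
 class Apple
   def taste
     'tasty'
   end
 end

 def taste_test_passes?
   Apple.new.taste == 'tasty'
 end

 # Sanity check: the test passes against the correct code.
 raise 'test fails on good code' unless taste_test_passes?

 # Mutate the code under test on purpose...
 class Apple
   def taste
     'rotten'
   end
 end

 # ...and demand that the test notices. If it still passes here, it
 # would also stay green on broken production code, i.e. it can give
 # false positives.
 if taste_test_passes?
   puts 'WARNING: the test did not catch the mutation'
 else
   puts 'OK: the test catches the mutation'
 end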

Following advice somebody gave me on MockDatabase, I just took one big nasty MockDatabase and broke it up into a lot of tiny MockDatabases, one for each class. Things feel better already. But there's a little feeling of looseness here, because I can't refactor this code as far as I'd like. Then it occurs to me that maybe I should've written UnitTests for the MockDatabase itself. -- fh

I just had a discussion the other day with JeromeFillon where he mentioned that he does indeed UnitTest his MockObjects. Jeff, I think that random MutationTesting would definitely be incomplete, but maybe you could come up with a range of changes to your tested code that would satisfactorily "test the tests". However, I have always been under the impression that UnitTests are the "safety net" that allows me to ReFactor with courage. -- IainLowe
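For what it's worth, UnitTests for a mock can be as plain as tests for anything else. A minimal sketch, assuming a hypothetical hand-rolled MockDatabase that records the queries made against it:

 require 'test/unit'

 # Hypothetical hand-rolled mock: stores rows in memory and records
 # every query so a test can verify how it was used.
 class MockDatabase
   attr_reader :queries

   def initialize
     @rows = {}
     @queries = []
   end

   def insert(key, value)
     @rows[key] = value
   end

   def find(key)
     @queries << key
     @rows[key]
   end
 end

 # The mock is code too, so it gets its own tests.
 class MockDatabaseTest < Test::Unit::TestCase
   def setup
     @db = MockDatabase.new
   end

   def test_returns_what_was_inserted
     @db.insert('apple', 'tasty')
     assert_equal('tasty', @db.find('apple'))
   end

   def test_records_the_queries_made_against_it
     @db.find('apple')
     assert_equal(['apple'], @db.queries)
   end
 end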


moved from BeCarefulWhenChangingTests

The interface described in the UnitTests is a detailed description of what someone expected of their objects.

It always feels good to delete large chunks of code. However, it's probably worth putting extra effort into preserving all the tests for a given bit of functionality until you're sure that functionality isn't used anywhere.


I'm trying to figure out a way to automate the process of highlighting callers, but I can't wrap my head around the problem. It's probably a feature supported in any RefactoringBrowser, but I want to combine it with tests.

For example, if I have the following test:

 apple = Apple.new
 assert_equal('tasty', apple.taste)

And if I want to change apple.taste to be 'rotten', I need to be sure that no other code depends on the 'tasty' value. A simple heuristic would be to grep for "taste" throughout the entire codebase. (This will return FalsePositives, but that's OK.)

We can and have done this sort of thing manually for years. What are some ways to automate it, though?

One would be to add some sort of comment, like this:

 assert_equal('tasty', apple.taste) # grepFor "taste"

Then, one could have a script that parses the output of "cvs diff" for changes that affect lines with a "grepFor" comment.

If I get around to spiking this I'll report back.
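In the meantime, here is a rough sketch of what that spike might look like (entirely hypothetical and untested; it assumes the grepFor comment convention above and a working copy checked out from CVS):

 # Find "grepFor" markers on lines touched by the current diff, then
 # grep the codebase for the quoted terms so the code that might
 # care about the change gets highlighted.
 diff = `cvs diff -u 2>/dev/null`

 terms = []
 diff.each_line do |line|
   # changed lines in a unified diff start with + or - (but not +++/---)
   next unless line =~ /\A[+-][^+-]/
   terms << $1 if line =~ /grepFor\s+"([^"]+)"/
 end

 terms.uniq.each do |term|
   puts "== code that may depend on #{term.inspect} =="
   system('grep', '-rn', term, '.')
 end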


If you can identify in advance that a line of code is likely to be broken in the future, there are probably better ways to isolate the risk. Have you grouped stuff by ChangeVelocity?


Merged from ExtremeProgrammingUnitTestCode

Should UnitTest source code be treated with the same importance as production code, or should it be treated as "second-class"? If not properly maintained, unit test code will degrade just as any other code does. In that case, developers will pay the price in the long run and be forced to spend time rewriting the test code.

Related Egroup threads:

http://groups.yahoo.com/group/extremeprogramming/message/15913
http://groups.yahoo.com/group/extremeprogramming/message/16463
http://groups.yahoo.com/group/extremeprogramming/message/18375


Convert Test Code Into Features

I have worked on a product that accepted up to 2000 dial-in ISDN connections and converted them to Ethernet traffic. We didn't have 2000 ISDN routers to test the device with, but we did have a great big 'load tester'. This was a bundle of our own poorly maintained spaghetti test code. It ran on very similar hardware. Crucially, it had dial-out functionality and could dial out over 2000 ISDN connections. The test code was very ropey. Any time there was a problem of any kind, everyone blamed the 'load tester'. No real progress was being made on eliminating the bugs. I was handed the project to 'maintain'.

The bug list (once I'd amalgamated and rationalised it) was just shy of a thousand items long; about a quarter of the bugs looked hard. I decided to tackle the test software first, as top priority.

This work was by far the largest sustained refactoring of any code I have done in my life. It took around six months to complete (the initial estimate was three months). The outcome: we refactored out nearly all of the test code, so that it turned into integrated features of the code under test. Unusually, senior management were right behind this decision and stuck with it even with the time overrun.

This may not be an option in other kinds of project. For this project it was a huge win. The product is stable; it was not stable before. Testing is now much more rapid and thorough than it was before. The hardest bugs (which only showed up under heavy load) have been cracked. The project has been handed back for onward development. There is even one customer who seems to want the dial-out functionality as a real feature, not just as a test feature.

So from this ISDN war story I'd like to suggest that, at least in some cases, refactoring of test code can make it disappear altogether. One way to keep the volume of purely test code under control: turn it into features of the code under test. -- JamesCrook


See ResilienceVsAnticipation, ExtremeProgrammingImplementationIssues


CategoryRefactoring, CategoryTesting

