Xp Vs Standard Definition Of Unit Test

Is the ExtremeProgramming definition of UnitTests compatible with the StandardDefinitionOfUnitTest?

Failure of a UnitTest implicates only one unit. Failure of a DeveloperTest implicates the most recent edit, following the TestDrivenDevelopment rules.

UnitTests are expensive; DeveloperTests are cheap and easy to write, and they lead towards UnitTests.

DeveloperTests fail early, loudly, expressively, and reversibly. They often fail accurately, but if they don't, you don't care. Using Undo to back out from the failure of a HyperactiveTest? is often cheaper than tuning up the test.


Point 0 of StandardDefinitionOfUnitTest says that when you're dealing with object-oriented software, the units you're testing are classes. The StarUnit tools and most XP discussion go along with this.

Until ExtractClassRefactor? produces a new class with only indirect tests...


From the StandardDefinitionOfUnitTest...

  1. UnitTesting (as defined in the software testing literature) requires that we test the unit in isolation. That is, we want to be able to say, to a very high degree of confidence, that any actual results obtained from the execution of test cases are purely the result of the unit under test. The introduction of other units may color our results.

  2. Since we can seldom, if ever, test a unit without the presence of at least some other software, unit testing involves such things as "drivers" and "stubs."

Response: ExtremeProgramming does not require UnitTests to test a class in isolation from all other classes; YouArentGonnaNeedIt. But XP uses stubs to isolate classes being tested whenever the team finds it worthwhile to do so.

Chapter 13 of the ExtremeProgrammingInstalled book, titled "UnitTests," describes the creation and testing of a "Scrutinizer" class that will hold, filter and return collections of TaxReturn objects. The TaxReturn class hadn't been written yet, so all testing was done using "FakeTaxReturn" stub objects.

If you felt the need, you could use StubObjects even in place of classes that are already implemented. To quote from the book: "[...] with more complex collaborating objects than the Scrutinizer, you can get strange interactions in your testing. We often find that using stub objects instead of the real thing will simplify the testing and help you go faster."

So the practice is encouraged but not required. If you need isolation, do it; otherwise remember that YouArentGonnaNeedIt.
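
As an illustration only, here is a minimal sketch of that style of stubbing, assuming JUnit as the harness; the Scrutinizer and FakeTaxReturn names come from the book's discussion, but every signature shown here is an assumption made up for this sketch, not the book's actual code:

 import java.util.ArrayList;
 import java.util.List;
 import junit.framework.TestCase;

 // Assumed interface for the not-yet-written TaxReturn class.
 interface TaxReturn {
     int amountOwed();
 }

 // Stub standing in for the real TaxReturn.
 class FakeTaxReturn implements TaxReturn {
     private final int owed;
     FakeTaxReturn(int owed) { this.owed = owed; }
     public int amountOwed() { return owed; }
 }

 // Minimal Scrutinizer: holds TaxReturns and filters the ones that owe money.
 class Scrutinizer {
     private final List returns = new ArrayList();
     void add(TaxReturn r) { returns.add(r); }
     List returnsOwingMoney() {
         List owing = new ArrayList();
         for (int i = 0; i < returns.size(); i++) {
             TaxReturn r = (TaxReturn) returns.get(i);
             if (r.amountOwed() > 0) owing.add(r);
         }
         return owing;
     }
 }

 // The UnitTest exercises the Scrutinizer against stubs only.
 public class ScrutinizerTest extends TestCase {
     public void testFiltersOutReturnsThatOweNothing() {
         Scrutinizer scrutinizer = new Scrutinizer();
         scrutinizer.add(new FakeTaxReturn(0));
         scrutinizer.add(new FakeTaxReturn(5000));
         assertEquals(1, scrutinizer.returnsOwingMoney().size());
     }
 }

Whether the isolation is worth the extra FakeTaxReturn code is exactly the cost/benefit judgment described above.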

Bedard's definition of UnitTests as tests that rely on nothing else in the system is fatuous, since no such test is possible.


From the StandardDefinitionOfUnitTest...

So, the use of other units in the testing of a unit creates two sets of problems:

  1. The other units may color the results of a test case, and lower our level of confidence in the unit under test.

  2. The other units may interfere with the straightforward execution of one or more test cases.

Use of multiple units (classes) in one test may "color the results" to the extent that tests fail on many classes at once. But you know that the problem was caused by what you did since you last ran the UnitTests successfully. And if you're using ExtremeProgramming practices, you wouldn't have done much.

As a RegressionTest, this use of UnitTesting isn't interesting. Moreover, it's not the UnitTest that determines whether you failed, but the microiterations. Microiterations limit the number of things that changed to a small set that can be manually examined. This is close to (but weaker than) the ScientificMethod. The ScientificMethod works because it is theoretically stringent, a la InformationTheory. Anyway, the point is that UnitTests only tell you that something is wrong in the area they test. However, it's how often you run the tests (or, more importantly, the system) that ensures you don't lose control of the system.

[This is also discussed in UnitTestIsolation.]


Point 3 of StandardDefinitionOfUnitTest deals with black-box vs. white-box testing. Black-box testing, which is based on the specification for the unit, is present in XP if you CodeUnitTestFirst. White-box testing, which tests the code, happens if you add more tests after writing the class/method/function, or if you add a test to cover a bug that slipped through.
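
A hedged sketch of that distinction, with an invented WordCounter class and JUnit assumed; the first test could be written before the code exists, the second only makes sense once you have read (or debugged) the implementation:

 import junit.framework.TestCase;

 // Invented example class, defined here only so the sketch is self-contained.
 class WordCounter {
     static int count(String text) {
         int words = 0;
         boolean inWord = false;
         for (int i = 0; i < text.length(); i++) {
             boolean space = Character.isWhitespace(text.charAt(i));
             if (!space && !inWord) words++;
             inWord = !space;
         }
         return words;
     }
 }

 public class WordCounterTest extends TestCase {
     // Black-box: written first, from the specification alone (CodeUnitTestFirst).
     public void testCountsWordsSeparatedBySpaces() {
         assertEquals(3, WordCounter.count("one two three"));
     }

     // White-box: added afterwards to cover a case the implementation suggested
     // (consecutive spaces), e.g. a bug that slipped through.
     public void testIgnoresConsecutiveSpaces() {
         assertEquals(2, WordCounter.count("one  two"));
     }
 }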


First question, does XP use UnitTests like I've defined them or not?

Unit tests as you defined them are a subset of the XP definition, and many people do use them the way you defined them: using a MockObject as a class stub, for example, and doing white box testing. There was a huge discussion on the XP mailing list a while ago about whether you should test private members, and the consensus was that XP doesn't say either way and that people have their own personal preferences. Why is the fact that the XP definition of UnitTests is less strict than yours such an issue? -- DaveKirby

It's an issue because if I claim I don't use UnitTests, I don't mean that I don't use XP-style UnitTests. Basically, I'm totally confused as to what XP means by UnitTesting, hence the prompting above for more information. But consider this: despite the dozens or more times I've UnitTested? to great disaster (often to the point where all testing got thrown out as a result), XP people are the first people I've ever seen who have become "TestInfected." So, I can turn the question around: what makes you guys so keen on what everyone (for some definition of everyone) else hates with a passion? -- SunirShah

[...]

Actually, I rarely care what the testing methods I use are called. I just do them. More carefully stated, then: I do UnitTest, but not as my mainstay. There are other ways that I've found useful. -- ss


Now that we understand what each other does, I want to attempt to define XP unit and functional tests, according to my poor understanding of XP. The standard disclaimer applies: I don't even speak for myself, much less XP.

The distinction between an acceptance (nee "functional") test and a unit test is not in what they test or how they're done, but in whom they satisfy. The acceptance test makes the customer satisfied that the software provides the business value that makes them willing to pay for it. The unit test makes the programmer satisfied that the software does what the programmer thinks it does.

That's the essence of it. There are a whole lot of traditions for these tests, which I think people get mixed up with the actual definitions. These traditions are a result of the forces acting upon you when you do testing, but are not essential to the definition of the tests.

I think many people, even within XP, confuse these traditions with the actual definition of the tests. So one might say, for example, "A unit test is testing an individual class in isolation," or "An acceptance test tests the entire program." That would be wrong. They are not the definitions, just traditions resulting from the forces that act upon you when you are doing testing. There is absolutely nothing wrong with the programmer writing a so-called unit test that tests the whole program, or with the customer defining a functional test that stubs out part of the system. It's up to the people doing it to weigh the costs and benefits, not to follow some arbitrary definition.

Well, that's how I see it. I hope that someone who actually knows something about XP can correct whatever gross misunderstandings of it that I have. -- WayneConrad

Wayne, I'm not volunteering myself as an XP expert, but I'm pretty familiar with it now. You are, IMHO, pretty much on the money. Especially with the "feedback, feedback, feedback" and the paragraph that precedes it. You sound pretty "extreme" there. That's not so surprising. Every good developer I've spoken to about it has found a lot of what they already do in XP (and a lot of bad ones think there's a lot of what they do in XP, which can cause...confusion).

The thing with XP unit tests is that you write them first. This makes them a design tool, not a testing one. I think it was Ward who first made that distinction explicit, somewhere. This difference in emphasis seems to be where certain testing experts elsewhere get lost when trying to critique XP. I suspect it's also partly behind the mess at the top of the page. (I don't see how one can discuss the things XP calls "unit tests" apart from the practices surrounding them.)

The acceptance tests are tests; the unit tests turn into a regression testing suite, but aren't written as tests. They are written as executable design documents. They are your throw; the code written afterwards is the watch.

I hope that's helpful. -- KeithBraithwaite

Great stuff, Wayne and Keith. I also don't know if it is XP, but this is very much my view. The point that tests are a design driver is on the mark. No wonder "testers" don't understand why XPers like tests, and why we don't focus on coverage and other normal testing attributes. The intent makes the difference. It goes a long way towards explaining why we are happy to create the tests that Sunir sees as such a chore. -- LeoScott

I think the best description of the test-first approach is to apply the standard debugging process to new design. When debugging, you create some sort of test that recreates the problem, then you go in and modify code until your test passes. Anyone who has done a fair amount of debugging will feel very comfortable with this approach.
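
A minimal sketch of that cycle in Java (names invented, JUnit assumed): first write the test that "recreates the problem", i.e. the missing behavior, watch it fail, then modify the code until it passes.

 import junit.framework.TestCase;

 // Step 1: the test is written first and fails, because Account doesn't
 // yet do what we need (or doesn't exist at all).
 public class AccountTest extends TestCase {
     public void testDepositIncreasesBalance() {
         Account account = new Account();
         account.deposit(100);
         assertEquals(100, account.getBalance());
     }
 }

 // Step 2: write or modify the code until the test passes, then stop.
 class Account {
     private int balance = 0;
     void deposit(int amount) { balance += amount; }
     int getBalance() { return balance; }
 }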

Unfortunately, I feel the preceding discussion has only added to my confusion over the difference between UnitTests and AcceptanceTests. If UnitTest does not imply testing at the class level, then the term is only adding to the confusion. I have adopted the test-first approach, but I will usually write a test description for individual executable modules (EXEs, DLLs, ASPs, etc.). Often these are little more than a set of steps to perform from the application GUI and a description of the expected result (i.e., purely manual). Does this fit with the XP definition of UnitTest? If so, I would suggest changing the name to something like module test (I would really like FunctionalTest, but that name is sort of taken in the XP world). -- WayneMack

See RenamingUnitTests.


The above reminds me of something. On the rare occasion I actually get to program in Smalltalk, I often write a list of small tests when I code things because it's simply more efficient to do that than step through the code in the idiotic debugger. For those who don't know, all methods are public in Smalltalk, and non-declarative Smalltalk is an online system so it's easy to whack stuff in and then Do It. In languages as anal as Java and C++, I don't bother because it's too slow (I get bored and frustrated). I occasionally do this in Perl, but my development practices with Perl are bizarre given what I do with it (my development environment is several time zones away from the runtime environment). Anyway, the point is that another force on whether to choose a practice is how expensive it is given your development environment's support. This is important.

(But that's not to say that one should write an RUP or XP or whatever development environment. That would be a disaster. Tools should enable, not lock the users into the developers' world view--just look at how annoying VisualAge's CodeOwnership repository system is to people who disagree with ObjectTechnologyInternational's "every line of code must be owned by somebody" mantra.) -- SunirShah


Summarizing...

  1. XP makes certain assertions about the effect of doing UnitTests the way they teach.

  2. Using a different (more standard) definition of UnitTests, the XP assertions are not entirely borne out.

[This summary shows why the StandardDefinitionOfUnitTest does not disprove the ExtremeProgramming claims: The XP definition of UnitTests is different from the StandardDefinitionOfUnitTest. So, if you change this core practice of XP, then you're not doing XP, and your results don't necessarily describe XP.]

It's pretty damned confusing to use terms that are understood to be something different elsewhere. XP could use some refactoring itself. Use MeaningfulNames, dammit.


The point wasn't that XP is bad, but that XP comes with mantras that are difficult to unravel. I'm not attacking XP, but UnitTests (which I'm not really attacking either; UnitTests are good, just not always). XP isn't important, but your experiences while practicing XP might be...

I don't see why it's impossible to discuss UnitTests without saying, "They're good because XP said so." [...] I will listen to stuff you have to say, just as long as it's clear what you mean by it.

First question, does XP use UnitTests like I've defined them or not? -- SunirShah

No, it doesn't.

(Which does not, by itself, mean that the XP definition of UnitTests or the StandardDefinitionOfUnitTest is superior or would produce better results in practice.) -- JeffGrigg

No, it is to say that the standard use of the term is worse. That's the position I'm trying to defend, which is why I asked for an XpFreeZone. Not that XP was bad, but because it wasn't relevant. -- SunirShah


XP uses UnitTests with the intent of reaping the benefits you refer to and attempt to refute above. Since in fact many XPers are in their own estimation reaping the benefits, your refutation is either erroneous or based on what I might call technicalities, like your assertion that it isn't the UnitTests that keep you focused but "you" that keep you focused. Of course it is "you" that keeps you focused but we find it easier not to go down the "improving this object to oblivion" rat hole when we focus on just testing whether it does what we need. When the test that says it does what we need runs, that's a solid signal that maybe we should stop. Perhaps some folks are so connected that they don't need cues. Me, I'll take all the cues and clues I can get.

In sum, and sincerely with all due respect, I think your remarks remain off the mark. There is something happening with the way XP does the thing it calls UnitTests. We may not express it perfectly yet, and we may not use the term the way the terminally retentive would, but there is something going on. I get that you haven't had that thing go on for you, but IMO that's because you have resisted letting it happen. I sincerely believe there is some kind of ShuHaRi thing going on in the XP practices. You have to start at Shu, unless there is some other path to enlightenment.

I think this would be best served if I could observe how you tested. Perhaps I've been steeped too much in academic software engineering to see what you really mean. I guess that's why I'm anal about the definition. -- SunirShah


Hopefully, every statement in the code is essential to some function or attribute of the unit, or else that statement shouldn't be there. One approach is to say "I know what this thing is supposed to do, now let's see what might go wrong" and write tests based on that. That's the functional approach, exploiting knowledge of the interface. Another approach is to look at the code and muse "I wonder if this particular statement has a problem", then try to figure out how to exercise the unit so that statement will execute. Either is valid unit testing. Both involve bad behavior and lines of code. The main thing about unit testing, to separate it from other kinds of testing, is that it seeks to work on very small pieces. When you try to define "how small", you get into a bit of a mess, but the idea is to avoid the complexity of integrated stuff and work with what's small and simple. Could be just a method, not a whole class, too. -- WaldenMathews


Moved from UnitTest

What strikes me as odd in all these pages in Xp about UnitTests is that they always stress "AS MUCH TEST AS POSSIBLE" and say that you should test every line of code. To me this seems to skip two much more important tests:

I'll try to illustrate with a simple example. If my example is not sufficient, you're welcome to replace it with another simple example (I've got the flu at the moment, so my thinking capabilities aren't at their max).

Given the following definition of a method (in a Java class which I don't feel like writing out at the moment):

 /**
  * This function returns the remainder of (num, div).
  * @throws IllegalDividendException if div < 0
  * Returns -1 if num < 0, otherwise returns the remainder.
  */
 public static int remainder (int num, int div)
 {
   if (div < 0)
   {
     throw new IllegalDividendException();
   }
   if (num == 0)
   {
     return 0;
   }
   if (num < 0)
   {
     return -1;
   }
   return num % div;
 }
The tests for BlackBoxTesting would be:

 check when div < 0
 check when num < 0
 check normal status

Where "check" means calling the method with such inputs and verifying that it returns (or throws) what the comment promises. The WhiteBoxTesting would test the same (in this case), plus it would also:

 check when num == 0

-- ChristophePoucet?
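
Assuming the method above lives in a class called Remainder, that IllegalDividendException is an unchecked exception defined elsewhere, and that JUnit is the harness, those checks might be sketched as:

 import junit.framework.TestCase;

 // Remainder and IllegalDividendException are assumed from the example above.
 public class RemainderTest extends TestCase {
     // check when div < 0: the documented exception is thrown
     public void testNegativeDivisorThrows() {
         try {
             Remainder.remainder(7, -3);
             fail("expected IllegalDividendException");
         } catch (IllegalDividendException expected) {
             // documented behavior
         }
     }

     // check when num < 0: the documented -1 is returned
     public void testNegativeNumeratorReturnsMinusOne() {
         assertEquals(-1, Remainder.remainder(-7, 3));
     }

     // check normal status
     public void testNormalCase() {
         assertEquals(1, Remainder.remainder(7, 3));
     }

     // the extra WhiteBoxTesting check: num == 0
     public void testZeroNumeratorReturnsZero() {
         assertEquals(0, Remainder.remainder(0, 3));
     }
 }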

I think the black box/white box distinction is solving the wrong problem. I don't create UnitTests because automation is a cheaper way to complete BlackBoxTesting or WhiteBoxTesting; I create UnitTests to produce working software.

That said, a truly BlackBoxTesting UnitTest would be a rare beast. Too expensive to create, too expensive to run. Any UnitTest is testing the behavior of the unit in a particular context, but only by knowing the implementation can you eliminate parts of the context as irrelevant, replace other parts of the context with MockObjects, and so on.

I wouldn't create a UnitTest unless I expected it to generate profit (value in excess of development and running costs) - the value is a totally subjective thing, and does include "confidence that I do know what is going on". If I were delivering the behavior described in the example above, I probably wouldn't bother with a num == 0 check, because it doesn't affect my confidence. But if I came across this code during a RefactoringExpedition?, I would add the num == 0 check before trying to remove that branch, because I know that I'm tinkering with that part of the behavior, which I want to preserve. -- DanilSuits

I think the above is missing the point of regression tests for refactoring. When refactoring, I want to assume that every significant condition is already contained within the tests, I don't want to have to search through the tests to find missing conditions I might break. I should be able to do most, if not all, refactoring without looking at the test code. -- WayneMack

Hmm. There's obviously an assumption you are making that I am not, or vice versa. Care to look for it? It may simply be this: when I'm about to refactor, and am worried about num == 0, I add the test. This may result in more than one test that checks that behavior, which is fine - test duplication can be cleaned up when the drag coefficient gets out of control. Or you can remove the new test when the refactoring is done - reasonable if the implementation no longer treats that circumstance as a special case.

Alternatively, if you want to assume that everything significant is covered, then do that. If you manage to pass all the tests and break something, you'll add a new test when you fix it.

Adding a bunch of UnitTests because we suspect that at some time in the future somebody might want to refactor this section, and that they might make a mistake that no other test catches, strikes me as a bad parlay.

Remember that the main purpose of ProgrammerTests (nee UnitTests) is TestFirstDesign. The tests are written prior to the code and define the code to be written. "num == 0" indicates there is a branch, which means at least two tests have been written: one that covers the path taken when the condition holds and one that covers the path taken when it does not; combined, the two tests verify the "num == 0" behavior. The tests were not written because the code may be refactored; the tests were written to define the code to be developed. Additionally, if one practices DoTheSimplestThingThatCouldPossiblyWork, refactoring will be required. The cost-benefit analysis must consider all three of these elements together and recognize that they permit one to avoid the costs of BigDesignUpfront?. When considered as a whole, a strong economic case for these practices can be made. -- WayneMack


See Also: IfXpIsntWorkingYoureNotDoingXp, TestEverythingThatCouldPossiblyBreak, RenamingUnitTests


CategoryExtremeProgramming CategoryComparisons CategoryTesting

