Almost Correct

I am a little concerned, actually, about the success of XP / Unit test disciplines, because there are some kinds of behavior which deserve to count as spectacular failures which are hard to test for this way, especially with systems that are large, distributed and stateful. One example would be a data management service that manipulates customer data (which is pretty complex and stateful) on behalf of a number of systems, or even within one large system. "Unit tests" on this service's exposed API, and / or internal methods need quite a lot of test harness & infrastructure to make them go. "Functional Tests" vs. this service implementing some kind of story or use case require even greater support.

Testing for correct operations of such a service requires at least these considerations for both methods and stories in addition to functional correctness:

Concurrency and performance.
"Edge cases" in the data domain.
Failure modes (internal to the service.)
Error handling (of errors in services or functions that the service depends on.)
Reduced functionality (as external services or functions are unavailable.)
Recoverability.

I'd like to see tests for such stuff automated, and required to pass as a matter of course much like XP style "unit tests." I'd like to see many functional (sometimes "acceptance") tests also automated, and required to pass. Yet the setup to test for any of these issues can be pretty involved, and spans technical disciplines well beyond the scope of things like J-Unit and build tools. So tests for these kinds of things often don't get done, even in otherwise disciplined XP shops & teams I have seen. I am commenting from direct observation of the practices of self-identified XP teams building real, production software in several large commercial IT environments. So this is neither theory nor speculation.

I am wondering if in systems that are some combination of large, distributed and stateful, the correctness that is testable with unit tests + functional tests doesn't count as it's own kind of AlmostCorrect. This question relates to AlistairCockburn's idea of different degrees of refinement of use cases. I am wondering if the testomg methods that will be sufficient (XP defined unit tests and functional tests are an example) won't vary by the degree of refinement in the requirements the system must meet.

Designing to escalate AlmostCorrect.

Here is a thought experiment about design principles that might provide some mitigation of AlmostCorrect:

Design so your current feature (class, method, etc.) fails spectacularly when the services it depends on are AlmostCorrect.
Design so your current feature (class, method, etc.) creates spectacularly incorrect results (fails completely is one example) when it fails.
Taken together these might mean: Make your system vulnerable to the AlmostCorrect.

The point is to make correct / incorrect binary by making it harder for AlmostCorrect to slide by. The old maxim is: "If architects built buildings the way programmers build software, the first woodpecker that came along would destroy civilization." I wonder about the maxim: "Build your software so the first woodpecker modified beam you use will take the whole thing down immediately." So, this design principle would escalate the AlmostCorrect to producing unacceptable problems, thus gets the AlmostCorrect fixed.

I'm not advocating making systems fragile. This thought experiment has me intrigued because it seems as if something organizationally useful happens if you do. "What if we made systems vulnerable to some kinds of AlmostCorrect behavior . . . "

Working with Weak Policy

The XP requirement for unit tests and 100% test success seems to be a policy to me. Any test failure, or failure to execute existing tests is declared unacceptable by the GoalDonor, or the GoldOwner or both. I have found the sticking point in implementing the practice of XP style unit tests (many testing practices, actually) is often the unwillingness to establish the policy, not the implementation details. So selecting automated, automatic unit tests plus XP style functional tests seems like three things to me:

A policy, that we're serious about the tests we've decided on.
An engineering assertion that tests in these categories are sufficient.
An engineering practice assertion that implementing these tests this way is the most effective, efficient, and productive.

The distinction between a policy problem and the two implementation problems can be seen by applying the "trained cockroach test." Ask the question: "If I could get 100% test coverage (unit and functional as defined by XP practice) with every build automatically, for free, using trained cockroaches could I still use the results?" Well, yeah, if the policy is in place. And no, if the policy isn't. That trained cockroaches are a poor implementation for most things, in this case either specifying test sets, or implementing & executing these tests is a different matter. It's useful to flush out that distinction.

So what do you do if that policy isn't in place from the GoalDonor or the GoldOwner? Designing to escalate AlmostCorrect would be another way to get a more robust form of correctness. Rather than escalate any test failure to unacceptable by policy, it escalates observable instances of AlmostCorrect to failures that are already unacceptable. This sounds really cynical. But isn't one aspect of "design for testability" to design so failures are visible, obvious, and unambiguous?

There's a variation on this question where the policy requiring all tests to pass all the time is owned by the folks producing the artifact vs. the GoalDonor, or the GoldOwner. Then the management of the production process estabilshes this policy or not. Another variation has the technical team assert this policy, independent of sponsorship of the people responsible for the performance of the development function, or the sponsors of the system.

It seems to me that if all testing all the time works in terms of the payoff that accrues to either development as a function, or the customers of development, the case ought to be pretty easy to make. What's going on that the case isn't so easy to make?

Clearly, "Several kinds of AlmostCorrect" and "Designing to escalate AlmostCorrect" don't work together. That's what got me typing. They both make sense to me in some circumstances. They seem to pivot on how AlmostCorrect gets treated as acceptable. But I can't reconcile them.

- JamesBullock