[Also known as... DefectSeeding.]
We all know that we should write tests to automate the checking of the code we write, and some of us even do it. But how do we know the tests are correct? How do we know they would actually catch bugs? How do we get a feel for the percentage of bugs still in the code? And how do we quiet the nagging doubt that having the same person write both the test and the module it tests is not exactly an independent check?
A possible approach is to appoint a ProjectSaboteur. Their job is to check out a parallel copy of the source tree and then introduce bugs on purpose. They deliberately hack the module interfaces: ask for 10 values, for instance, and a hacked module will return 11, or none, or three million. In theory, every bug they introduce should be caught by an existing test. If not, you've found a hole. Categorize the holes, and you may find systematic weaknesses in the tests. Count the holes as a fraction of the bugs introduced, and you have a very rough idea of the percentage of bugs your tests are missing (and hence, roughly, of the percentage they are finding).
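To make the bookkeeping concrete, here is a minimal sketch of the loop a ProjectSaboteur might run, assuming a pytest-style test suite, a git working copy, and hand-written patch files in a hypothetical sabotage/ directory (all file names are illustrative, not part of any existing tool):

  # Apply each hand-written sabotage patch in turn, run the test suite,
  # and report how many seeded bugs the tests actually caught.
  import subprocess

  SEEDED_DEFECTS = [                       # hypothetical patch files, one bug each
      "sabotage/returns_11_items.patch",
      "sabotage/off_by_one_loop.patch",
      "sabotage/swapped_arguments.patch",
  ]

  def tests_fail() -> bool:
      """A non-zero exit code means at least one test noticed the bug."""
      return subprocess.run(["pytest", "-q"]).returncode != 0

  caught = 0
  for patch in SEEDED_DEFECTS:
      subprocess.run(["git", "apply", patch], check=True)        # seed the bug
      if tests_fail():
          caught += 1
      else:
          print("HOLE: no test noticed", patch)                  # a gap in the suite
      subprocess.run(["git", "apply", "-R", patch], check=True)  # undo the sabotage

  total = len(SEEDED_DEFECTS)
  print("caught %d of %d seeded bugs; miss rate %.0f%%"
        % (caught, total, 100.0 * (total - caught) / total))

The script only automates the apply/test/revert drudgery and the final ratio; choosing which bugs to seed, and judging what the holes mean, is still the saboteur's job.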
As a side effect, being a ProjectSaboteur is a good way for a newcomer to gain insight into the structure of a system without doing permanent damage. In a PairProgramming environment, ProjectSaboteur might be a good role for the odd person out.
Playing ProjectSaboteur consciously will also give you a very good feel for where the code has problems, namely where it seems easy to insert a bug that will slip through. In my experience, LongMethods? are the most obvious candidates, which might be related to the fact that experienced programmers seem to PreferShortMethods? more than novices do. YMMV, of course. Looking for a good place to seed a bug is very similar to looking for CodeSmells.
Presumably one does not commit the bugs, but instead runs the tests on a modified working copy? Or at the very least, commits to a branch.
I have seen this proposed before, but I find it highly questionable for the following reasons.
- What action is this meant to drive? This appears to be an exercise in collecting numbers for the sake of collecting numbers. What is to be done if the number is not to one's liking? Note that this last question is non-trivial.
- How does one know the seeded errors reflect the actual types and distribution of errors in the code (this is the concept of statistical populations)? As an extreme example, if one only seeds typographic errors, one can expect 100% detection by the compiler, and this will provide no information on the latent fault level in the system. A slightly less obvious example: if one seeds an error type that is not normally committed by the programmers, the error detection rate will not reflect the actual error rate.
- Related to the above, it is highly unlikely for the seeded errors to reflect unexpected error conditions. The seeder will likely insert errors based on his expectation of errors; errors he was unlikely to have committed in the first place because he was aware of them. Similarly, the developers would come to expect certain types of errors and would cease to commit them. If one believes that most errors made are of the unexpected variety, it is unlikely the error seeding would uncover any of them.
- Does the error rate within the program reflect the error rate during execution? Not all errors are created equal. A single error in a common operation will affect a user much more than dozens of errors in extraneous operations. Assuming a problematic program, error seeding does nothing to focus the developers on critical areas; rather it would tend to have them address all areas of the code base equally.
- Are don't-care conditions adequately reflected? There are many cases where one simply does not care whether an error condition is detected or not. Once one has passed data validation and guard checks, it is simply pointless to repeat those checks at lower levels. From the example above, what information is provided if an internal routine that is supposed to return 10 items returns 11 instead? Why would one want to spend run-time resources validating this operation if the condition is inherently prevented from occurring?
For the above reasons, I would consider defect seeding a waste of time and effort that provides little or no tangible benefit. Without a purpose, and without an understanding of actual error types and densities, there is simply no justification for doing this type of "testing."
--WayneMack
This is like saying "without any plan for how to test, testing is a waste of time and provides little benefit" (I exaggerate). Finding incomplete tests is a good thing, as is testing in general. One can overdo it, yes, but you can overdo testing too: just test every possible combination you can think of.
See ProjectSabotagePatterns