Incoherent Reward Structures

Some organizations reward individuals for side-effects of failure, not for success.

Ron Jeffries wrote on the XpMailingList:

Do metrics usually have numbers that indicate trouble? SLOC? Defect Count? Or is it always the trends one watches? I guess that sometimes there could be trouble bars.

With RTS, more is better. With SLOC, more is /assumed incorrectly/ to be better. With Defect Count, I would guess offhand that more is worse.

It depends on how you do work.

In a previous job, I was doing TDD on a solo project that was being run in a waterfall manner. The tester assigned to test my work got seriously annoyed with me: because I had produced a high-quality, thoroughly tested product, he wasn't finding defects (I think he ended up finding 3 all up, two of which were actually requirement defects).

This made his metrics look bad. You see, his manager couldn't distinguish between a developer doing his job right (not introducing defects in the first place) and a tester doing his job badly (not finding defects that were there). My corresponding performance metric (Number of Defects outstanding) wasn't affected, because it was expressed as a ratio to the number reported.

I ended up telling him that was his problem, not mine. :)

-- RobertWatkins

Watkins could, instead, have conditionally compiled a "BUGGY" version of the code, containing a SpikedSample of 4 or 5 cleverly hidden bugs. If the tester reports only 3 of them, that metric needs work!

This is DefectSeeding, a technique used to assess and evaluate proof-readers and copy-editors. Seed the text with a known number of defects in each class of defect (letter transpositions, doubled words, missing letters, missing words, etc.) and see how many are caught. The percentage of seeded defects caught, together with the number of real defects caught, gives an estimate of how many real defects remain. Remember to remove the ones you planted ...
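A minimal sketch of that arithmetic, assuming seeded and real defects are equally likely to be caught (the function name and the numbers are made up for illustration):

 #include <stdio.h>

 /* Estimate remaining real defects from a defect-seeding exercise.
  * Assumes seeded and real defects are equally likely to be found. */
 static double estimate_remaining(int seeded, int seeded_found, int real_found)
 {
     if (seeded_found == 0)
         return -1.0;                                         /* no basis for an estimate */
     double catch_rate = (double)seeded_found / seeded;       /* e.g. 3 of 5 = 0.6 */
     double estimated_total_real = real_found / catch_rate;   /* scale up the real count */
     return estimated_total_real - real_found;                /* estimated defects still lurking */
 }

 int main(void)
 {
     /* 5 bugs planted; the tester found 3 of them plus 12 real defects. */
     printf("Estimated real defects remaining: %.1f\n",
            estimate_remaining(5, 3, 12));
     return 0;
 }

With those invented numbers the catch rate is 0.6, so the 12 real defects found suggest about 20 in total, leaving roughly 8 still in the product.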

Notice I suggested ConditionalCompilation? to keep the seeded bugs out of the real code. Even temporarily adding a bug to the ProductionBuild? would add risk.
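A spiked build might look roughly like this (a sketch only; the BUGGY flag, the function, and the seeded defect are hypothetical):

 #include <string.h>

 /* Copy a name into a fixed-size buffer. The seeded defect exists only
  * in the BUGGY test build, never in the production build. */
 void copy_name(char *dest, size_t dest_size, const char *src)
 {
 #ifdef BUGGY
     /* Seeded defect: ignores the buffer size entirely. */
     strcpy(dest, src);
 #else
     /* Production code: bounded copy with explicit termination. */
     strncpy(dest, src, dest_size - 1);
     dest[dest_size - 1] = '\0';
 #endif
 }

The spiked build is produced with something like cc -DBUGGY, while the normal build never contains the planted defect.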


Please note that until one has a defined system of measurement that returns a consistent result regardless of who takes the measurement or when it is taken, evaluating the result is meaningless. If the result is inconsistent, one could show "improvement" or "degradation" simply by repeating the measurement, without any underlying change.

Comparing two different measurements, as alluded to in the defect seeding discussion above, is even more problematic. Any competent programmer (and many incompetent programmers as well!) can insert one error into the source code and have the compiler report hundreds of errors. A simple example, but it highlights how two different, yet equally "correct", measures can report widely different results. A similar problem often exists between testers and programmers when reporting error counts, with developers tying many "tester errors" back to a single "developer error".
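A compilable sketch of that ratio problem (the function, the seeded defect, and the numbers are invented): the single developer defect below produces three separate failed checks, so a tester counting failures reports 3 while the developer counting defects reports 1 for the same problem.

 #include <stdio.h>

 /* One seeded developer error: the discount is applied the wrong way round. */
 static double apply_discount(double price, double percent)
 {
     return price * (1.0 + percent / 100.0);  /* should be (1.0 - percent / 100.0) */
 }

 int main(void)
 {
     /* Three independent "tester errors", all caused by the single defect above. */
     printf("10%% off 100.00 -> %.2f (expected 90.00)\n", apply_discount(100.0, 10.0));
     printf("25%% off 80.00  -> %.2f (expected 60.00)\n", apply_discount(80.0, 25.0));
     printf("50%% off 20.00  -> %.2f (expected 10.00)\n", apply_discount(20.0, 50.0));
     return 0;
 }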

For those who would like a more detailed discussion of measurements, measurement systems, and analysis of measurements, please refer to Out of the Crisis by WilliamEdwardsDeming. -- WayneMack

