The hardware world underwent a revolution with Shewhart and WilliamEdwardsDeming's process quality control practices.
Briefly, here's one common practice they debunked. A tool operator produces parts, and drops them onto a conveyor belt. Some way down the belt, a "quality control" operator picks up each part, measures it, and tosses the ones that are out of tolerance.
In this model, managers expected to get higher yield by ordering the tool operator to go faster, and by hiring more QC operators to throw away the increased number of defective parts. The tool is the expensive part of the line, so making it go faster appears to return value on its investment.
This practice has a parallel in some modern software shops (such as Microsoft's). A programmer writes code, and a test engineer (in another office) picks up each integration and writes unit tests for it.
This practice creates errors under the assumption that errors must be created, so they must be removed downstream. It overproduces, then censors out the bad parts.
Shewhart and Deming changed this practice, for hardware. They ignored errors and focused on variation. They separated variation into two classes, assignable cause and common cause, and created a tool (the control chart) to help differentiate between the two. Assignable cause variation indicates that an underlying cause for the variation can be found; common cause variation is due to normal system operation, and no unique cause can be identified.

Assignable cause variation is addressed by identifying the cause and, if necessary, taking steps to prevent its recurrence. Common cause variation can only be reduced through experimentation: measure the current level, try a different approach, measure the new results, and if they are an improvement, adopt the new approach.

There are also equations for deciding whether it is cost effective to inspect 100% of the output or 0% of it. Essentially, if the cost of inspecting everything exceeds the cost of handling the actual defects after the fact, eliminate inspection.
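To make those two decision rules concrete, here is a minimal sketch in Python. The function names, the three-sigma limits, and the cost figures are illustrative assumptions, not formulas lifted verbatim from Shewhart or Deming:

  import statistics

  def control_limits(samples):
      # Shewhart control chart: points beyond three sigma of the mean
      # are flagged as assignable cause; everything else is treated as
      # common cause variation.
      mean = statistics.mean(samples)
      sigma = statistics.stdev(samples)
      lcl, ucl = mean - 3 * sigma, mean + 3 * sigma
      assignable = [x for x in samples if x < lcl or x > ucl]
      return (lcl, ucl), assignable

  def inspect_everything(k1, k2, p):
      # Deming's all-or-none inspection rule: inspect 100% of output
      # when the cost of inspecting one part (k1) is less than the
      # fraction defective (p) times the cost of a defect escaping
      # downstream (k2); otherwise inspect nothing.
      return k1 < p * k2

For example, with inspection at $1 a part, an escaped defect at $500, and a defect rate of one in a thousand, inspect_everything(1, 500, 0.001) returns False: the process is already good enough that inspecting everything costs more than handling the rare escapes.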
The point is to focus on constant, measurable improvement and to let the errors take care of themselves.
In the new model, tool operators themselves learn to calculate a few statistical formulas, and they themselves rate the odds that their own tool will yield product out of spec. Then, when those odds are safely under some threshold, say one in ten million, the company can take the QC people off the conveyor belt. The entire process now operates with much less industrial waste.
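Here is a sketch of the kind of arithmetic an operator would do, assuming the measurements are normally distributed; the spec limits and process figures are invented for illustration:

  from math import erf, sqrt

  def fraction_out_of_spec(mean, sigma, low, high):
      # Probability that a normally distributed measurement falls
      # outside the tolerance band [low, high].
      def cdf(x):
          return 0.5 * (1 + erf((x - mean) / (sigma * sqrt(2))))
      return cdf(low) + (1 - cdf(high))

  # A process centered at 10.0 mm with sigma 0.1 mm, against a spec
  # of 9.4 mm to 10.6 mm, sits six sigma from each limit: roughly two
  # parts per billion out of spec, well under one in ten million.
  print(fraction_out_of_spec(10.0, 0.1, 9.4, 10.6))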
This is the QualityIsCheaper principle.
Because it also employs fewer statisticians, the statisticians fought it ;-)
Programming is a design activity. Our hard drives themselves take care of the statistical aspects of production for us, in their own way. But these parallels between waste in production efforts, and waste in design efforts, are compelling.
They indicate we should seek ways to waste less. In a design effort, chronic bad design is waste. The broken output is FireFighting, long bug hunts, ProgrammingInTheDebugger, and hours beyond a FortyHourWeek. So we try to use TestDrivenDevelopment to literally stop the bugs before they start.
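As a toy illustration of stopping the bugs before they start, the test is written before the code it exercises; the names here are invented:

  import unittest

  # Written first: these tests fail until within_spec below is
  # implemented to make them pass.
  class WithinSpecTest(unittest.TestCase):
      def test_accepts_in_tolerance(self):
          self.assertTrue(within_spec(10.0, low=9.4, high=10.6))

      def test_rejects_out_of_tolerance(self):
          self.assertFalse(within_spec(10.7, low=9.4, high=10.6))

  def within_spec(measurement, low, high):
      return low <= measurement <= high

  if __name__ == "__main__":
      unittest.main()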
But in a design space, we can't use statistics, because each element of the product should be measurably different from the others. So we tune our production tools using another entire human being: PairProgramming.
ThankYou PhlIp, I hadn't spotted this link. -- MatthewAstley
Actually, in a design space, we need to use statistics. Some ways of coding things are more error prone or more likely to cause errors when modified. Hence the need for a CodingStandard. If nothing were repeatable, experience would be meaningless.
Natch. But attempts to pencil them on graph paper, hang them next to the "tool", and calculate their StandardDeviation would miss the point and risk measuring the wrong things. YouGetWhatYouMeasure. --Phleep
Without some sort of measurement, how do we 1) ensure a change actually makes an improvement, and 2) move CodingStandards beyond being the opinion of the loudest debater? Sans measurement, how should we decide?
I do not understand the metaphor. Programming is not the production of lots of identical parts, is it? In programming, each "part" is different. So there is no comparison with conveyor-belt production processes. I mean, do those large companies throw away any failed code? I don't think so. It is probably repaired by the original programmer or by his/her colleague. I'm a fan of TestFirstDesign, but mainly because of reusability and traceability of errors. --WillemBogaerts
If every program is completely unique, then what part does experience play? How can we have any standard ways of doing things (such as TestFirstDesign)? These things imply an underlying commonality of process (not the product being produced). If there is a commonality, it can be measured. Note even the title of this page implies a measurement. I don't believe the question should be "Can we measure software development characteristics?" but rather "What characteristics?" and "How?"
See: QualityIsFree