Over Design

When the BrooklynBridge was built, a large percentage of bridges would fall within ten years of construction. So Roebling made his bridge TEN TIMES as strong as he calculated it needed to be, and, of course, it is still there, and not even sold. (But a pedestrian was killed a few years ago when a support cable snapped.) So, if you don't know what you are doing, overdesign.

OverDesign sounds good, but I still haven't figured what it means in software. Can you give me some idea? Thanks. --AlistairCockburn

I think OverDesign is perhaps the wrong term. It's conventional design, but exaggerating the various physical limits, designing in redundancy, and so on. So it wasn't the design process that was overdone, it was the parameters to which the design was applied.

Perhaps the software analogs are assertions (checking for things that can't happen), redundant systems (for when something breaks), dynamically sizing buffers (so surprisingly large amounts of data don't getcha) and the like. --DavidThomas
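A minimal sketch of two of those analogs together, assuming a made-up record format (a 4-byte big-endian length prefix followed by the payload). The function name `read_record` and the 4096-byte chunk size are invented for illustration; the buffer grows with the data, and the final assertion checks something the loop should already guarantee.

```python
import io


def read_record(stream, length_prefix_bytes=4):
    """Read one length-prefixed record from a binary stream.

    Hypothetical format, assumed for this sketch: a big-endian length
    prefix followed by that many payload bytes.
    """
    header = stream.read(length_prefix_bytes)
    if len(header) < length_prefix_bytes:
        raise ValueError("truncated header")
    length = int.from_bytes(header, "big")

    # Dynamically sized buffer: grows with the data instead of relying
    # on a fixed size chosen in advance.
    payload = bytearray()
    while len(payload) < length:
        chunk = stream.read(min(4096, length - len(payload)))
        if not chunk:
            raise ValueError("truncated payload")
        payload.extend(chunk)

    # "Can't happen" check: the loop above should already guarantee this.
    assert len(payload) == length, "payload length mismatch"
    return bytes(payload)


record = (5).to_bytes(4, "big") + b"hello"
print(read_record(io.BytesIO(record)))
```

The assertion costs almost nothing here, which is roughly the point: it only earns its keep on the day someone changes the loop.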

I know how to OverDesign for performance - have done that. Don't know how to OverDesign for correctness... "so it won't fall down." --Alistair

OverDesign for correctness is easy. Make the code twice as simple as it needs to be. Never assume the impossible can't happen, or, at the least, assert that it didn't. Never rely on things you have no control over ("the OS is supposed to do this...") or things you have observed but don't understand ("running it like this seems to work..."). Avoid clever tricks like the plague. As Brian Kernighan is often paraphrased: "Debugging is twice as hard as writing the code in the first place. So if you write code that requires all of your intelligence, you are, by definition, not smart enough to debug it."

Well, I believe the redundant voting microprocessors in the space shuttle and on the Airbus are independently coded. They aren't just there for hardware failures--if one version of a module produces bad answers, it's shunned by the others. Then there are the formal transformational techniques that produce provably correct code. They'd be considered OverDesign in most real-life environments. --Dave
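A rough sketch of the voting idea in software. This is not how the shuttle or Airbus systems are actually built; `vote` and the three integer-square-root versions are hypothetical names made up here, and one version is deliberately wrong so that it gets outvoted.

```python
import math
from collections import Counter


def vote(implementations, x):
    """Run every implementation on x and return the strict-majority answer.

    Answers must be hashable. Raises RuntimeError when no answer is
    returned by more than half of the implementations.
    """
    answers = []
    for impl in implementations:
        try:
            answers.append(impl(x))
        except Exception:
            # A version that crashes simply loses its vote.
            continue
    if not answers:
        raise RuntimeError("every version failed")
    winner, count = Counter(answers).most_common(1)[0]
    if count * 2 <= len(implementations):
        raise RuntimeError("no majority")
    return winner


# Three hypothetical, independently written versions of integer square root.
def isqrt_a(n):
    return math.isqrt(n)


def isqrt_b(n):
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r


def isqrt_buggy(n):
    return int(n ** 0.5) + 1  # deliberately wrong: always one too large


print(vote([isqrt_a, isqrt_b, isqrt_buggy], 17))
```

Voting only helps while the versions fail independently. If two of the three share the same mistake, the wrong answer wins the vote.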

Makes sense. Logic is the material of our discipline. Redundancy can be seen as multiplying the effective strength.

The conceptual problem with OverDesign for correctness is that the conventional understanding of correctness is fairly binary, making it hard for something to be more correct. If we want to have OverDesign for correctness mean something, we need to extend our normal understanding of correctness. For an example, think about what it would mean to OverDesign a program using a known-to-be-correct underlying algorithm, like calculating a factorial recursively. I would suggest that OverDesign would mean things like handling questionable inputs in a reasonable way, not blowing up when the machine's word size is exceeded, checking the result to make sure it wasn't damaged by passing alpha particles, etc. In my view, this is moving toward what I would think of as robustness, and I think OverDesign for robustness makes sense as a concept. -- MatthewWilbert
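One way the factorial example might look, as a sketch only. It swaps the recursion for a loop so that a huge input cannot exhaust the call stack, and since Python integers never overflow, the `word_bits` parameter is an assumption that stands in for a machine word. The name `robust_factorial` is invented for illustration.

```python
def robust_factorial(n, word_bits=64):
    """Factorial with the extra checks the paragraph above describes.

    word_bits is an assumed stand-in for a signed machine word.
    """
    # Questionable inputs: refuse them instead of looping or recursing badly.
    if isinstance(n, bool) or not isinstance(n, int):
        raise TypeError("n must be an integer")
    if n < 0:
        raise ValueError("n must be non-negative")

    limit = (1 << (word_bits - 1)) - 1  # largest signed value in the word

    result = 1
    for k in range(2, n + 1):
        result *= k
        if result > limit:
            raise OverflowError(f"{n}! does not fit in {word_bits} signed bits")

    # Redundant check: recompute in the opposite order and compare.
    check = 1
    for k in range(n, 1, -1):
        check *= k
    assert result == check, "result disagrees with recomputation"
    return result


print(robust_factorial(20))
```

The recomputation roughly doubles the cost for a check that should never fire, which is about what OverDesign for robustness looks like in practice.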

Although correctness may be an absolute, the chance of a program being correct is a probability. Designing for correctness therefore means maximizing that probability. -- Dave

Engineers talk of a SafetyFactor or (more accurately) an IgnoranceFactor. For the BrooklynBridge, that would be a factor of 10. Since this is engineering, the factor is quantified. Roebling probably considered using a factor of 20 and rejected it on cost grounds.

This is standard operating procedure in engineering. There is a cycle. When people discover a new bridge-building technique, they know they have a lot of ignorance so they use a big IgnoranceFactor. Time passes. The bridges don't fall down. This is taken as validating the new theory. Therefore the IgnoranceFactor can be reduced, which it promptly is for cost reasons. Eventually the bridges start falling down again, and the IgnoranceFactor goes back up. I gather the cycle takes around 100 years.

Use of an IgnoranceFactor conceals errors in the theory. Some of Galileo's classic work on the strength of materials had an error that went undiscovered for ages because of this. -- DaveHarris

If you're going to apply IgnoranceFactor to software, you have to think about where the ignorance is coming from. I should think it matters more for published, reusable software where you don't know much about the environment in which a component will be reused. Performance is one place where it can make sense. By "performance" I mean (e.g.) using O(log N) instead of O(N) algorithms, and I say "make sense" because I am often surprised at how heavily a given component gets used.
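For instance, a membership test on a sorted list can be written either way. The two function names are invented for this sketch; the second uses `bisect_left` from the standard library and assumes the list really is sorted.

```python
from bisect import bisect_left


def contains_linear(sorted_items, target):
    """O(N): look at each item in turn."""
    for item in sorted_items:
        if item == target:
            return True
    return False


def contains_binary(sorted_items, target):
    """O(log N): binary search; only valid if the input is sorted."""
    i = bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target


evens = list(range(0, 1_000_000, 2))
print(contains_linear(evens, 999_998), contains_binary(evens, 999_998))
```

With a handful of items the difference is invisible. The O(log N) version is the IgnoranceFactor for the day a caller passes in a million.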

I should think it matters less where XP applies because XP deals with ignorance in a different way (i.e., through YouArentGonnaNeedIt; postponing decisions until the ignorance has gone). -- DaveHarris

There's a fundamental distinction between continuous systems and discrete ones. With continuous systems, like bridges or analog circuits, throwing in an IgnoranceFactor is relatively cheap and is in fact necessary because the properties of the physical objects that we build with are only characterized approximately, and the environment in which the system must function is not known precisely. But because of continuity and the IgnoranceFactor, these uncertainties don't (usually) push the systems into regions where they fail. With discrete systems, flipping a single bit can drastically affect behavior, and hence robustness cannot be achieved by simple IgnoranceFactors. Techniques for achieving robustness in the discrete case tend to be expensive. -- DavidLong
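A small illustration of the single-bit point, assuming IEEE 754 double-precision floats (which is what Python's `float` uses on ordinary platforms). The helper `flip_bit` is made up for this sketch; it flips one bit in the 64-bit representation of a number.

```python
import struct


def flip_bit(x, bit):
    """Return the float obtained by flipping one bit (0 = lowest) of x."""
    (as_int,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


print(flip_bit(1.0, 0))   # lowest mantissa bit: a tiny nudge
print(flip_bit(1.0, 62))  # top exponent bit: 1.0 becomes infinity
print(flip_bit(1.0, 63))  # sign bit: 1.0 becomes -1.0
```

The same size of disturbance, one bit, gives anything from a rounding-level nudge to infinity, depending only on where it lands. No margin on the value itself protects against that.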

"Then there are the formal transformational techniques that produce provably correct code" -- The best you can hope for is to prove that two (or more) expressions of logic are equivalent. There's no magic that makes specifications immune from human error. TestFirstProgramming is another way of demonstrating that two expressions of logic are equivalent. So are VotingProcessors. So are CodeReviews and PairProgramming (but these aren't automatable).
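A sketch of the test-as-second-expression idea, using a leap-year rule as a stand-in. The names `is_leap_year`, `EXPECTED` and `disagreements` are hypothetical. The table of expected answers is one expression of the logic, the function is another, and the check only reports where the two disagree.

```python
def is_leap_year(year):
    """First expression of the logic: the Gregorian rule as code."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Second expression of the same logic: answers written down by hand.
EXPECTED = {1900: False, 1996: True, 1999: False, 2000: True, 2100: False}


def disagreements(impl, expected):
    """Return {input: actual} for every input where impl and table differ."""
    return {x: impl(x) for x, want in expected.items() if impl(x) != want}


print(disagreements(is_leap_year, EXPECTED))
print(disagreements(lambda y: y % 4 == 0, EXPECTED))
```

An empty result shows agreement on those five inputs and nothing more. If the table and the code share the same misunderstanding, the check passes anyway.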