Intentional Redundancy Does Not Violate Once And Only Once

OnceAndOnlyOnce is a well-established principle: redundancy (in particular, inconsistent redundancy) is a common source of program errors. Eliminating redundancy also speeds up ReFactoring, as you only have to change one thing rather than n things.

However, redundancy is frequently introduced into systems intentionally, for several reasons, including increased performance and (ironically, it appears) increased reliability.

Increased reliability

Examples of this are FunctionPrototype?s in C/C++, various disk-mirroring technologies, DeoxyriboNucleicAcid in nature, etc. How does this work? Two reasons: the redundant copies can be compared with one another, so corruption or inconsistency is detected rather than silently acted upon; and, if enough copies survive, a damaged copy can often be reconstructed from the others.

In typical violations of OnceAndOnlyOnce, OTOH, the inconsistency is never detected. Some parts of the system use one definition, other parts use the other.

However, see AirplaneRule: the mechanisms necessary to harness redundancy for increased reliability greatly complicate things; it's not a free lunch. That said, properly designed redundant systems can indeed be more reliable than non-redundant ones, especially when given inputs the system wasn't designed for (such as a bird through the turbines or a bullet through one of the disks).

Performance

Denormalization of a RelationalDatabase is one example; caching and LazyEvaluation are others. Here, a redundant copy is made (and, if the data is mutable, maintained in parallel) in order to improve performance, due to increased locality of reference and the like. Managing this redundancy can be tricky and isn't a task for the faint of heart. However, as in the case above, the redundancy is managed.
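A minimal sketch of what "managed" means here, in C++ (the Orders class and its members are invented for illustration): the cached total is a redundant, derived copy of data already held in the item list, and every mutator is responsible for keeping it honest.

  #include <numeric>
  #include <optional>
  #include <vector>

  // Hypothetical example: cachedTotal duplicates information already present
  // in items, purely for speed -- and every write path invalidates it.
  class Orders {
  public:
      void add(double amount) {
          items.push_back(amount);
          cachedTotal.reset();                   // keep the redundant copy consistent
      }
      double total() const {
          if (!cachedTotal)                      // recompute only when stale
              cachedTotal = std::accumulate(items.begin(), items.end(), 0.0);
          return *cachedTotal;
      }
  private:
      std::vector<double> items;                 // the authoritative data
      mutable std::optional<double> cachedTotal; // the managed, redundant copy
  };

The point is not the caching itself but the discipline around it: the redundant value is never trusted unless the code owning the authoritative data has kept it up to date.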

DocumentationOfIntent?

Data is provided in redundant fashion to allow some system (or a human) to perform consistency checks; a failure may then show up as an inconsistency. Unlike the IncreasedReliability? case above (where data is automatically replicated so that subsequent corruption or erasure cannot occur undetected), here the data is supplied to the system in redundant form, often by a human who must specify something more than once. Oftentimes only one "copy" of the data is a full set (so error correction isn't possible); the redundancy is used only to verify the full set of data. Examples of this are widespread in programming; FunctionPrototype?s in C/C++ (discussed below) are perhaps the most familiar.

In many cases, the redundancy (in addition to being machine-verifiable) also serves as documentation. Documentation itself can be considered a type of redundancy that isn't machine-verifiable; comments and design docs aren't consumed by the machine but by the human programmer. Alarm bells should ring if the code does something different from what the comments describe. (Unfortunately, it is easy for comments and design notes to get out of sync with the code, as the verification step for these--humans inspecting the code--is tedious and error-prone.)
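For instance, here is a small C++ sketch of the "specify it twice so the machine can cross-check it" idea (the table and the count are invented for illustration): the initializer is the real data, and the explicitly stated count is redundant information supplied only so the compiler can verify it.

  #include <cstddef>

  // The table's contents are the single source of truth; kExpectedStates is
  // deliberately redundant, existing only so the compiler can cross-check it.
  constexpr const char* kStateNames[] = { "idle", "running", "stopped" };
  constexpr std::size_t kExpectedStates = 3;

  static_assert(sizeof(kStateNames) / sizeof(kStateNames[0]) == kExpectedStates,
                "state table and expected count have drifted apart");

If someone adds a state name without updating the count (or vice versa), the build fails, which is exactly the "failure indicated by an inconsistency" described above.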


In the above discussion, I believe C/C++ headers should be moved under the "Performance" category rather than the "Increased Reliability" category.

In an ideal world, yes. But given that the design of C/C++ is unlikely to change in this regard, header files are still needed for reliability.
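A hedged illustration of that reliability argument, with hypothetical file and function names: the prototype in the header is redundant with the definition, but because both the defining file and every caller include the same header, the compiler can reject a mismatch instead of letting incompatible code be linked silently.

  /* math_utils.h -- the redundant declaration */
  double average(const double* values, int count);

  /* math_utils.cpp -- including the header lets the compiler check that the
     declaration and the definition still agree */
  #include "math_utils.h"
  double average(const double* values, int count) {
      double sum = 0.0;
      for (int i = 0; i < count; ++i) sum += values[i];
      return count ? sum / count : 0.0;
  }

  /* caller.cpp -- a call that disagrees with the prototype (wrong types or
     wrong arity) is now a compile-time error, not silent corruption at run time */
  #include "math_utils.h"
  double sampleMean(const double* samples, int n) {
      return average(samples, n);
  }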

The reason hand-written prototypes in header files were adopted when we went from K&R C to ANSI C was to keep the include files small and to avoid writing an additional preprocessor. There is no technical reason why function prototypes could not be pulled from the source code (other languages do this). At the time, however, I assume it was felt that it was better to have the developer extract function prototypes by hand than to have a preprocessor or compiler do it automatically.

Please note that I am a C/C++ die-hard, but I recognize that, with the technology and knowledge we have today, it should no longer be necessary for developers to create separate header files just so the compiler and linker can perform consistency checks. It would also be nice if someone stepped forward and defined a way for separately compiled libraries to carry their prototype information at the binary level rather than at the source-code level. Having a header file for libraries is an okay solution for C/C++, but it does not help in exposing C/C++ libraries to other languages.

I'm currently refactoring a hunk of C++ code. I don't mind C++ too much, but it is tedious to have to say (or change) the same thing in at least two places. I support the above statement for that reason.

On the other hand, separate include files allow for creating DynamicLibraries? that have circular references to each other (provided that the linker and loader for the OS in question are smart enough to handle that case; most are).
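A sketch of how that works, with made-up library and function names: each library's source compiles against only the other library's header, so both can be built before either is linked, and the loader resolves the mutual references at run time.

  /* a.h -- shipped with (hypothetical) library A */
  void a_handle_event(int code);

  /* b.h -- shipped with (hypothetical) library B */
  void b_report_error(int code);

  /* a.cpp -- part of library A; needs only B's header, not B's object code */
  #include "a.h"
  #include "b.h"
  void a_handle_event(int code) {
      if (code < 0) b_report_error(code);
  }

  /* b.cpp -- part of library B; calls back into A, completing the cycle */
  #include "a.h"
  #include "b.h"
  void b_report_error(int code) {
      if (code == -1) a_handle_event(0);
  }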


The most important reason I've ever had to be intentionally redundant is clarity. A few times in UnitTests, I've refactored away common code, only to revisit the tests later and discover that their workings were less immediately obvious than they were with the redundant code left in. Sometimes I find I can solve the problem with better names, but in many such cases, I've re-refactored them to re-introduce the redundant code. -- PhilGroce


Introducing redundancy can be a healthy refactoring if it reduces coupling. Consider two modules that you control, X and Y, which use local functions fx and fy respectively. fx and fy are discovered to do the same thing (perhaps with some trivial differences that can easily be ReFactored away). Neither fx nor fy has any dependency on the rest of X or Y (or such dependencies are easily factored out). Should fx and fy be moved to a common library and renamed f?
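To make the situation concrete (module and function names below are invented for illustration), here is the duplicated "before" picture and the shared alternative:

  #include <string>

  // x.cpp -- module X keeps a private helper
  namespace {
      bool fx(const std::string& s) { return !s.empty() && s.front() == '#'; }
  }

  // y.cpp -- module Y, written independently, duplicates the same logic
  namespace {
      bool fy(const std::string& s) { return s.size() > 0 && s[0] == '#'; }
  }

  // common.h -- the refactored alternative: one shared f(), but now both X
  // and Y must depend on this new header for the sake of a one-line helper
  inline bool f(const std::string& s) { return !s.empty() && s.front() == '#'; }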

A quick test; count one point for each "yes":

If your total score is nonzero (and certainly if it is two or greater), ReFactoring is probably an improvement. If the total score is zero, you might be better off leaving well enough alone: refactoring would introduce a new dependency into both X and Y just to hold a minor function that isn't generally useful, and that is likely to increase the complexity of your project.

This seems similar to ThreeStrikesAndYouRefactor.

