Duplicated Code

Perhaps this is already covered by CopyAndPasteProgramming and CodeSmells? DeletionCandidate

I don't think so. CopyAndPasteProgramming is the process, that often leads to DuplicatedCode, but

which means that these two are often related, but none is a special case of the other.

What constitutes DuplicatedCode?

Code is duplicated if more succinct code exists that describes the same functionality (gives same desired(excluding noise) results after running). --AlekseyPavlichenko

Analogy with Zip data compression:

Refactoring is deduplication of source code done by human. Zip compression is deduplication of data done by machine.

Compression (deduplication) of one unit of information is many times more CPU-expensive than decompression (duplication). Similarly deduplication of one unit of functionality of software code is many times more human-expensive than duplication.

Consequently if a company spent X dollars to develop a product with duplicated code it will have to spend X * K dollars on code refactoring without any single visible improvement of the product, where K is between 3 and 100 depending on skills, complexity, language, etc.

Some companies hire more and more developers to pay interest on that duplication debt or keep changing developers wondering why production emergencies don’t disappear even with best minds on it.


Analogy with financial interest:

DuplicatedCode represents TechnicalDebt. Cost of repayment of that debt equals to cost of human labor that is required to deduplicate the code. That debt has interest in form of decreased productivity of developers.

That interest is many times higher than financial interest for business loans, which is why it’s financially prudent for a company to borrow money or significantly increase R&D budget to repay technical debt. Since technical debt has similar properties as financial debt it is also possible to build TechnicalDebtBubble?. That bubble will appear by itself when company never allows developers to repay debts after deadline is met.


In SmalltalkLanguage, I think duplicated code is more apparent than in other languages (and certainly easier to refactor away). In Java, though, I find that duplicated code isn't always obvious and sometimes impossible to refactor out.

For example, suppose you have the following code in one method:

 if (this.o.isSomeCondition()) {
and in another method you have:
 if (this.o.isSomeCondition()) {

Does the if test itself constitute duplicated code? If so, it could be refactored away through some form of DoubleDispatch. If I were doing BigDesignUpFront, I might decide that a DoubleDispatch mechanism might be useful some time in the future, but since YouArentGonnaNeedIt, I don't do that now. Does RefactorMercilessly dictate that it is, in fact, duplicated code and should be immediately replaced?

Balance the smell against the cost of reducing it in your environment. Each environment (language, linkers, etc) has its own threshold for how much duplication you can remove without it being too painful or too obfuscating. These language features lend themselves to removing duplication without excessive pain (and you get to choose what's too painful):

For example, here's some duplicated (pseudo-) code:

 def findAManager:
  for each employee in employees:
   if employee.isManager:
    return employee
  return nil

def findASupervisor: for each employee in employees: if employee.isSupervisor: return employee return nil

def findAWorkerBee: for each employee in employees: if employee.isWorkerBee: return employee return nil

Here the loop is duplicated, but removing the loop will obfuscate the code without making it any clearer what you're doing. If, however, you have something like closures or lambda expressions, you can abstract the loop out:

 def findFirstThat (condition):
  for each employee in employees:
   if condition (employee):
    return employee
  return nil

def findAManager: return findFirstThat (lambda worker: worker.isManager)

def findASupervisor: return findFirstThat (lambda worker: worker.isSupervisor)

def findAWorkerBee: return findFirstThat (lambda worker: worker.isWorkerBee)

There are other ways to abstract out the loop, some easier, and some harder. You just have to balance how hard the abstraction is against how painful the duplicated code is. Without lambda expressions/closures/continuations/whatever, the little bit of duplication in this little example isn't painful enough to do something about.

So: Balance smell against pain.

Is it sheer coincidence that the pseudocode above is almost executable PythonLanguage code? :) BurkhardKloss

Ah. That's what all the lambdas are doing. When I write pseudocode, I assume the language lets me say what I want. e.g.:

 def findAManager:
  return findFirstThat (.isManager)

As a (kind of) counter-example, you can do something similar in C++:

 struct FindSupervisor {
  bool operator() (const Employee &e) { return e.isSupervisor; }
 } f;

Employees::iterator i = std::find_first_of(employees.begin(), employees.end(), f); if (i != employees.end()) { Employee e = *i; }

Repeat as necessary for the other types of worker.

My point being that, because C++ doesn't have real closures, you'll have to find some other way to refactor the expression (or leave it alone).

Just reinforcing the point: Balance smell against pain. You might consider this to be more painful than necessary.

Check out OnceAndOnlyOnce. Also check PatternOverdose and PatternAbuse. Your first example on this page has this drift. --BerndGoetz

Failure to factor out duplicated code is one way to get the dreaded MonsterSubroutine.

I absolutely agree. But, when is it time to factor out duplicated code? Particularly because YouArentGonnaNeedIt. Clearly these forces need to be balanced, but how? ExtremeProgrammingMasters can do it through intuition and experience, but I am but a grasshopper.

[My (outsider's) understanding is that YouArentGonnaNeedIt doesn't apply to refactorings -- If a particular refactoring brings you closer to TheSimplestCode, then you need it. -- AnAspirant]

I think that the modification I describe below should set off the alarm - the code is absolutely identical. If there is a chance that the code will at some time differ for the different invocations of the function being replaced, then duplicating the code may make sense.

Besides the scenario described in MonsterSubroutine, another arises from code like:

 ... more code with modify_it() calls

Then a change requires that modify_it()'s functionality be replaced by:

 prep_modify_it(&X, some_other_variable);
 for ( I = 23; I < 999; ++I) {
  some_complicated_stuff(&X, I, third_variable + I);
and then instead of factoring this into a function, it is edited in place for every modify_it() call, leading to code bloat and difficult maintenance. -- RobertField

Consider instead...

 modify_it(&X, some_other_variable, third_variable);
 modify_it(&Y, some_other_variable, third_variable);
 ... more code with modify_it() calls[And change these too, in the obvious way.]

void modify_it(typeX *pX, type2 some_other_variable, type3 third_variable) { prep_modify_it(pX, some_other_variable); new_modify_it(pX); for ( I = 23; I < 999; ++I) { some_complicated_stuff(pX, I, third_variable + I); } }
-- JeffGrigg

Precisely. It is the failure to do something like this refactoring that gives rise to the monsters. -- RobertField

If we could replace the original with:

 if (isSomeOtherCondition())

so that the expression is at least factored out:

 bool isSomeOtherCondition() {
  return this.o.isSomeCondition();

then the repetition is less painful. (I think it helps to omit the unnecessary braces, too.) See also the LawOfDemeter for why having hardwiring long dotted-paths like this.o.isSomeCondition() is intrinsically dubious.

Beg to differ slightly... this.o.isSomeCondition() simply calls a method on an instance variable, although I would likely use an accessor instead of going directly to the var. Perhaps the answer is related to "why can't 'o' decide to doSomething1() or doSomething2() itself?"

Referring back to:

 def findAManager:
  for each employee in employees:
   if employee.isManager:
    return employee
  return nil

def findASupervisor: for each employee in employees: if employee.isSupervisor: return employee return nil

def findAWorkerBee: for each employee in employees: if employee.isWorkerBee: return employee return nil

This could also be written as

 def findType(type)
  for each employee in employees:
   if employee.isType(type)
    return employee
  return nil

It requires a different way of managing types. I personally prefer this way because you don't have a method per way, plus it allows for iteration over a series of types in a list. For those who want the isType() methods, have them call the generic one.

A RelationalWeenie version:

  getEmployees(criteria) {
    return(query(stdConn, "select * from employees where %1", criteria));

Note that this example is not limited to just EmployeeTypes. However, the downside is that some may consider it a security risk because it is more open-ended (SqlStringsAndSecurity). Then again, so is the original because one can simply get a list of all employees by calling it 3 times.

An interesting paper on redundant code that shows some statistically significant positive correlation between seemingly harmless duplicated code and real errors is at http://www.stanford.edu/~engler/p401-xie.pdf. "We expect that redundancies, even when harmless, strongly correlate with hard errors. Our relatively uncontroversial hypothesis is that confused or incompetent programmers tend to make mistakes." --StevenNewton

For automated support finding duplicated code, see SimScan, DupTective and SameTool (or CategoryDuplicationFindingTool)

The EclipseIde will have in its release 3 a simple(?) refactoring tool for creating methods out of DuplicatedCode. -- AndreasSchweikardt

See also CopyAndPasteProgramming, OnceAndOnlyOnce


EditText of this page (last edited November 15, 2014) or FindPage with title or text search