Code Harvesting

Over time, I've learned that trying to get an abstraction right the first time is like premature optimization--until you can make decisions based on real usage patterns, your early guesses are liable to be off. Allowing a bit of CopyAndPasteProgramming can add interesting diversity to the gene pool, resulting in a better basis for selecting out winning abstractions.

We used CopyAndPasteProgramming in the Database Group at ParcPlaceDigitalk while building database connects for VisualWorks. A new DB Connect often started life as an existing connect (large scale CopyAndPasteProgramming), then got hammered into shape to work with a new database vendor's client API. Common code eventually got harvested and migrated into the abstract framework. Because there was often ongoing parallel work on several DB Connects, and we wanted to keep the framework stable, we didn't do too much on-the-fly harvesting unless someone ran into a dead end.

This pattern of incremental growth and harvesting resulted in some abstractions that we probably couldn't have gotten right up front without experience, weathering and seasoning. -- DaveSmith

Then, in DontRepeatYourself, RalphJohnson wrote:

... it is often better to wait til you have several examples of the duplication so you can better see how to eliminate it.

The two appear to me to be the same argument, though at different levels of abstraction. So let's reify this with a name, CodeHarvesting, and give it a preliminary definition: Letting a duplication of logic live for now, in order to see how to best universalize it at some later point.

I'd love to hear people discuss more experiences with this kind of habit. Is it genuinely useful? Is it a procedural crutch? How do you know when the duplication has spread for too long?

I HaveThisPattern. Often it seems like there are five different ways to factor BEFORE you do the duplication, but once you've raped and pasted (and maybe let the duplicates ripen a bit) the best way often becomes obvious. You just have to remember to factor though before they start to stink up the place. --JamesWilson

If you've been coding long enough, you have a lot of examples of usage stuffed in your head so you can make meaningful abstractions without waiting for duplication. For instance, we've all written a thousand sequenced collections, we all know that you need append, get, insert, remove, count, etc. methods. This is a dangerously arrogant line to walk though as you might overestimate yourself. However, being conservative isn't optimal, just as being arrogant isn't optimal. So fair warning if you do this. -- SunirShah

It should be possible for a tool to automatically do code harvesting. Then you could just cut&paste and let the tool clean up behind you. -- StephanHouben

I don't believe that "automatically" will ever happen (too many real problems related to Hard AI) but there can certainly be very good tool support -- see RefactoringBrowserForJava. -- KyleBrown

or see SameTool -- JeroenMostert

In my experience, the best way to do CodeHarvesting is to tag any copy-pasted block with a predefined FixmeComment, which I periodically grep for ("HARVEST" to go along with "TODO" and "XXX"); but never to change the code, just leave it in both places exactly as-is. When I start wanting to change the functionality, I always try to change it by changing the work before or after the copy-pasted block. This way I can analyze the sorts of pre- and post-processing I do. Once I've decided where this code should "live" (in an OO system) or what it should be called (in a functional or procedural system), this eliminates nasty surprises when I finally refactor it, but gives me something inline to read every time I run across the block where it's been inserted. --GlyphLefkowitz

from defunct JustTooLateReuse page:

You didn't fall prey to PrematureGeneralization; you ended up with lots of code fragments that do pretty much the same thing. At some point, write reusable code that fits the need of those fragments, and replace them with calls to the reusable code.

(Consider this a placeholder for a real pattern. Please feel free to replace it. Comments, suggestions, additions greatly appreciated. --PaulChisholm)

Inspired by PrematureGeneralization. See also JustInTimeReuse, RefactorSlack.

We call this EntropyReduction, and schedule it to happen between schedules. (This gives the developers something worthwhile to do while the architects and planners are at work.) --DaveSmith

I HaveThisPattern, too, and I think I see the motivation for ThreeStrikesAndYouRefactor. It is important (especially on a fast-evolving project) to differentiate between (what I call) "essential sameness" and "accidential sameness." Do two pieces of code look the same because they have the same underlying essence, or do they simply happen to have structural similarity? In the first case, refactor. In the second case, I believe refactoring is a mistake. If the underlying structures change (often because of refactoring elsewhere), you may not want to have factored out the "common" code, and doing so would have been PrematureGeneralization. To avoid this, don't refactor until you can see the essential sameness. I'm sorry I don't have a good example of this, but does this ring true with anyone else? --PaulTevis

see ConquerAndDivide, LanguageHarvesting

CategoryReuse