Extract Algorithm Refactor

Tip: If it's not a Pattern (the 3-citations rule, etc.), it's [most likely] a Refactor...

Problem: You have old hairy code containing an awesome algorithm.

How do you get it into new code?

Tip2: Only do this if it decreases risk. It's not a "principle". And nobody's talking about start-of-project here; just start-of-module.


7% Solution: You use the old code as the AcceptanceTest that the new code has to match, output for output.

Only after it runs flat-out with every old feature do you ever add any new features. (Assuming all the old math was correct ;-)
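A minimal sketch of that first step in Python, assuming the old algorithm can be called (or has been wrapped) side by side with its replacement; the module and function names are hypothetical stand-ins, not anything from this page.

 # Golden-master style AcceptanceTest: the old code is the oracle.
 import unittest

 from old_module import old_algorithm   # legacy code, wrapped or ported as-is (hypothetical)
 from new_module import new_algorithm   # the replacement under development (hypothetical)

 # Representative inputs; in practice these would be captured from real runs.
 SAMPLE_INPUTS = [0, 1, 7, 42, 1000]

 class MatchOldBehavior(unittest.TestCase):
     def test_new_matches_old(self):
         # The new code must reproduce the old output for every sample input
         # before any new feature is added.
         for x in SAMPLE_INPUTS:
             self.assertEqual(old_algorithm(x), new_algorithm(x))

 if __name__ == "__main__":
     unittest.main()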

A canonical example of this Refactor's input conditions and very nice output appears here: http://wiki.rubygarden.org/Ruby/page/show/FractalLifeEngine

After looking at the above for a few minutes, I see no trace of ExtractAlgorithmRefactor there. It looks more like "look at someone else's code and then rewrite your own version of it, and by the way, UnitTest it". If that's not accurate then someone should explain the ExtractAlgorithmRefactor part of it. Otherwise, there's no design pattern and no ExtremeProgramming here at all; it's just writing software.


Why not a function-by-function conversion? Most code in a high-level, general-purpose language can be mapped isomorphically to a different high-level, general-purpose language.... Am I missing something?

A function-by-function conversion is appropriate if you're moving from a high-level, general-purpose language to an equivalent high-level, general-purpose language. For example, moving from Java to C# -- languages close enough in syntax to frequently be the same -- a function-by-function conversion will certainly work, but even there you might lose the potential benefit of C#-specific features like properties or delegates. If your target language is a very different high-level, general-purpose language -- e.g., moving from Java to Haskell -- the differences are probably significant enough to warrant re-implementing the algorithm to take advantage of the target language and/or paradigm.


Some problem spaces obviously require a lot of explicit domain analysis. For example, if you wrote the first program that abstracted the science of shipbuilding, you would first have to go to a library, read every book available on the subject, and write down all the algorithms and coefficients involved. That kind of requirements gathering is an example of a good kind of BigDesignUpFront.

ExtractAlgorithmRefactor represents the opposite heuristic: a project which, for very good domain-specific reasons, must not begin by analyzing the domain and applying the result before any coding. One must instead write a very specific, long, top-to-bottom test first. Again, if the new code were designed first, and that design applied to its functions before the first task got done, the new code would skew the results and risk subtle errors.

I posit that at the tails of the bell curve there are a few kinds of projects: on the left, those that must always start with a big test project to analyze the requirements, and on the right, those that must always start by reading a bunch of books.

The comparison of algorithm migration with shipbuilding software shows that both migrate algorithms: the first from old source code to new source code, the other from books to source code.

-- PhlIp


Here some invisible Wikizen pops in and somehow convinces Ron that PhlIp intends to do this to every legacy system on the planet, or something...

I've never seen this process work. Issues are:

No interesting old program embodies just a single algorithm. It'll be a product, and what they want is to replace the whole thing.

The program isn't bug-free. If it were, they wouldn't want to replace it. Therefore using it as an acceptance test requires you to write an incorrect program. (And worse yet, sometimes that's what they think they want.)

The program is under maintenance. So you are in a tail chase as they add features.

The process, as described, defers delivery of business value for a very long time. They get bored, and kill your project.

-- RonJeffries

Those are situations that might indicate one should not do it.

One "awesome algorithm" an old system might harbor is some big hard math formula. Consider a FORTRAN project that draws IsoplethDiagrams, converted to C by f2c, and then upgraded. This code fork is a dead end. But that mass of stuff generated by f2c contains correct MathTheory.

Your first spike - before studying the math, reading books about the math, or using the project to try out your latest theories - places the last version of the old code into an AcceptanceTest for the new one. You must get the new behavior to match the old before proceeding. Otherwise, you are doing two things at the same time - reimplementing old functionality, and inventing new. Don't do two experiments at the same time.

-- PhlIp
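A minimal sketch of that spike, assuming the last f2c build and the new program are both command-line executables that write their isopleth data to stdout; the executable names and the fixtures directory are hypothetical. Byte-identical output is the simplest oracle; a tolerance-based comparison can be substituted if the two builds round floating point differently.

 # Whole-program golden-master test: old executable vs. new, same inputs.
 import glob
 import subprocess

 def run(exe, input_file):
     # Feed one captured input file to an executable and return its output text.
     with open(input_file) as f:
         return subprocess.run([exe], stdin=f, capture_output=True,
                               text=True, check=True).stdout

 def test_new_matches_old():
     # Every captured input must produce identical output from old and new
     # before any cleanup of the math or any new feature is attempted.
     for input_file in glob.glob("fixtures/*.dat"):
         assert run("./old_isopleth", input_file) == run("./new_isopleth", input_file), input_file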

I've never seen this process work.

I have seen this process work. Mostly. With some variations.

I worked on a system in 1983 that was written in RATFOR. The code passed gracefully through the RATFOR-to-FORTRAN (III ???) preprocessor unscathed; it didn't use loops or complicated conditionals and was entirely structured by gotos. And it had a lot of bugs. But it represented a partial solution specification. It worked for many of the normal cases, but failed badly on many less frequent cases.

The code was part of a large system, the part responsible for a fairly focused algorithm. The system analyzed patient ElectroCardioGrams for arrhythmia conditions and saved trend data. A newer part of the system allowed a clinician to edit the initial analysis. This program had the job of updating the trend data based on the edits.

I spent three months reworking it following roughly this idea. I used the output of the original as a partial specification: I collected lots of tests by feeding input into the program and saving the output, then updated the ones that were wrong and used that as my test data. The main difference is that I rewrote the program in place rather than creating a new, separate program. I reused chunks of the original that mostly worked and replaced large chunks that had problems with complete rewrites from scratch, based on the tests. By the end of the summer, I had replaced essentially 100% of the code and fixed all of the known defects.

-- RobSartin
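A minimal sketch of the golden-file variation described above, assuming the captured outputs are kept as files and hand-corrected versions live alongside them for the cases the original got wrong; the directory layout and the analyze() entry point are hypothetical.

 # Golden files with corrections: outputs captured from the original program,
 # overridden by hand-corrected files where the original was known to be wrong.
 import glob
 import os

 from trend_update import analyze  # the rewritten code under test (hypothetical)

 def expected_for(case):
     # Prefer a hand-corrected expectation; otherwise fall back to the output
     # captured from the original program.
     corrected = os.path.join("golden/corrected", case)
     captured = os.path.join("golden/captured", case)
     with open(corrected if os.path.exists(corrected) else captured) as f:
         return f.read()

 def test_against_golden_files():
     for input_path in glob.glob("inputs/*.txt"):
         case = os.path.basename(input_path)
         with open(input_path) as f:
             assert analyze(f.read()) == expected_for(case), case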


If you're using ExtractAlgorithmRefactor on a single class or class hierarchy among many, I guess that you'll have to preserve all interfaces that clients use. Do you have to set up SmokeTests (or AcceptanceTests) that cover the clients of the class(es) being changed? The point would be to reveal subtle bugs in the classes being changed, of the kind that are "masked" by corresponding bugs in clients.

I think it is a very interesting technique and I'd like to see a more detailed example of this.

-- PeterLindberg


CategoryPattern | CategoryRefactoring

