Adding Features To Legacy Code

From xp mailing list:

Message: 5

Date: Thu, 9 Nov 2000 11:29:05 -0500 From: "Arrizza, John" <john.arrizza@marconi.com> Subject: RE: Adding Features To Legacy Code

long reply, bail now.

 > -----Original Message-----
 > From: ajalis@twu.net [mailto:ajalis@twu.net]
 > Question: What is a good strategy for adding new features to old code
 > written by someone else that I don't understand very well.

I've done this for two small apps (2-6K lines) and one bigger one (initially 65K) lines), so I speak from few experiences. My experiences may not hold for your app.

The overall rule I've come to is that small, gradual changes are best (surprise, surprise). However, you can sometimes get away with a global sweep of changes on small apps. I just haven't had the courage to try it on the big app.

When I've gone the incremental route here are some things I've observed:

fixing one small part of the code makes the overall functioning of the app "better". Not sure why except to conjecture that in legacy apps the code is so tightly interwoven that any local change has global effect.
extracting duplicate code seems to have a good effect out of proportion with the changes made. For example, I merged a couple of similar pieces of code into one method, rearranged a couple of lines and retested. A whole bunch of memory leaks disappeared and the code ran significantly faster. I looked at the code changes I made and they seemed small but put braces around a couple of objects that didn't have them before. That small effect shortened the lifetime of a couple of objects, (i.e. from infinity to a few milliseconds).
deleting code is a good thing. As much as possible I delete unnecessary or unused code. Eventually some classes become trivial to the point where their intent is apparent and I can then replace them, delete them entirely or fill them up with code that is actually useful. I was able to shrink the 65K down to 33Kloc by deleting unused parts of the code. I replaced other parts with standard STL and other common library pieces, removing entire DLL's and libs. The code immediately got better.
I don't delete comments until I've refactored a piece of code. The comments can give clues to what the programmer had in mind when he/she was originally coding. The code will often not match the comments but I've been able to sometimes use that to advantage.
I apply a formatter every so often. It's surprising how readability is affected by indentation, spacing, etc.

 > The code is unit-test unfriendly.

I generally don't do them. I use functional tests. However, if a there is a utility class that is nicely encapsulated and it's intent is clear to me, I spend the time to write the UT.

Even then, I still hesitate to write the UT. A couple of times the UT showed some "bugs" in the class. I dutifully fixed them. Bad idea. Running the FTs showed that there was code dependent on how those classes worked albeit badly. It seems better to drive the refactoring from the other direction, from the top down. Sometimes refactoring high level code causes the methods of lower classes to be unnecessary. Sometimes the high level refactorings cause the code to call low level classes differently, removing the dependency on a low level "bug".

On the other hand, fixing low level bugs has high impact on the app as a whole. I find when the app runs "better" it somehow seems easier to guess how it should be running. So I am tempted to fix low level bugs...

BTW it pays to make sure the FTs run in 10-20 seconds. You might want to spend some time stubbing the DB calls your app makes for example.

 > In the past, this is what I have done. <snip>

Your method seems riskier than doing things incrementally.

 > (so I have an easy way to back out and restore the system to its initial state
 > -- an exit strategy).

Excellent point. I use my source control tool to help me out. I make a few fixes, run some tests. If the code is stable, I check everything in and apply a label so I can regress (i.e. small releases). It doesn't matter that the difference between labelled versions is small, what counts is that each version is stable. When I feel there has been significant improvement I make a "major" release. I then do a full(er) range of tests on that release. If possible I get QA or other people to run the release. If the "major" release holds, I blow away the "minor" labels and start again on small fixes.

> Question: Is there a better way to refactor? > If I could do that, I could do some simple refactoring on them - like > split them up into smaller subroutines. But wrapping the unit tests around > them is not obvious. How do you write unit tests around legacy code which is > not organized into neat objects?

I rearrange the code so much that writing many unit tests doesn't make sense to me. I just make sure I have a good suite of FTs. I also run the FTs with the app in the debugger. I put a breakpoint on the code I just refactored to make sure that the code is being executed!

One skill you must have if you're going to do this is the ability to look at two pieces of code and *accurately* determine if they are the same or not. You're doomed if you can't do this for short snippets of code. If you can't, you must have UTs to back you up. This also means that you can't refactor in languages that you're not well versed in. In short, you've got to be honest with yourself about your skill level.

I can do this for 4-10 lines of C++ max with a small (and I hope well-known) assumption of risk. After that it becomes too complicated to verify mentally the two snippets are identical. The smaller the snippets I refactor, the easier it is verify they're ok (without a UT that is).

Extracting a method however, is *nearly* risk free. If I'm worried about extracting a method in C++, I first put braces { } around the code I want to extract. I verify that all the variables and temps it uses are within the braces {}. If not, I move them in. I test and if it still works ok, I extract the method.

I still get bitten every so often of course but I've only changed a few lines so it's easy to track down where the problem is. If I can't find the problem, I don't try to fix it for very long. I back out to the last minor release I did and reapply my changes. The reasoning is: I've screwed up, I'm anxious, the worst thing for me to do is take on an even more complicated task. I swallow the lost time and go back to a known good version.

I found the *best* refactorings to do are:

extract method (easy to do, low risk, low - medium impact)
extract class (harder, med risk, high impact)

Extracting a class is the best of the best because it means I've captured a part of the design embedded in the legacy code. It also means that I now have a repository for grouping snippets of code together. Unfortunately, this happens infrequently until I've refactored the rest of the code to the point where it's intent becomes apparent to me.

I view the other refactorings as a way to get the code to a spot where I can extract a method or a class. I find renaming variables, methods and temps are great ways to help ease the burden of reading the code. The new names also serve to remind me of my guesses as to the code's intent as I continue refactoring. If I'm lucky, I "gather" up enough of the code's intent such that I can identify and extract common code into a new class.

Good luck,

-- John

I'm often working on legacy code; around 300k to 500k lines for single person projects, to millions of lines for multi-person projects. The most important things to do when you walk in the door are...

First, make several small changes of some value to the customer. This will demonstrate to the customer that you know what you're doing, and give you a chance to figure out some basics of how the system works and how to do a successful release in that environment.
- There's a good reason why your first class programming assignment was "hello world!". ;->
Priorities are very important: The customer will always have an excessive number of things they want you to change. Have them prioritize the requests by value (or do it yourself, if they refuse). Start with #1 and work down. Don't do #2 until #1 is finished.
Always improve the code with every change you make. Write comments, change to more rational variable or function names, refactor methods, introduce classes where needed, etc.

You can usually get away with spending up to 50% of development time improving the structure of the application, while you're adding customer-requested functionality. This investment will quickly pay off, by improving your productivity, so that within a few weeks, you'll have more new function points implemented than if you just did the "minimal hack I can get away with" approach that less enlightened maintenance programmers use. And from then on, things only get better.

When you break the app while fixing and improving structure, and it's going to happen from time to time, you can, with a straight face, say that the app broke because of the new functionality you were implementing at their request -- you don't have to mention the refactoring. Yes it's underhanded and political, but it's a survival strategy.

-- AnonymousCoward (Nov 16, 2000)

(ExtremeProgrammingInEnemyTerritory)