I've run across this word a bunch of times and I sort of have a feel for what it means (vaguely) but ...
Anybody care to define WhatIsRefactoring?
To refactor is to change the factors that produce a desired outcome. Source code is composed of factors, and one can change those factors while preserving the outcome. Factors also come up in math where, for example, you can factor 6432 into 32 * 3 * 67. You can refactor that into 8 * 12 * 67 and you will have the same product but different factors. The rationale for wanting different factors varies.
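The arithmetic in that analogy is easy to check. A tiny sketch in Python (the variable names are only for illustration):

```python
from math import prod

# Two different factorings of the same number, as in the paragraph above.
original = [32, 3, 67]
refactored = [8, 12, 67]

# The factors differ, but the product (the "behavior") is unchanged.
assert original != refactored
assert prod(original) == prod(refactored) == 6432
```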
In general, refactoring is modifying the code base to ensure that each thing is done OnceAndOnlyOnce. If something is done multiple times, each of those times should be replaced by a call to the one routine that does the thing well and doesn't do anything else. If closely related things are done at places scattered throughout the code base, they should be pulled together into a single routine.
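A minimal sketch of what OnceAndOnlyOnce might look like in practice, in Python. The functions and the price-formatting task are invented for illustration, and the sketch assumes non-negative amounts in whole cents:

```python
# Before: the same "format a price" logic is written out in two places.
def invoice_line_before(name, cents):
    return f"{name}: ${cents // 100}.{cents % 100:02d}"

def receipt_total_before(cents):
    return f"TOTAL: ${cents // 100}.{cents % 100:02d}"

# After: the formatting is done once, and each caller uses that one routine.
def format_price(cents):
    """Format a non-negative number of cents as dollars, e.g. 1250 -> '$12.50'."""
    return f"${cents // 100}.{cents % 100:02d}"

def invoice_line(name, cents):
    return f"{name}: {format_price(cents)}"

def receipt_total(cents):
    return f"TOTAL: {format_price(cents)}"
```

The callers return the same strings as before; only the place where the formatting rule lives has changed.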
I'll take a stab at it, but RalphJohnson could probably provide a canonical definition. Refactoring is the process of taking an object design and rearranging it in various ways to make the design more flexible and/or reusable. There are several reasons you might want to do this, efficiency and maintainability being probably the most important.
A common method of refactoring is to refactor along inheritance lines. For instance, let's suppose that in a design review you find out that two classes in your system that do not share a common superclass both implement very similar or identical behavior. It would be advantageous to then ReFactor these two classes by moving the common behavior into a shared superclass, and then changing the classes so that they descend from that class. You can also ReFactor by simply moving methods from concrete subclasses up the hierarchy to more abstract superclasses as you see the need.
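Refactoring along inheritance lines could look roughly like this in Python. The classes (Employee, Customer, Party) and the contact_card method are hypothetical, chosen only to show the shape of the change:

```python
# Before: the two classes share no superclass, yet both implement
# the same contact_card() behavior.
class EmployeeBefore:
    def __init__(self, name, email, salary):
        self.name, self.email, self.salary = name, email, salary

    def contact_card(self):
        return f"{self.name} <{self.email}>"

class CustomerBefore:
    def __init__(self, name, email, account_id):
        self.name, self.email, self.account_id = name, email, account_id

    def contact_card(self):
        return f"{self.name} <{self.email}>"

# After: the common behavior moves into a shared superclass,
# and both classes descend from it.
class Party:
    def __init__(self, name, email):
        self.name = name
        self.email = email

    def contact_card(self):
        return f"{self.name} <{self.email}>"

class Employee(Party):
    def __init__(self, name, email, salary):
        super().__init__(name, email)
        self.salary = salary

class Customer(Party):
    def __init__(self, name, email, account_id):
        super().__init__(name, email)
        self.account_id = account_id
```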
You can also ReFactor along composition lines. If you find that a class is implementing two different sets of responsibilities that do not interact with each other much, or that use two subsets of the attributes of the original class, you may want to refactor that class into two different classes, one of which perhaps contains the other.
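One way to picture refactoring along composition lines, again as a Python sketch with made-up names (Person, Address): the original class mixes two responsibilities that touch disjoint attributes, so one of them is split out and contained.

```python
# Before: one class carries two sets of responsibilities (identity and
# postal address) that use different subsets of its attributes.
class PersonBefore:
    def __init__(self, name, street, city):
        self.name, self.street, self.city = name, street, city

    def greeting(self):
        return f"Hello, {self.name}"

    def mailing_label(self):
        return f"{self.street}, {self.city}"

# After: the address responsibility becomes its own class, and
# Person contains an Address.
class Address:
    def __init__(self, street, city):
        self.street = street
        self.city = city

    def mailing_label(self):
        return f"{self.street}, {self.city}"

class Person:
    def __init__(self, name, address):
        self.name = name
        self.address = address

    def greeting(self):
        return f"Hello, {self.name}"

    def mailing_label(self):
        # Delegate to the contained object so existing callers are unaffected.
        return self.address.mailing_label()
```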
DonRoberts and JohnBrant have been working on a tool for Smalltalk called the Refactory [http://st-www.cs.uiuc.edu/users/brant/Refactory/RefactoringBrowser.html] that makes this kind of thing easy. (Although it's already much easier to do in Smalltalk than in C++).
Suppose you want to convert a class that uses case statements into one that uses the StatePattern. Or suppose you want to convert a class that hard codes the classes of its products into one that uses factory methods, and then you decide to pull those factory methods out into an abstract factory. These are all examples of refactoring a program.
A refactoring is a change to a program that only affects the way it is organized, not its behavior. Some people call them restructurings or "behavior preserving transformations". They are especially common when you are creating a framework, because a lot of reusability bugs can be fixed by refactoring your framework. -- RalphJohnson
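As a rough illustration of the first example, here is a conditional-driven class converted to the StatePattern, sketched in Python. The traffic-light classes are invented for this sketch and are not taken from any of the tools or books mentioned here:

```python
# Before: behavior is chosen by a chain of conditionals on a state code.
class LightBefore:
    def __init__(self):
        self.state = "red"

    def advance(self):
        if self.state == "red":
            self.state = "green"
        elif self.state == "green":
            self.state = "yellow"
        else:
            self.state = "red"

# After: each state is an object that knows its own successor.
class Red:
    name = "red"
    def next(self):
        return Green()

class Green:
    name = "green"
    def next(self):
        return Yellow()

class Yellow:
    name = "yellow"
    def next(self):
        return Red()

class Light:
    def __init__(self):
        self._state = Red()

    @property
    def state(self):
        return self._state.name

    def advance(self):
        self._state = self._state.next()
```

Both versions walk through the same sequence of states; only the organization differs, which is what makes it a refactoring in the sense above.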
See Also: RefactoringAndRewriting (which contrasts the two).
I used to find "refactoring" a weird and awkward phrase, because to me it was a word borrowed and changed from algebra, and I never felt I was doing algebra when I was programming. In algebra, you "factor" (x^2 - y^2) into (x+y)*(x-y). You could think of "refactoring" as any permitted rearrangement in any direction. So you could refactor (x+y)*(x-y) back into the longer version. You could turn 5*a + b*a into (5+b) * a, or vice versa (where "*" indicates multiplication).
Sometime this year, it stopped being weird to me, and started to be natural. I started seeing 5*a + b*a as 3 operations on 6 things. (5+b)*a is 2 operations on 4 things. The 5+b thing is different from the 5*a things, etc. You can see the jump to OO programming.
Let's take the case of
A.method1() = ... b.doThis(); b.doThat(); ...
I change the code to
B.doThisThat() = doThis(); doThat().
A.method1() = ... b.doThisThat(); ...
That change corresponds (in my mind, anyway) exactly to the (5+b)*a refactoring. Nowadays, I see a method and a class as a set of parentheses, and when I move code out of one method or class to another, I visualize it just as moving symbols from one set of parentheses to another. Of course, the net effect of the computation has to be the same, or as Ralph said, it has to be a behavior preserving transformation.
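The A/B example above, written out as a runnable Python sketch. The log list is an assumption added here so the effect of the calls can be observed; the method names follow the original, adapted to Python style:

```python
class B:
    def __init__(self):
        self.log = []  # records calls so the behavior can be compared

    def do_this(self):
        self.log.append("this")

    def do_that(self):
        self.log.append("that")

    # New method: the pair of calls moves inside B's "parentheses".
    def do_this_that(self):
        self.do_this()
        self.do_that()

class ABefore:
    def __init__(self, b):
        self.b = b

    def method1(self):
        # Two operations issued from A.
        self.b.do_this()
        self.b.do_that()

class A:
    def __init__(self, b):
        self.b = b

    def method1(self):
        # One operation issued from A; B does the rest.
        self.b.do_this_that()
```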
The definition above does an excellent job of explaining WHAT it means to refactor, but it doesn't explain WHY. If it ain't broke, don't fix it. And if a refactoring truly preserves the semantics, it ain't gonna fix nothin no how anyway.
You want to refactor when the code is broken, but it isn't broken on the particular axis called "what results the program computes". Here are some of the defects I fix by refactoring:
Respectfully beg to differ. Alistair gives a why. Remember those pictures you draw with the multiple lines from one class to another, being replaced by fewer? That's the same as Alistair's 5*a + b*a reducing to (5+b)*a. In algebra, you get points for reducing the expression to what the professor thinks is most elegant/simple/whatever. In refactoring, you have to match WardAndKent. -- RonJeffries
Beg to differ on your differing. Respectfully, of course. Alistair doesn't give a "why" that a programmer under fire can remember. Refactoring is not just aimlessly moving code around between the infinite permutations of a program that behave identically. It is there to add qualities to the code that weren't there before. Reduced duplication, greater simplicity, easier reading, better performance - these are some of the qualities you can add through refactoring.
Of course, sometimes you take out one of these qualities to set up your next semantics changing transformation, and that's refactoring, too. -- KentBeck
As long as we are all begging and being respectful and differing, I'll begfully respect to differ. I agree with Kent that I didn't give a 'why' - I only tried to give a 'what'. Unlike Kent, I believe the description "aimlessly moving code around between the infinite permutations of a program that behave identically" would have to be considered valid refactoring. Now we can get on to, Why Are We Doing This? I would ask back, why do I care to unify duplicate code? That is a What begging for a Why if ever I saw one. What a waste of time. There had better be something nagging me to cause me to go in and touch already working code. Find what is nagging, and you'll find some why's. "Things that are different in the code after you refactor a particular thing in a particular way" will not count as Why. You need to have punishment, reward, emotion in the answer. Ron's BrowniePointMetric works as a sample - "the professor gives a better grade if I aimlessly move this code around but end up with what s/he considers more glamorous." A more likely industrial version is, "I'll do less work total when adding this function if I first refactor this code to remove duplicate code." or "it works, but isn't fast enough, so I'll refactor this algorithm to get it faster." But now we are out of the opening question, "What does this thing ReFactor mean?" and into "Why would I care to RefactorMercilessly?" -- AlistairCockburn
You've definitely given me a different respect for begging.
We do all of these things... refactoring, do-the-simplest-thing, etc... so that the code can fit in our heads. Everything we do to improve the code ultimately boils down to that. -- WayneConrad
I think of refactoring as an effort to get the code towards ExtremeNormalForm. Of course, ExtremeNormalForm can be a little vague to get your head around, so we have intermediate principles of good code, such as: OnceAndOnlyOnce, LawOfDemeter, SelfDocumentingCode, and maybe ShieldPattern, too.
Of the "Fixed defects" list, the only one which surprised me was "Replace one algorithm with another". I see refactoring as moving code around rather than writing new code, even if the new code has the same effect as the old code. -- DaveHarris
MartinFowler has a very good intro to refactoring paper on his website [http://www2.awl.com/cseng/titles/0-201-89542-0/refactor/index.html]. -- MichaelFeathers
(Not any more, the book's published and he removed the content)
I've recently updated it to include some work in progress from my book. I've listed a good number of refactorings with motivation and step-by-step mechanics. I'd appreciate comments. You can reach it from my homepage. -- MartinFowler
The place to go now for Martin's info on refactoring is (surprise) http://www.refactoring.com/
One thing that I've been wondering. It seems that many refactorings I've heard of involve taking one big thing and splitting it into several smaller things, or taking methods and moving them from one class to another. It seems that the tendency is to make things simpler, where simple means that the number of classes may increase, but the number of methods on each class decreases. I know that there are refactorings where the opposite is true, but they seem less common. Doesn't it seem that there is a "direction" in many refactorings? Like entropy?
Here's my way of thinking about it. Without refactoring, software gains entropy over time. Ongoing refactoring reduces the added entropy, so the code remains clear, elegant, and maintainable as it evolves.
I have this weird vision of OO systems having energy states. Imagine a grid where each (x,y) point is a class and the z-value is the number of methods on the class, or the number of statements per method. Would we expect well-factored systems to have less average height and a smoother terrain? I have this feeling that under some conception there are refactorings that make a graph "hug the ground." --MichaelFeathers
I like this analogy, though like you I'm not sure I believe it. At least not always.
Very interesting energy theory. Think of it as compression: you have this code A here and this almost identical code B there, stored at separate places in a tree structure. Since they share a lot of common things, they can simply be moved around in the tree structure (your main program is just links to code A and B, and these links change accordingly) and be replaced with a branch C which has small leaves called a and b (your main program would now have links to C, leaf a and leaf b). We have now lowered the energy needed to transmit the whole tree over some kind of link, since it occupies less space in total (we do not need to transmit as many bytes). Still, the really valuable energy in the tree is preserved. Repeat this procedure until nothing more can be compressed. You end up with everything being random noise, which is pure energy and also the smallest (simplest?) possible representation of your program. -- AndersHedberg
To provide the discussion with a 'why', one reason to refactor is time. I was once on a project where we had about twenty components. Each of them had lots of duplicate code. We had an urge to reduce it, to save the time it took to make a change and then tediously duplicate and test that change across all twenty components.
We refactored it to the best of our ability, but some of it was un-refactorable - in the context of consolidating duplicate code into a single maintainable base - for purely technical reasons. Every developer who came on board seemed to have the urge to refactor it more. At one point we just created a white paper that described why we made the refactoring decisions we made, and handed it to them when they got that 'look'.
In this case, our 'why' had partly to do with elegance, and mostly to do with saving precious time. -- PhilipEskelin
Here's an (amusing) example of why it's important to refactor when changing or extending class hierarchies to ensure that responsibilities are implemented OnceAndOnlyOnce in the appropriate part of the class hierarchy:
I think that one of the better justifications for refactoring comes from Complexity Science. By refactoring we effectively control chaos, which otherwise sooner or later eats up the system. Refactoring, just like Design 'Patterns', introduces symmetries and brevity - humans' main tools for dealing with complexity. -- Alex Iskold
To complement - not contradict - the above discussion, we also refactor to retain the intelligibility of our designs, so that we can continue to develop them as required. A design which is no longer intelligible is no longer viable - the only options are refactoring, redesign or a freeze on all future development, as appropriate. -- SteveMerrick
An explanation that has stuck with me since my first conversation with a good friend about ExtremeProgramming was this: When you find that you have code that you need to change, and it is hard to do it, there is a large 'Vector of Change'. By refactoring (and using UnitTests to make sure that the refactored code is equivalent), you can reduce the vector of change to the point where implementing the change is trivial.
This is similar to the above energy state metaphor, but introduces something that the energy metaphor doesn't make clear (though it could be inferred from it): when you refactor code to facilitate implementing a change, you make it easier to make a similar change in the future.
What follows from this is an assumption that when you are maintaining the code in the future, changes that are requested/needed will be easy to make, because similar changes have been made in the past, and the code was suitably refactored. -- StephenThorne
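A small sketch of the "refactor, then use UnitTests to confirm equivalence" step, in Python with the standard unittest module. The label functions and the _clean helper are hypothetical; the point is only that the old and new versions are pinned to the same results before the real change is made:

```python
import unittest

def label_before(first, last):
    # Original: the whitespace cleanup is repeated inline for each field.
    return last.strip().upper() + ", " + first.strip().title()

def _clean(text):
    # Extracted helper: the one place where the cleanup rule now lives.
    return text.strip()

def label_after(first, last):
    # Refactored: same result, with the cleanup done OnceAndOnlyOnce.
    return _clean(last).upper() + ", " + _clean(first).title()

class EquivalenceTest(unittest.TestCase):
    def test_same_results(self):
        cases = [("ada", "lovelace"), (" ada ", " lovelace "), ("", "")]
        for first, last in cases:
            self.assertEqual(label_before(first, last),
                             label_after(first, last))
```

With the test passing, a later change to the cleanup rule (say, collapsing inner whitespace too) touches only _clean, which is the smaller 'Vector of Change' described above.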
If a program is the product of its factors (components), and you want to change that product, it's easier to understand and change one of the factors than to change everything at once. When that's not feasible, refactoring may make it feasible. -- Marty White