Why Did You Refactor That

It is very easy to refactor things when you have tests... easier still if you have a RefactoringBrowser. But, refactorings have no sense of good and evil. They rely on you for that. Many of them are inverses of each other. Applying them can either make your code better or poorer depending upon the context and your goal.

There only seem to a couple of good answers to the question "WhyDidYouRefactorThat?"

There are a couple of others: But those aren't primary. If you justify by the primary reasons, you have more opportunity to use the secondary ones.


"refactorings have no sense of good and evil."

Perhaps they have. How about this: "Either a refactoring reduces entropy, or it does not. Refactorings which increase entropy are evil. Refactorings which leave entropy constant are neutral (and therefore not cost effective)."

[This raises some warning flags in my mind. The NoFreeLunchTheorem states something to the effect that no one optimization technique is better than any other when compared over all possible cases. Since 'reducing entropy' sounds a lot like optimization, I suspect that to achieve some BigRefactorings it might be necessary to perform small entropy-increasing refactorings in order to get out of a local minimum. Don't miss the forest for the trees. -- RobHarwood]


Yes. MichaelFeathers mentioned something like this on ReFactor and RonJeffries countered with RefactoringAddingComplexity.

How about AutomatedRefactoring?


It's all well and good to say, "Refactorings that increase entropy are evil," but what does "entropy" mean? It sounds like there's some function that can be integrated over the whole program, producing a measure that can be computed before and after a refactoring; the refactoring should be preserved if and only if it decreases the measure.

If there is no such measure, then maybe stealing a term from thermodynamics is a bad idea in the ScienceItOrZenIt vein. -- TomKreitzberg


How about the ratio of pieces (classes/methods) to the size of pieces decreases? Not true of all good uses of refactoring, but it has a direction.


Code volume?


Hmm.. we may be onto something here. High code volume requires more energy to carry, so entropy.


I'd be pretty comfortable measuring the entropy of a system of code like this:

The ratio of the size (in bytes) of the all the source code to the size (in bytes) of a compressed archive of all the source code.

(Lower is better.)

This is a handy way of measuring how well you do OnceAndOnlyOnce. It does penalize you for having longer (and presumably easier-to-understand) variable and method names, but not much.

Of course, it's still only a measure, and only looks at the syntax of the system; see below for many people's comments on how useful measures are/aren't. --GeorgePaci


The use of the phrase entropy is imprecise. Besides the misappropriation of a scientific term, it's not even a close fit. The program should operate the same before and after the refactoring (after all, why do you have all those tests to ensure nothing breaks?). Therefore, entropy, or disorder, doesn't change. TheCompilerDoesntCare?.

The main reason you refactor things aren't the reasons above. The reason you refactor is because you want to add functionality and the design makes it difficult. Anyone who refactors just for refactoring's sake is wasting money, and in a perverse way, increasing real entropy.


RalfReissing? is spending his PhD thesis on design quality metrics and the effects of refactoring in Java.


Somewhere on wiki are several discussions of QualityOfaDesign?. The shortest answer is, there is no agreed upon measure of quality ofa design. Therefore there is no single-valued dimension to "improve" along. The quality ofa design is subject (not only to the viewer's preconceptions) to the circumstances in which it is located - its purpose and future life. And yet... there are some designs that experts consistently groan at. --AlistairCockburn


If we're talking about improving the quality of the design, then I think we should say that we want "to improve the quality of the design," not "to reduce the entropy of the design."

I can see the appeal of using the term "entropy" in a very informal sense, but when you put it into a principle like, "Refactoring is good if and only if the entropy of the design is reduced," then you are saying something that has no meaning. (For that matter, I have a hard time believing, "High code volume requires more energy," means something; can anyone explain it to me?)

But metrics themselves -- and I like metrics -- can only be suggestive. I can reduce average method size and complexity, improve or leave unchanged everything that goes into the metric suite, and still wind up with an "evil" refactoring, because the purpose of refactoring is to make the thing better -- where "better" depends, as has been said above, on the context and your goal. And the context and your goal cannot be captured in design or code metrics.

I'd like to challenge this notion. Could someone please display an example of a refactoring that improves or leaves unchanged metrics like the above, while actually making the system worse? Thanks! --RonJeffries

How about "extract superclass" before you decide that it is superfluous and "collapse hierarchy?" - mf

Yabbut, why's the system worse after you do that? Extra credit: answer without mentioning anything that we could metricize.

Lack of clarity. I can think up numerous goofy superclasses that don't add value. But that is just because I am a recovering PrematureGeneralization addict. -mf

Suppose I find a switch statement in the code. Do I ReplaceSwitchWithPolymorphism?? The metrics don't reveal whether I've cleverly made a class closed to modification or boneheadedly complicated something like "sex of patient." My idea is that you can apply the mechanics of a refactoring without considering the motivation, and the metrics -- which measure the mechanics, not the motivation -- can't necessarily tell you whether you did good. (They're probably better, though, at telling you when you did bad.) -- TomKreitzberg

I tend to agree. - MichaelFeathers

If you want to see how improving a single measure can lead to awful code, just optimize Average Method Size: apply ExtractMethod to every single line of code in a class. Your average method size is now 1 line (or close; you may need some supporting code) -- but it'd be hard to find anyone to agree it was better. (It reminds me of the StoryOfMel: "In modern parlance, every single instruction was followed by a GO TO!")

I suspect if you combine a bunch of measures, you can define a zone which includes most code considered "good" and excludes most which is considered "bad". On the other hand, what's the computational power of even a bunch of measures? And how hard is (whatever we can make objective about) figuring out whether code is good or bad? My guess is measures are useful rules of thumb, but fail for unusual cases. --GeorgePaci


The standard cycle is:

The next time you go through the cycle, the first step is: Since nothing changes about the code itself between cycles, some of what the end-of-cycle refactoring does must be opposed to what the following beginning-of-cycle refactoring does -- otherwise why have the latter?

bug fixes?

So you can't just measure how good a refactoring is -- you have to measure it against its goal. ("Fill in the implicit parameter" is apparently becoming my motto on Wiki; cf. UnitTestsTellYouWhenYoureDone, MeaningDependsOnContext.) --GeorgePaci


EditText of this page (last edited July 1, 2003) or FindPage with title or text search