Refactoring Mercilessly Hides The Forest

My programming partner and I have a running argument about how effective RefactorMercilessly is. I tend to be far more merciless than she is (probably to the point of going overboard) and she tends to be more conservative.

Recently, she brought up a good point: Breaking the system down into a lot of very simplistic pieces (as RefactorMercilessly will do) adds far too many trees and makes the system harder for someone new to understand what's going on. I can see her point. Refactorings that made perfect sense one day grow stale in my mind after a few weeks of not looking at the code and I'll often have to think a bit to remember exactly why we chose a particular path.

Note, I think this is different from the FearOfAddingClasses AntiPattern because the motivation is different.

My current hypothesis is that WellFactoredProgramsCannotBeUnderstoodStatically and thus, for system of any real size, the AcceptanceTests provide the best way for a new user to figure out how the pieces fit together. This still doesn't address why, but I'm not sure that any amount of code sans comments or design docs will address why a particular refactoring/design decision is made.

-- MarkAddleman

Example, please, of a big method doing two things that is easier to understand than a method doing two sends to two other methods. Or another more apt example.

A good example would be small finite state machines. Sometimes they are easiest to follow when they are coded in one page using gotos. Converting the clauses of such to functions, or StateObjects?, is a step one must feel pressure to take.

To the first approximation the ideal refactoring is one that minimizes number while avoiding duplication. E.g., extracting a method so that it can be used elsewhere.

A common extension is to refactor for increased understanding. E.g., extracting a method so that its carefully chosen name substitutes for a comment. Since this increases number with no reduction in duplication (yet), it runs counter to the first principle.

Although duplication is itself a concept that eludes sharp definition, understandability is totally subjective and depends largely on the reader's context and past experience. That is, effective refactoring requires judgement about the context and experience of future readers. This may sound hopeless but it isn't. The best approach to acquiring the skill is to program together regularly with just those readers, or people like them, and approach every discussion of what is or isn't readable with an open mind. Good luck. You're doing great so far. -- WardCunningham

I've been programming for 10 years and only recently have I begun to understand what constitutes DuplicatedCode. I know that sounds silly, but it really isn't all that obvious particularly in Java. Maybe not surprisingly, certain forms of DuplicatedCode were easier to see in CeePlusPlus. It wasn't until I took a month to dabble lightly in the SmalltalkLanguage that I started to understand. And once I recognized that I did not understand DuplicatedCode, I have begun to understand the fullness of OnceAndOnlyOnce. -- MarkAddleman

You can refactor in both directions, too. If you realize that your decomposition is too granular, you can recompose the code until it makes sense again. -- SunirShah

Yes, and the word "mercilessly" could use some elaboration. It doesn't mean "until each method has but one statement in it", but instead is directed at the resistance that some feel about taking the time now to do the right thing, for fear of breakage. Damn the torpedoes ... that's being merciless. -- WaldenMathews

The analogy of trees and forest is very applicable to the subject of refactoring. Let's say that you see a lot of code that says "tree tree tree tree". Wouldn't it be better to say "tree tree tree tree" OnceAndOnlyOnce in a "forest" class or method, and then only have to say "forest" everywhere else? On the other hand, maybe you run across a method called "forest" with only one or two trees. The MercilessRefactorer? might say "This is stupid, it's just a couple of trees!" and then recompose, as Sunir puts it. Some one-line methods would be more understandable if they were inlined to be closer to their context.

Refactoring can also be a time to discover common concepts. Some system metaphors are developed before the code, and others emerge over time.

Similarly, refactoring is an appropriate time to develop infrastructure. Let's say several classes have implemented their own methods for logging. Then somebody decides to refactor to increase simplicity and reduce duplication. Pretty soon you've got a consistent logging facility for the system that does only what is required. A logging facility that is developed too soon in the product lifecycle will be likely to violate YouArentGonnaNeedIt and DoTheSimplestThingThatCouldPossiblyWork. -- DavidNoble

It's probably helpful to remember that refactoring doesn't have any particular direction in and of itself - witness complementary refactoring pairs such as ReplaceTempWithQuery and ReplaceQueryWithTemp. So distinguishing between refactoring-to-make-it-easy-to-add and refactoring-to-consolidate-after-adding should help you orient yourself, at least for small-scale refactorings.

Sometimes my pair finds itself smashing a given class's methods into itty-bitty bits (usually following the call of OnceAndOnlyOnce). And a couple times, keeping track of all those little methods in one class has become confusing. The key to becoming unconfused in each case was to stop trying think about what was going on at one (low) level of abstraction, and start thinking about things at two levels of abstraction.

The next step for us was to take the lower-level methods (and their fields) and pull them out into their own class. We may never reuse that class, but it's nice to get the low-level stuff out of the way when we work with the original (now much simpler) class.

I know this isn't much in the way of instances, and your mileage will probably vary, but it's been helpful to us. The main lesson, as I see it, is: If you have too many little pieces in one container, one solution is to split the container. The secondary lesson is: breaking big pieces up makes it possible to recombine the little pieces into different, better big pieces.

Finally, at the risk of being obvious, I disagree with the page title, and suggest that truly merciless refactoring includes recombining, not just breaking up (synthesis, not just analysis). Refactor your way to a better forest. -- GeorgePaci

Actually, I like that idea a lot. One of the refactorings that I'd like to use more is MethodObject, but without support from a tool, I often get lazy and think it's just not worth it. I understand that jFactor is going to support it in the near future.

makes the system harder for someone new to understand what's going on

Whether many tiny methods are easier to understand may be a matter of taste.

I believe many tiny methods are hard to understand. But I refactor my code into many tiny methods anyway because I believe it makes my code easier to change. I value easy to change code more than I value easy to understand code.

See RavioliCode -- StanSilver

I'd vote for StanSilver on the point that it's a matter of taste. According to Gregorc's mind style model, there are four different kinds by the combination of four parameters: Concrete, Abstract, Sequential, Random. Hence, there are CS, CR, AS, and AR. Some people like concrete examples first to get the hang of something, whereas some like abstract principles or theories first. -- JuneKim

That's kind of a big hammer to pull out to take care of this little fly. I'm willing to agree with Stan that some things are a matter of taste; more formally, some things may be such that A is not better than B for everyone, nor is B better than A for everyone. But in this case, I think there's still a partial ordering - there's a solution that's clearly better than having dozens of tiny methods in a class and better than having big chunky methods in the class that repeat parts of each other.

And that's to create those tiny little methods, then sort them by level of abstraction, and move some into a new class. This isn't just because I find lots of tiny methods hard to keep track of and trace through; it's because I don't think anyone finds it harder to keep track of just the higher-level methods.

To hijack the milk analogy, I'm perfectly willing to say that everyone prefers drinking milk (new class) to drinking milk laced with sulfuric acid (lots of tiny methods in one class). Note that this statement is not a matter of taste: either I'm right in this, or wrong in this. -- GeorgePaci

I think you can write code that just about everyone would agree has too many methods for good understanding. Likewise, you can write code that just about everyone would agree has too few methods for good understanding. But is it possible to write code that everyone agrees has just the right number of methods???.

See BinarySearchCodeOnly, GoldilocksSolution. -- StanSilver

No. Some people can't understand the code no matter what you do, so there will never be a right number for them. Others can understand the code no matter you do, so they don't care. Most people fall in between this range, so the "right" mix is subjective.

Yes. Some domains are ridiculously easy to map onto objects. For instance, not many people would disagree with a Rectangle.

Maybe. I disagree with Rectangles all the time. I constantly write my own Rectangle class to DoTheRightThing, when the given implementations (Microsoft/GDI/DirectX, Sun/Java) don't. But once I write the rectangle to fit our requirements, others don't disagree with how it's implemented because Rectangles are simple.

-- the ever decisive, SunirShah

Example, please, of a big method doing two things that is easier to understand than a method doing two sends to two other methods. Or another more apt example.

I have to agree with this one. FSM in objects CAN BE hard to grok in fullness, because the states and transitions can be spread over many classes, if you go the whole way to a bunch of state subclasses each implementing a "do something" and "transition" method. When the case statement gets too big (no language I use has goto any more), then I tend to go to a transition table, represented in data, something like:

State Code New State ----- ----- ----- Idle Cash takeCash Idle Credit takeCredit takeCash Exact thankYou takeCash Over giveChange thankYou Timeout Idle

This, for me, increases clarity and reduces code. The full-boat FSM thing rarely comes up for me when I move from the case to this. -- RonJeffries

Does the following apply?

If OaooBalancesYagni (Using the once and only once rule makes things easier to change in the future, so you do not have to plan for the changes today)

Then "make little tiny methods more understandable" patterns balance oaoo. (Using "clean up" patterns makes the zillions of little tiny methods that you get from using oaoo more understandable, so you can use oaoo.) -- StanSilver

I've been leading an XP team in CommonLisp for about 9 months. RefactorMercilessly is one of the aspects of ExtremeProgramming which was haziest in my mind when we started. Now I think I'm getting the hang of it. In practice, for us, it's meant noticing certain usage patterns, and encapsulating those in macros or around methods on some base classes. CommonLisp gives you a lot of introspective power to determine what class the object is, who its ancestors and dependants are, so it's pretty amazing the sorts of things we've been able to abstract away. But the key is to DoTheSimplestThingThatCouldPossiblyWork first, and wait to see if some solution starts to repeat itself.

Example: When we started, we had a lot of code like:

(let ((customer (find-persistent-object oid 'customer)))
   (unless customer
 (error 'UnknownObject?))
        ...)

After a while, with many of these, our code started to look like

(with-known-customer (customer oid)
...)

But then we had

(with-known-content-space (space oid)
...)
...

So we now have

(with-known-object (customer customer oid)
...)

-- AlainPicard

A non-LISP way to do something similar:

  cus = getCustomer(custID)  
  [process customer....]

Same as:

  cus = getCustomer(custID,stopOnError=true)
  [process customer....]

Manual error handling option:

  cus = getCustomer(custID,stopOnError=false)  // note 'false'
  if hasError(cus) {   // or cus['~hasError'] or cus.hasError()
     complain(....)
  } else {
     [process customer....]
  }

Of course, often the joins will vary, so perhaps have a more generic version that takes any sql clause:

  cus = getRow("select * from customers where...")

Same as:

  cus = getRow("select * from customers where...",stopOnError=true)

Sometimes you want the routine/method to handle the error on its own (for example, it "should" be there based on a prior operation), and sometimes you want custom error handling.

getRow returns one result set row. Perhaps have getResultSet for zero to many rows.

Discussion of too many versus too few methods moved to WhenToStopRefactoring. -- MarkAddleman

See GoldilocksSolution

CategoryRefactoring