Extreme Normal Form Defined

See ExtremeNormalFormDefinitions

What is the value of this word "Normal" in the name? I propose that we refactor the name, getting rid of the, IMHO unnecessary, "Normal" word. --JoshuaKerievsky

I put "normal" in there for a couple of reasons. When OnceAndOnlyOnce has been applied in a code base, it can be said to be normalized. The redundancy has been removed. The other reason was to indicate that your code should hover around ExtremeNormalForm when you are doing XP.. developers should consider it to be the natural state of their code; deviations during development should be considered transitional. -- MichaelFeathers

Yeah, "normal" fits well because of the redundancy angle. The analogy with relational databases also brings YAGNI to my mind: I've had to wrestle too many times with databases that were "denormalized" for performance in the original design!

<<See YouArentGonnaNeedIt for more on YAGNI>>

There's also the more general mathematical notion of normalization. That brings to my mind the two steps of DoTheSimplestThingThatCouldPossiblyWork: you make it work, then you refactor to renormalize the code. --KielHodges

I really liked Michael's earlier comment about "code that is in alignment with the customers' needs." The word 'alignment' definitely suggested to the hint of geometry that I would expect from something that is a "form." I also like how it reminds me of OrganizationalAgility?. In this case, the code is what is agile and responsive (almost reactive), as well as being malleable and minimal (a "lean, mean, refactored machine" ;-) --BradAppleton

Post a definition of XNFD that allows a scientific metric to say how close code is to XNF. Now measure some code. - RonJeffries DefineXnfd?

I am interested in this, even thinking that it can never be carried out. But getting even faintly on the road there will prove useful. To proceed, the XP / C3 people should be so kind as to state, when they look at a piece of code, what they notice does not look quite right, and what they replace it with (perhaps you can just grab a couple of phrases from your next refactoring, this afternoon, change the names to protect the innocent, and post them). Such a list of replacement heuristics declare a kind of cost metric to code phrases, and would be wonderful for people in the industry to stare at and argue over. --AlistairCockburn

Examples are:

If you see "objA msg1. objA msg2.", then replace that with "objA msg1-2." and create a method in objA called "msg1-2" which contains the code "self msg1. self msg2."
If you see a conditional of the form ... (I know you told me once, but I can't recall...)

Alistair - MartinFowler's been spending oodles of time documenting just what you describe in a book on Refactoring - due out sometime in '99. He has a catalog of refactorings, naming these small steps folks take to clean up code and showing the before an after code. Here's an example that will be familiar to many (a similar version is in Kent's book, as well as CODE COMPLETE): http://www.industriallogic.com/papers/extractboolean.html --JoshuaKerievsky

Thanks for the refs. I still think anyone who wants to back XNF needs to fill out the list for XNF. Will MartinFowler's book be the definition of XNF? I tend to think not. What I am interested in is, what do the XP people think XNF contains?

Even with XNF, there should still be lots of different factorings that meet XNF. I checked the Extract Boolean ref you put up there, and I would assert that, of Inline and Extracted, there are some rules of XNF that say that when Inline is the XNF, and when Extracted is. I don't know what they are (and would like someone to at least take a stab at it - I think that will make for interesting and educational discussion)

Once Martin's book is out, we can make a list: #33 is XNF, #34 is not required, etc. --AlistairCockburn

RonJeffries wrote:

XNF describes the state of the software, not the process that got it there.

Martin's forthcoming book will help folks learn part of the process for getting to XNF (though he doesn't explicitly speak of XNF). He also compares many refactorings as in "I would typically use X here, but if such and such were the case I might go with Y and refactor using X later." I agree that such a discussion here would be interesting (see SampleRefactoringQuestions?). --JoshuaKerievsky

The thing about refactoring is that it is pretty agnostic. Many refactorings are natural inverses of each other.. PullUpMethod? <-> PushDownMethod?. It seems you use them in the service of other goals. For instance, if you want to keep all your classes small, you may end up using both PUM and PDM to get there. The refactorings are the tools.

I suppose that ENF could be defined quasi-mathematically: "the mean of the number of methods on classes shall be between 4 and 9, the standard deviation shall not exceed X.." Do you all really want to do that? It can be done, but there are two problems. (1) quantifying it could make it a false goal; people could freak if they are not at measured ENF even if there are valid reasons for a bit of displacement. The real issue is how well the code base adapts to change and how happy you can keep your customers. (2) Some of it isn't measurable. One item: is all the code as simple as it could be? You can only tell that by asking everyone whether they've seen anything that can be made simpler.

-- MichaelFeathers

The CodeSmells tell you that your code isn't in XNF. Refactor until there are no more smells. Then you are in XNF.

I experimented with this with Profile/V (I did one-person experiments of everything in XP), a performance profiler I sold for Smalltalk/V. We had worked on it for a couple of years without much clean up. As part of my personal conversion to XP, I decided to clean it up. I refactored for a solid week. At the end of that time, I couldn't think of anything else I wanted to do to my code. That's XNF.

This implies that XNF is subjective. I go lots of places where they can't imagine what else to do with their code and I tell them 100 things. However, I suppose they don't know about the CodeSmells. --KentBeck

I hold that XNF is (today) subjective, although I believe that there is some kind of mathematical function one is trying to minimize. We need a code metrics person to think about this.

My hope in challenging others to define this is (a) to help folks see that they understand more than they realize, and (b) to leverage those ideas to better overall understanding. --RonJeffries

I suspect some of the non-XP-philes understand and realize much more than the above gives them credit for (which might make them less than eager to accept your challenge).

Something really doesn't sound right here. I like the Aikido analogy fine ... but I wonder whether it's the process, rather than the code, that achieves a normal form. Look at that poised Aikido master, ready and in perfect balance. He's a process, isn't he? Carve a statue of him and it might exemplify the stance, but it will tip ...

attempted exploration follows ...

will a statue of a poised Aikido master really tip?

the years of practice may bring an Aikido student to mastery, but the mastery is not the years. just the same, does not the process bring the code to XNF, but the XNF is not the process?

What could this mean to a YAGNIfied, OAOOish, RegressionTestful? piece of code? Well, if my requirements change, how do I know I need this code? Or if I get two different chunks of code, how do I know it's still OAOO? And if I get someone to modify it, how do I know their tests are adequate? Without an XNF process then XNF code is worth no more or less than any other code. It will tip ...

consider this, grasshopper: we are to be handed one of two pieces of code to enhance. one is XNF, one is not. which one do we want to be handed if we need to complete the enhancement in shortest time? if we say "the XNF" we accept that XNF is a state, not a process. if we say "the other", people will bet against us and win.

if our requirements change, don't our tests change? don't our tests tell us what we need?

isn't OAOO pretty factual? isn't there either duplication or not?

Conversely, I may be called on to work on pre-existing code. Say I'm supposed to use a bunch of Microsoft components. Well, there's no YAGNI in MicrosoftCorporation. There's no OAOO. Regression tests? Well, maybe MS have got some, but they don't share 'em. Still the components look well factored and solid; I can trust them to do the work I need from them. What does it matter that they're not YAGNI/OAOO/Testful? Shouldn't I just get my process right and let MS mind their own knitting?

is the hypothesized XNF not a measure of the maintainability of the code? if defined, will the definition not tell us whether MS code is XNF or not? if it is not, does there not exist a transformation to make it XNF?

If all that's true, perhaps there really is no such thing as ExtremeNormalForm? --PeterMerel

Well, my take on all this is that the essence of ExtremeNormalForm is the requirement for OnceAndOnlyOnce. However, OnceAndOnlyOnce is not an absolute state. It can have a numerical measurability ; the granularity of the OnceAndOnlyOnce. I suggest that certain levels of granularity are more suitable for maintenance than others.

As an example consider the following "code" fragment:

     define printHTMLHeader (title)
     {
      print "<html>"
      print "<head>"
      print "<title>"
      print title
      print "</title>"
      print "</head>"
     }

This seems fairly OAOO, but is it completely OAOO, and if not, by how much?

By simple text analysis I can see several duplicates. The "print" command appears 6 times; so refactor:

     define printHTMLHeader (title)
     {
      print "<html>" "<head>" "<title>" title "</title>" "</head>"
     }

The < and > characters appear 5 times each; so refactor:

     define tag (name)
     {
      return "<" name ">"
     }

     define printHTMLHeader (title)
     {
      print tag("html") tag("head") tag("title") title tag("/title") tag("/head")
     }

The "/" character appears twice; so refactor:

     define tag (name)
     {
      return "<" name ">"
     }

     define endtag (name)
     {
      return tag("/" name)
     }

     define printHTMLHeader (title)
     {
      print tag("html") tag("head") tag("title") title endtag("title") endtag("head")
     }

The words "head" and "title" appear twice each; so refactor:

     define tag (name)
     {
      return "<" name ">"
     }

     define endtag (name)
     {
      return tag("/" name)
     }

     define tagpair(name, body)
     {
      return tag(name) body endtag(name)
     }

     define printHTMLHeader (title)
     {
      print tag("html") tagpair("head", tagpair("title", title))
     }

I think that's as far as I can go in this "little language", but the language constructs ("define", "{", "}", "return" etc.) still appear more than once. The granularity of which I spoke earlier manifests in the size of the methods and the depth of the call stack. At the start, the granularity was that of "printHTMLHeader", six statements and one level deep. We can assume that the rest of the system never prints HTML header info without going through this function, so that is OAOO from the perspective of the rest of the system. By the end of the process the method is one statement, but five levels deep.

To my mind this tradeoff (method size and method depth) give a possible measure of OnceAndOnlyOnce. The extra dimension which leads to XNF is the requirement to minimise both of the variables. A system with a measured lower (average, total, typical?) size and depth is more ExtremeNormalForm.

Corollaries: 1. XNF is language dependent. XNF for C or Smalltalk is different to XNF for assembler. 2. In real world languages there is always a size/performance overhead. The final example above is bigger and would probably run more slowly than the first.

-- FrankCarver

Actually, I wouldn't see anything wrong with the first set of statements, since the duplications are trivial. My rule is to ExtractMethod for duplicated groups of contiguous statements. If you try to apply OnceAndOnlyOnce to method invocations, what do you replace them with? I have extracted single lines of code to a method, but for the goal of clarity. To me, OnceAndOnlyOnce is more about bodies of code.

AAh, but that's my point! No one really wants absolute OAOO, even though some around here seem to imply that. What I was getting at is that there is a notion of "level" or "granularity" to this approach, and where you stop says something important about your development process and priorities. I certainly wasn't holding up my final position as an ideal refactoring. -- FrankCarver

For your point 2, I think it is kind of cool to note that at optimization time, it is pretty easy to start inlining code again if you really need speed. But frankly, for me those are optimizations of last resort.

-- MichaelFeathers

OnceAndOnlyOnce [as used in XP] is about duplication of code. We wouldn't use it to try to consolidate all the e's from a program.

'' I think you must be setting yourself up here. Ever since Turing it's been possible to argue that code and data are the same thing . Anything that a programmer types should be subject to OAOO. Otherwise all the problems of typing errors and multiple inconsistent versions will come back to haunt you. -- FrankCarver ''

The following sort of begs the question. However, OnceAndOnlyOnce is a code-simplifying tool, so I'd say that one doesn't do a transformation that makes the code more complex. Clearly, to any reasonable eye, some of the transformations above make the code more complex. I wouldn't do them. Again, I admit that it begs the question.

Note that use of constant definitions ala C, C++, Java might have made the code slightly simpler. For my taste, it was fine as it stood. I still say there's no point in trying to replace all the E's with a pointer to an E.

In the example given, the <'s and /'s aren't code. OTOH, if we had, also

     define printHTMLMessage ( message )
     {
      print "<html>"
      print "<head>"
      print "<message>"
      print message
      print "</message>"
      print "</head>"
     }

I might refactor until I could say

     define printHTMLMessage ( message )
     {
      printHeader
      printNamedString ( 'message', message )
      printTrailer
     }

and

     define printHTMLHeader ( title )
     {
      printHeader
      printNamedString ( 'title', title )
      printTrailer
     }

Now ... how can we clearly express this distinction, so BertrandMeyer won't get mad at us?

Of course there are other ways to factor this example. What I was offering was some attempt at a numerical metric for the amount of OAOO, and by inference the nearness to XNF, of a given body of code. --FrankCarver

Donald Knuth talks about this in The Art of Computer Programming, Volume 1 (page 187, Third Edition). His point is that replacing a portion of code with a subroutine invocation is not free; one must define the new subroutine and insert the proper subroutine invocation codes. The example above demonstrates that adding subroutines indiscriminately can greatly expand your program. (InlineMethod is also a RefactoringPattern, remember.)

Knuth actually discusses the mathematical aspects of rewriting as a subroutine, and derives a rough equation for its cost when the language is MIXAL.

To my mind, the question of determining whether or not the code contains duplications (particularly CopyAndPasteProgramming) is of very practical concern, and I have recently been prototyping utilities which attempt to find such instances. One of the more useful ones ranked duplicated strings by the product of their frequency and their length; this rank represented the number of characters consumed by all instances, and was thus an upper bound on how many characters could be removed by refactoring that particular string. A variation which was highly successful used the product of the frequency with the square of the length (reflecting a heightened interest in longer strings).

A truly useful version of this scheme would use an actual parser for the language in question and attempt to find duplicates or similar patterns in the parse tree rather than the raw characters of the input. Ideally it should notice larger-scale structure as well, such as if "foo.bar(some expression)" is always followed by ".baz()".

See AutomatedRefactoring.

-- JamesWilson

After some re-reading, and getting back into the game, so to speak, I have this to offer on ENF, OAOO, and related topics:

If I have an AtomicUnit of code (say the sequence of print statements), it should not be refactored. Of course, I leave the definition of AtomicUnit to that page, but I see this as a singal task in the block. OnceAndOnlyOnce would be represented by removing logic from a NonAtomicUnit that exactly mirrors the purpose of the AtomicUnit. the presence of suitable AtomicUnits in code should be one measure of both OAOO and ENF. WyattMatthews.

The code is OnceAndOnlyOnceNormalForm? when it has reached zero CodeChangeResistance.

I.e. the point of the normalization (rather than reaching some OAOO metric) is to minimize the amount of update inconsistencies arising when exercising maintenance or extending the code. Factorizations that do not change the CodeChangeResistance are over-normalisations and should, IMO, be considered a violation of OAOONF. Standard refactorisations are the operations with which we move the code towards and away from OAOONF. The relational model has a very convenient small set of refactorisations - the XP set is much larger, which indicates that we'll need a much larger set of antipatterns to define OAOONF.

So, define the causes of CodeChangeResistance and we'll have OAOONF nailed down in no-time!

CategoryExtremeProgramming