Long Functions Discussion

[[[This page is TooBigToEdit. (printed it is 38 pages -- almost 100K) That seems strangely appropriate. -- MichaelSparks]]]

From LongFunctions:

EwDijkstra wrote an entire book, a masterpiece at that: "A DisciplineOfProgramming" in a language that has no procedures. He mentions that he did that on purpose. Some of the algorithms in the book are anything but trivial, as a matter of fact a lot more complex and with more substance than the toy examples you see in many books of today, yet they fit into a single function (the main program, that is). This is more than enough proof that long functions are not bad. "Extremely long functions" may be bad, but long functions are good. Many times long functions improve readability and reduce the mental effort required to figure out what the code does, and also the effort required to prove it: sometimes the effort required to establish the contract between an "extracted out" function and its original context may be significantly more than the effort to prove/understand whatever was needed in place. (See RavioliCode.)

Well, Dijkstra's examples are pretty computer-sciency, don't you think? I'd be interested to see some software engineering regime project (say, 1.5 million lines of source, fifty developers, maintained over five years) proceed satisfactorily in a language with no unit of functional abstraction, no way to implement InformationHiding.

And wasn't Dijkstra's goal with the book pedagogical? I don't have a problem with etudes written as one long sequence of code (for small values of "long" :). Long functions can be studied in the way that one might study a poem, in isolation, with copious free time, aided by reference works. But long functions tend to be hard to maintain within a commercial time-frame when the person who wrote them left the company three years ago. And are all too often too hard to maintain when you wrote them three years ago. -- AnonymousDonor

I hate to dredge this up again, but I just read "A Discipline of Programming" and have to comment:

"A Discipline of Programming" is a book, not a program. There is no code block within it that is longer than half a page. Most of the programs illustrated are in the same 10-20 line range that people advocate for methods.
When a program is long, it's broken into subroutines. It's just that the subroutines are written as English prose, not function calls, and are then elaborated upon later in the text. See page 131, "An exercise attributed to R.W. Hamming", where the 'program' reads:

  "establish P0(n, q) for n = 1";
  "establish P1(q, x2, x3, x5) for the current value of q";
  do  n =/ 1000 ->
    "increase n by 1 under invariance of P0(n, q), i.e. extend q with min(x2, x3, x5)";
    "re-establish P1(q, x2, x3, x5) for the new value of q"
  od
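For comparison, here is roughly how that outline might look once each quoted step is filled in. This is a Python sketch, not Dijkstra's notation and not his solution from the book. The function name hamming and the indices i2, i3, i5 are made up for illustration. It also assumes a reading of the invariants: P0 as "q holds the first n numbers of the form 2^i * 3^j * 5^k in increasing order", and P1 as "x2, x3, x5 are the next candidate multiples of 2, 3 and 5 beyond the end of q".

```python
def hamming(count):
    """Return the first `count` numbers of the form 2^i * 3^j * 5^k, ascending.

    Assumes count >= 1. Names and invariant readings are illustrative only.
    """
    # "establish P0(n, q) for n = 1"
    q = [1]
    # "establish P1(q, x2, x3, x5) for the current value of q"
    i2 = i3 = i5 = 0            # hypothetical indices into q
    x2, x3, x5 = 2, 3, 5
    while len(q) < count:
        # "increase n by 1 under invariance of P0(n, q)":
        # extend q with min(x2, x3, x5)
        nxt = min(x2, x3, x5)
        q.append(nxt)
        # "re-establish P1(q, x2, x3, x5) for the new value of q"
        if x2 == nxt:
            i2 += 1
            x2 = 2 * q[i2]
        if x3 == nxt:
            i3 += 1
            x3 = 3 * q[i3]
        if x5 == nxt:
            i5 += 1
            x5 = 5 * q[i5]
    return q
```

Whether the quoted steps are better left inline as comments, as here, or pulled out into named functions is exactly the question this page argues about.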

Who says the language doesn't support functions? What do you call xx:update, xx:delete, and xx:insert on page 121, "Updating a sequential file"? Just because they can't be defined doesn't mean they don't exist.
Dijkstra writes in the preface, "I owe the reader an explanation why I have kept the mini-language so small that it does not even contain procedures and recursion." The reasons he gives are:
- "First of all, this monograph has not been written for the novice and, consequently, I expect my readers to be familiar with these concepts."
- "This book is not an introductory text on a specific programming language and the absence of these constructs and examples of their use should therefore not be interpreted as my inability or unwillingness to use them, nor as a suggestion that anyone else who can use them well should not do so."
- "I felt obliged to present repetition as a construct in its own right, as such a presentation seemed to me overdue." In other words, the book is about the use of proofs and formal methods to produce correct loops and conditionals. Functions are outside the scope of the book; he is explicitly not saying that they're bad, but he chooses not to deal with them in the manuscript.

-- JonathanTang

Can we resolve this entire discussion by saying long functions are good if you can hold them all in your head like CostinCozianu and bad if you can't like EricHodges? -- EricHodges

I don't see why not. Let [us] write their LongFunctions on occasions, and we'll certainly let you write LotsOfShortMethods all day long.

The guys who wrote the code in LongFunctionExamples can hold those functions and even more in their heads.

And I think personal references are appropriate. Your argument seems to be based on the fact that you, some GrandMasterProgrammers and some open source contributors benefit by keeping all of the details of long functions in their conscious awareness at the same time. I think we can divide the set of all programmers into those who benefit from this behavior, as you do, and those who avoid this, as I do.

-- EH

There's always room for honest disagreement, and not even disagreement at that, different personal preferences. Let's avoid PersonalChoiceElevatedToMoralImperative, shall we? This implies a minimum respect for a different point of view.

Defenders of LongFunctions, including me, were very much accommodating your preferences. We don't claim that all functions or a majority of them, or a quarter of them should be long. Our only claim has always been that a function is too long only when analyzed in a certain context. There can be very good code in a function that is 100 lines long. Why do you need to have a problem with that?

[Please] somebody on LotsOfShortMethods side ... have the courage to edit and retract that abusive nonsense, and I'll retract whatever you feel is not accommodating of your point of view on LongFunctions side, or will let the disagreement stand as it is right now. Not exactly pretty.

I said:

Yes, I think long functions are bad practice. I don't believe it is abusive to say so. -- EH

It is very much abusive, because you can't defend that position other than as a personal preference, yet when you say it is "bad practice", you don't leave room for a legitimate different point of view. I cannot agree to disagree with you, because your position is both abusive and nonsensical, and if it rises to the status of popular dogma in SoftwareEngineering it will be harmful. So the only option left for me is to demolish that position.

I have defended my position by sharing my experience. That always leaves room for other points of view. I do not believe that you can demolish my position. -- EH

Oh, but it's elementary, my dear Watson! Your experience entitles you to draw conclusions only with regards to what's good for you, but does not give you any standing to declare what's bad for others.

Oddly enough, my position remains. I can declare what's good or bad for others just as much as you can. -- EH

So let me understand, do you, EricHodges, claim as a consequence that public and publicized code that we have from GrandMasterProgrammers such as DonaldKnuth, EwDijkstra, Linus Torvalds, DanBernstein, Xavier Leroy, Abelson & Sussman and many many others has examples of software engineering bad practice, just because on occasions they have functions longer than what EricHodges or any other OO hot head feels comfortable with?

I, EricHodges, claim as a consequence that public and publicized code that we have from GrandMasterProgrammers such as DonaldKnuth, EwDijkstra, Linus Torvalds, DanBernstein, Xavier Leroy, Abelson & Sussman and many many others has examples of software engineering bad practice, just because on occasions they have functions that are too long. Shall we notarize this? -- EH

And I, as a consequence, declare that you're full of it when you make the deduction that because you can't keep 15 lines of your code in your head at a time, you are entitled to smear others. Before you criticize some code you have to understand it, but since you're not willing to put in any effort other than shouting your prejudices in public, the only critique you can bring against such code would be "EricHodges cannot (easily) understand this". I hope the day never comes when EricHodges' capacity for reading code becomes the standard measure for good software engineering. You may notarize this.

I have neither smeared anyone nor shouted my prejudices in public. Perhaps you have me confused with someone else. My personal experience is that I can keep more code in my head than the average programmer. The number of lines of code that DonaldKnuth or CostinCozianu (or even the mighty EricHodges) can keep in their head is not of great importance. It's the number of lines that the average programmer can easily understand that matters most. I welcome the day when the standard measure of complexity for a function falls far below the limit of my ability. -- EH

Do you want new programmers to learn TheArtOfProgramming from studying first and foremost the code of GrandMasterProgrammers or from blindly following the rants being posted on WardsWiki?

I'd like them to learn from both, if possible. You seem to be saying that we can never improve upon the code of our elders. I disagree with that. -- EH

I seem to be saying that you won't be learning anything with this attitude of yours.

What won't I be learning? That long functions aren't a code smell? No, I don't think I'll be learning that. I just spent 23 years learning that they are a code smell. Noticing that they smell doesn't stop me from reading and understanding them (or trying to make them smaller). -- EH

So let me try to read your position clearly: if I'd show you 50 lines of code in a function, you'd qualify it as example of bad practice before even looking at it, or would you try to read it and analyse the context? If you'd say 50 lines is too much regardless of context, what would then be your threshold for longness?

If I see a function with 50 lines I'll try to find a way to make it smaller. I've written functions that had to be long, but rarely and I've never enjoyed it. My threshold for function length decreases as I age. Most of my functions are around 5-10 lines long. Today a 15 line function stood out like a sore thumb. I found no good way to shorten it, so I let it stand. For now. -- EH

The threshold of 10 lines is borderline idiotic. I could have given you a break if you said 20. But not even StructureAndInterpretationOfComputerPrograms, which has small functions because the examples are simple, can manage to squeeze under 10 in many instances, and quite a few examples have the whopping unbelievable length of 20 lines of code!!! They haven't realized though that 20 lines of code may stand out like a sore thumb for honourable EricHodges.

Costin, the 5 to 10 line thing is an average, not a set limit. When we see something longer, we see if we can simplify it by extracting a logical chunk of code; if none exists, then we let it stand. Sometimes you end up with 20 or 30 line long functions, and that's OK, as long as its intent is clear. You shouldn't have to know the context to know what the code is doing; if that's the case, then you're keeping something in your head that still needs to be put into the code, and you aren't finished yet. Pair programming tends to fix this problem. The whole argument is about the clarity of the code, not the line count. 5 to 10 lines just turns out to be the average when you give each logical chunk of code a name. I'd rather see that name than have to remember it, or read a comment for it, and so would many others. GrandMasterProgrammers get to spend more time writing code and maintaining code than us mere mortals. It's harder for us than for them, so we like to make things a little easier with smaller functions. Appeal to authority doesn't change the argument. Long functions are very hard to maintain and extend. Short methods are easier to maintain and extend. You have yet to come up with a valid or logical argument refuting that point. All we're saying is "strive to write shorter clearer methods, when possible". Is that so hard to swallow? Is that a lie? Is there a grand conspiracy to wipe out long methods that you're fighting? You are the only one I see here who doesn't seem to think those samples could be greatly simplified and made more clear by a little refactoring. Why is that? -- Anonymous

If you recognize that it's harder for us than for them, maybe you should learn from their examples, so that it won't be that hard next time! But once you recognize that, quit trying to promote an absolutely useless dogma that would, as a consequence, qualify as bad practice what grandmasters do, and not only what grandmasters do, but what a lot of your fellow "mere mortal programmers" (including me) do. If you are such a mere mortal maybe it is not you who should militantly promote what you think about SoftwareEngineering as some kind of common wisdom. People should be learning from exemplary software written by GrandMasterProgrammers.

I'm eager to refactor out the conversational chaff, and leave the common sense position in place, rather than let your extremist position remain as some kind of wisdom on software engineering that is approved by the WikiCommunity.

Go ahead, refactor away, there's still some good content here, somewhere, but don't delete the opposing opinion, it isn't at all extreme.

Let me go then to a GrandMasterProgrammer who not only writes code but also undoubtedly maintains it, and not only did that for many years, but did it quite admirably. It's DanBernstein, the author of qmail (among many other things). Now you cannot claim that QmailSystem is not on par with professional software or that it is not being maintained. It's running tons of emails around the world, and I've used it and installed it myself for a small company. It is a blessing compared with the alternatives. Not only that, but recently RalphJohnson, who is among the smuggest SmugSmalltalkWeenie of them all, praised qmail as a SoftwareMasterpiece and he recommended: "I think that qmail is an ideal system for people to study who want to be good programmers." Now go in a hurry, download the source code for qmail, and I can guarantee you'll have no problem finding functions that are 50 lines and maybe even 100 lines of code long. I did that a while ago, and I can tell you it was nothing like Smalltalk style. Do you also think that DanBernstein is applying bad practices in his code?

''Now our argument, that everybody on the other side keeps misrepresenting it into a StrawMan, was not that a long function would be easier to understand than a small function, but that

Of course a short function is easier to understand than a long function, but that has nothing to do with the current discussion. There are times when a long function is easier to write and maintain than the ensemble of short functions that would replace it.

Extract a function if doing so allows you to name the new function something that will clarify the code. In LongFunctions, these sections are often surrounded by some whitespace and preceded by a comment.
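To make that rule of thumb concrete, here is a minimal Python sketch. The function names and the order records are hypothetical and are not taken from any code discussed on this page. The first version keeps a commented, whitespace-separated section inline. The second extracts that section under a name that says what the comment said.

```python
def report_inline(orders):
    # compute total of paid orders
    total = 0
    for order in orders:
        if order["paid"]:
            total += order["amount"]

    # format the summary line
    return f"{total:.2f} paid"


def paid_total(orders):
    """The former commented section, now named instead of commented."""
    return sum(order["amount"] for order in orders if order["paid"])


def report_extracted(orders):
    return f"{paid_total(orders):.2f} paid"
```

Both versions return the same string for the same input. The disagreement on this page is over which one is easier to read once a function has many such sections.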

I disagree with creating a function just to make a comment. Some people think that developers give function names more care than comments, but my experience is different. Besides, comments are easier on my eyes than CamelCase. Let's AgreeToDisagree on this.

Both function names and comments should be chosen carefully. If you're trying to defend yourself from careless developers, you probably have more to worry about than how long your functions are.

I strongly agree, but this is one of those touchstones that distinguishes Pragmatists from Purists. A Purist will insist that a function must never be "long" (e.g. "over 40 lines" or "over 10 lines" or something).

Purists must never have used YACC, which generates C code into a single C function of therefore unbounded length (you can factor out all actions as individual functions, but the length of the generated function is roughly proportional to the complexity of the grammar, which in turn is typically an independent variable not under our control).

Purists wouldn't apply the rule to generated code such as that of YACC. The goal of keeping functions small is to provide readability, but generated code is usually not intended to be readable. Purists apply the rule to code that people must write and maintain.

Disagree on several counts. First, I personally have heard people make unconditional claims of this sort, with no allowance for generated code. Second, "generated code" is a seamless spectrum that merges with handwritten code, so it's not clear where any such dividing line would fall, anyway. Third, the C compiler I once used that couldn't handle YACC output had functions limited to no more than 500 lines because of Purist reasoning that didn't bother to stop and think about whether there might be exceptions.

Many compilers go haywire on long functions because their graph-coloring algorithm can't handle all the temps necessary in the function. The compiler tries to allocate variables to registers, but it can't color the graph and so has to spill and stack-allocate a temp. If the function is long enough, it spends most of its time spilling and restarting the coloring.

No. That might be a problem with one compiler, but it certainly is not with "many compilers". That would be an idiotic implementation; graph-coloring is NP-hard, as with most interesting graph algorithms, so any implementor with an iota of common sense would have an auto fallback to some faster but suboptimal allocation algorithm. Besides, the number of live temporaries is unrelated to linear length of a function. And if it were a function that would be relatively trivial to break up into multiple functions, then it would be guaranteed to have most of its temporaries with non-overlapping lifetimes. Lastly, spilling is a runtime cost, not a compile time graph-coloring cost, so you sound rather confused. This graph-coloring point sounds like a wild guess.

That seems like a fault of that tool, not long functions themselves. It almost reminds me of an unverified old story I heard about a computer operator who wouldn't let programmers use lower case because it made uneven wear on the printing ribbons.

Read again. It's obviously the fault of the idiots who wrote the tool, and who were obviously Purists who hadn't considered that there could be any exceptions to the rule about function length. Like I said in the first place. I don't know where you get the question of whether it's a fault of "long functions themselves"; obviously it's a fault of humans.

That's why many FunctionalProgrammingLanguage compilers that compile to C limit functions to 500 lines, and use a trampoline bounce to enter a different function. GCC just takes too long to compile the generated code otherwise. -- JonathanTang
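For readers who haven't met the term, here is a rough sketch of what a trampoline is. It is written in Python rather than generated C, and it does not reproduce any particular compiler's output. The names trampoline, first_half, second_half and count_down are hypothetical. The idea is that a piece of a split-up function does not call the next piece directly. It returns a description of the next piece, and a small driver loop "bounces" to it.

```python
def trampoline(step, *args):
    """Call `step`; while the result is a (function, args) pair, bounce to it.

    Caveat of this sketch: a genuine result that happens to be a
    (callable, args) tuple would be mistaken for a bounce request.
    """
    result = step(*args)
    while isinstance(result, tuple) and len(result) == 2 and callable(result[0]):
        func, func_args = result
        result = func(*func_args)
    return result


def first_half(x):
    x = x * 2
    # Instead of falling through into more code, ask the driver to continue.
    return (second_half, (x,))


def second_half(x):
    return x + 1


def count_down(n, acc):
    # Each bounce returns to the driver, so the call stack never grows.
    if n == 0:
        return acc
    return (count_down, (n - 1, acc + n))
```

Calling trampoline(first_half, 20) runs both halves in turn, and trampoline(count_down, 100000, 0) completes without hitting Python's recursion limit because every step returns to the driver before the next one starts.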

Does the language affect your decision? For example, we have Pascal-style divisions that sometimes make splitting easier because one does not have to create long parameter lists or move local variables to global or module-level in order to split.

Perhaps some material under SwitchStatementsSmell could be moved here.

Yeah, that page badly needs refactoring. . . or it's a symptom of any SomethingSmell page - they always end up with ThreadMess

I think the principle of "a function should do one (and only one) thing" is appropriate here. Some functions don't need to be long. A couple of functions do - namely ones that can't be broken down into calling sub-concept functions, or which use too many local variables (often a CodeSmell). It's one of those rules that you learn, then break when needed.

[If there are any long function fans, how about some examples where you think it's justified. If a function only does one thing, I just don't see how it can ever get long. Long functions are inherently bad for humans and for maintenance and IMHO, are never justified in hand written code.]

Here are some examples of when it's justified (remember these are exceptions - they don't disprove the rule, just qualify it):

- Entangled concepts that cannot be dis-entangled by putting into functions
- Dealing with a Win32 message pump switch (it *is* smelly and the MFC solution is worse but there's not a choice either - blame Microsoft)
- When using inline assembly with a lot of register re-use
- When having to initialize one or multiple complicated structures that get passed to a 3rd party API
- A very long and complicated unit test that isn't reusable as library code

So basically when having to support someone else's (smelly) code, or when local variables are too numerous/entangled to be passed elsewhere easily.

-- LayneThomas

Entangled concepts that cannot be dis-entangled by putting into functions

Everything can be disentangled. (Closures are sometimes helpful). The question is how much state you have to pass around to the various functions you replace your large one with.

While large functions are never necessary, they may be better than the alternatives.

LongFunctions are to be avoided as a general rule, but as always, it depends on context.

I think we are in violent agreement here.

I agree that LongFunctions are a code smell, but sometimes there is no choice (must.kill.microsoft.now), or, like you said - the alternatives are worse (2000 lines of extra code just so it won't be one long function).

We should probably refactor this section into a "reasons to break the rule to avoid anti-patterns" usable concept.

LongFunctions may indicate the presence of the LongMethodSmell. ShortMethods may replace LongFunctions. (Perhaps by way of a MethodObject.)

There is a point of diminishing returns there though. Having a long function may be less complex & more understandable than using MethodObjects

The Smalltalk community and KentBeck proclaim that the golden rule is "few lines", which means 5 to 9 lines; some methods may have only one line. However, this is a regional culture, that's all. For other software developers the golden rule has been "one screen", which should be more than enough for most purposes. As a reference, I think all the algorithms in Dijkstra's book fit on one printed page.

Very true. A function on the old TurboCeePlusPlus IDE should have been around 23 lines. In Visual Studio, 50 is easy. Of course cognitive load comes into play, but it's easier to read than recall, so I believe screen size is the biggest factor.

Much of this is also related to what language you're using, and how concise it is. In some languages you can do many interesting things in 5 lines; in others, you cannot. -- francis

Well, only to some degree. The language used by Dijkstra was quite powerful, and you wouldn't be writing that substantially shorter even in Scheme. On the other hand it's quite easy to see one screen full of Scheme code defined at the top level. I think it has to do with the fact that Schemers, like Dijkstra, write more algorithmics, whereas the typical application domains for Smalltalk have less algorithmics.

Another thing to factor in is whether a function is long on the outside, but maybe it contains inner functions - a common case in Pascal, Scheme, ML, but less common in Smalltalk or Java.
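As an illustration of "long on the outside, short on the inside", here is a small Python sketch using nested functions. The word_stats example and its helper names are hypothetical. The inner functions share the enclosing function's local variable without any parameter passing, which is the convenience the Pascal, Scheme and ML styles get from nesting.

```python
def word_stats(text):
    """One outer function whose body is mostly small inner functions."""
    counts = {}   # local state shared by the inner functions below

    def record(word):
        counts[word] = counts.get(word, 0) + 1

    def most_common():
        # None when no words were seen at all.
        return max(counts, key=counts.get) if counts else None

    for word in text.lower().split():
        record(word)
    return most_common(), len(counts)
```

Counted by lines, word_stats is longer than either helper, yet no single piece of it is long. Whether a line-count rule should look at the outer function or the inner ones is left open here.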

I like your comment about algorithmics. I think it might be true, but I think it'd be fun to see an example of a longish Scheme (or Dijkstra-language) algorithm and try to factor it into meaningful ShortMethods (in Smalltalk or Scheme or whatever). -- AdamSpitz

Long methods are smelly because they hide the algorithm within the implementation in most cases, mixing two different levels of abstraction. A method should either be doing one thing, or calling other methods to do a sequence of things, but not both. I've yet to see a long method that didn't have some code.. some white space.. some more code.. some more white space. Each little chunk of code should be a separate method. This leaves the main method with just the algorithm, and all the implementation in the shorter methods. This makes understanding the main method trivial, even by someone who didn't write it, like a maintenance programmer. If the short methods need to share variables, then it's an object, and those are class variables. I'd still like to see someone post a justified long method, I don't believe they exist, at least not in modern languages, i.e. not assembly. -- RamonLeon
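A minimal sketch of the shape being described, in Python, with a hypothetical InvoiceTotal class standing in for a former long function. The locals that the chunks needed to share become fields of an object (instance variables in Python terms), and the main method is left with only the sequence of steps. This is roughly the MethodObject idea mentioned above. It is an illustration only, not code from LongFunctionExamples.

```python
class InvoiceTotal:
    """Hypothetical method object: one former long function, one class."""

    def __init__(self, prices, tax_rate):
        self.prices = prices
        self.tax_rate = tax_rate
        self.subtotal = 0.0   # formerly a local variable
        self.tax = 0.0        # formerly a local variable

    def compute(self):
        # The "main method": just the algorithm, no implementation detail.
        self.sum_prices()
        self.apply_tax()
        return self.subtotal + self.tax

    def sum_prices(self):
        self.subtotal = sum(self.prices)

    def apply_tax(self):
        self.tax = self.subtotal * self.tax_rate
```

For example, InvoiceTotal([10, 20], 0.5).compute() gives 45.0. For a computation this small the class is arguably more ceremony than the three inline lines it replaces, which is the diminishing-returns objection raised earlier on this page.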

For examples, see LongFunctionExamples, or read "A Discipline of Programming".

Re: Functions are hard to read if one has to scroll back and forth through multiple pages to understand them.

However, pressing Page-down is often easier than finding the various names of the split-up parts. For example, if there are case statements or a series of IF statements, then the alternative to paging down is to find each of the various matching named-units. At least for me, finding matching names is much more costly than pressing Page-Down. Of course if a particular block is rather long, then I might consider making it a dedicated function.

If I have a choice between pressing page down, and looking up a function, I'd rather go with reading a good function name (one that just tells me what the function does) and assuming that the function does what it says it does. It's much much faster reading code that way. -- AlexAusch

That's assuming that the function name can communicate the contract, and that the impact of lots of names with lots of parameters or auxiliary objects and data structures for petty nothings won't have a tiring effect on you, and assuming that the refactoring exercise to go down to under 10 lines of code has a net positive effect. The typical exercise for reading the code is to prove that a certain algorithm functions correctly, or else find out where the bug is. Trusting names ain't gonna cut it, so the typical exercise involves jumping back and forth between functions. Some of the RavioliCode also implies that you can't deduce the call graph directly, because it's sometimes hidden between Observable/Observer, Subscribe/Notify which may on occasions introduce unnecessary abstractions.

We will just have to AgreeToDisagree on this. I know what bothers me and what does not bother me. But it apparently doesn't fit your bother profile(s). If Page-Down is costly to your fingers/mind then so be it. People are just different.

It has nothing to do with page down. It's about abstraction and SeparationOfConcerns. People who believe in short methods believe in having functions do one thing, and the function name says that. It allows us to separate the logic (algorithm) from the implementation of that logic, it's called abstraction, and it's what programming is all about. Real code, production code, should have a proper separation of concerns and levels of abstraction. I guess we will just AgreeToDisagree. You seem to like to do your work "in" functions, I like to do work "with" functions.

{The problem is that in the lower levels of procedural task refinement, the distinction between interface and implementation gets blurry. FuzzyDistinctionBetweenInterfaceAndImplementation. At least it matters less and less at the lower level. Worrying too much about it before there is an actual need, such as satisfying OnceAndOnlyOnce, may be PrematureAbstraction.}

Regardless of this handwaving that tries to pass as software engineering principle, the truth is that an overwhelming majority of open source software out there that can be recommended as a model of good programming has plenty of what some folks around here would sneeze at as LongFunctions, and this can be seen in languages varying from C to ML to Scheme and Prolog. Such a diversity of GrandMasterProgrammers, writing LongFunctions in all kinds of languages, in all kinds of projects, is overwhelming evidence that refactoring towards LotsOfShortMethods is not necessary. You write short methods when it makes sense, you write longer methods when that also makes sense.

[As someone who used to contribute many long functions to open source projects, I don't find this argument convincing. I wouldn't recommend any of the long function examples provided as models of good programming. Finding long functions in open source projects is not a persuasive argument against writing short methods. I used to write long methods. Now I write short methods. My life is easier as a result. -- EricHodges]

Excuse me, Eric, but I'm not familiar enough with your credentials to be able to weigh your personal experience "my life is easier as a result", against the examples of GrandMasterProgrammers like Dijkstra, Knuth, and many others, or even against less notorious programmers who wrote the examples in LongFunctionExamples, but for whom the quality of those projects speaks more eloquently than any rant on wiki. Are you guys willing to make a coherent logical argument, or is it just my 2c versus your 2c, versus Dijkstra's 2c? Your sneezing at the code examples in LongFunctionExamples doesn't make any sense either. -- CostinCozianu

[Credentials won't maintain my code. I'm offering my experience. Do with it what you will. -- EH]

Personal experience is part of how we discuss things here, as per Ward:

: This site remains dedicated to capturing and examining the real experience of expert developers and succeeds to the degree that it gets expert developers to report their first hand experience. (Originally on SearchForTruth)

Costin, certainly you have just as much programming experience as many of the other posters here. Seems that programming is easier for you if you're more flexible about function length, which I can personally believe but I don't have a clear grasp as to why. What do you suppose you're doing differently than Eric and myself? -- francis

... long functions are perfectly OK - if they fit the context. ... There's an absolutely misguided prejudice against long functions in some rather small part of programming community, a prejudice that has no support in facts and logical arguments, or real life examples and practices from grandmaster programmers, open source projects, etc. At best we are talking about anecdotal evidence based on preferences of one or the other.

That's untrue. It's simply a fact that shorter methods are easier to maintain than longer ones, no argument can be made against that simple logical fact. That doesn't mean long methods aren't used, or don't work, it just means they are more difficult to maintain. Any long method, can be written more clearly by breaking it up into several shorter methods, each dedicated to a specific task. You don't need to look at the shorter methods, or hop around, they do what they say they do, you wrote them that way. Methods allow you to chunk the code and work at higher levels of abstraction, that's what they're for. Just because GrandMasterProgrammers write long methods, doesn't mean they are suddenly easier to maintain, you can only remember SevenPlusOrMinusTwo things, short methods are easier to maintain, period.

That each individual shorter method is easier to maintain is undeniable, but that the whole ensemble is easier to maintain when you have tons of unnecessary names, is something I'm not recognizing as a fact. There's a point of diminishing returns and then negative returns, and that's easily provable. Otherwise we'd be writing 1 million one-liner methods for a 1-million-line project, and that would clearly be insane. Now you'll jump and tell me that you know that the tipping point is on average 5-9 lines of code. That may be true for Smalltalk but not for C or OCAML, may be true for you but not for me, may be true for that algorithm, but not the other algorithm.

Editor Power?

This feels to me like a simple tools problem. Pressing pagedown is one button. What if there was a different button you could press that would take you to the definition of the function being called? Or what if there was a button that would show you the contents of the called function inlined into the current function?

We're programmers, guys. The programming environment can do anything we want. ;)

-- AdamSpitz

Agreed. The best programming environments do this. Eclipse, for example, can take you to the definition of any identifier in the source. The cscope program can generate a cross-referenced view of the code that allows you to navigate similarly. "It's a poor craftsman that blames his tools."

Fancy function keys don't work on printouts. For some of us at least, paper is still easier on the eyes and easier to mark up with pencil than computer screens. Plus, it is still hard to compete with the simplicity of the page-down key. For one, you don't have to first aim the cursor at a given function name.

[I can only think of two reasons that you can't choose an editor. The first is that you are using a bad toolset with an inflexible built-in editor. Change tools. The short term pain will be worth the long term gain. The second reason is managerial fiat. In this case, run, don't walk, to a new job if at all possible. Your management is brain-damaged. Any decent programmer's editor can be (relatively) easily extended to deal effectively with any language you run into. If it can't, it isn't a programming editor.]

1999 is dead. We don't have a lot of job choices these days. [I simply can't believe that most management is that stupid. If that were true, we would be in a state of total economic collapse.]

Well, many managers are that stupid. On the other hand, the stupidity of others too often provides a convenient excuse for not trying to improve your own situation ... -- francis

{Stupidity won't necessarily cause "total economic collapse" if your competitor's managers are also stupid. Some companies just have weird policies because of the whim of a head honcho. That is the way biz is.}

What do you want to optimize? One study found that 'small' routines had 23% more errors than 'large' routines but were 2.4 times less expensive to fix (the authors defined the boundary between small and large routines rather arbitrarily as 143 statements). (Richard Selby and Victor Basili, "Analyzing Error-Prone System Structure".)

Not a fact. Reconstituting the complete logic path in your mind is very error prone. With every function call you are assuming it does what it says, or assuming you know what it does, which is often incorrect. Seeing the code is as real as it gets. Any change in any function may invalidate one of the paths, but you won't know that from the function name.

Show me then

  // work-day example
  void GoToWork(){
    WakeUp();
    Shower();
    Shave();
    GetDressed();
    LeaveHouse();
    DriveCarToOffice();
    ClockIn();
  }

Now explain to me how seeing the implementations of all those functions inline helps to clarify its logic? Short methods make the logic explicit; I don't have to do it in my head, I can simply read it. Long ones are the ones that force you to do it in your head. If you're making the assumption that the methods don't do what they say, then you stop and debug it, but until then, you step over each one, not into it. You should assume a method does what it says until you have reason to assume otherwise. Assuming 7 lines per method, this would be a 49 line method if inlined; there's simply no way that is better than this short, simple, clear 7 liner.

Next do the same for the apache C function in LongFunctionExamples.

And you're implying what, that real code can't be structured this way? I'll grant you this, it is more difficult to do in procedural languages, they lack the object abstraction that makes short methods so easy to do. But short methods tend to be a sign of OO code, not procedural.

That it is C has nothing to do with it. Some algorithms are tightly bound such that factoring them out removes the linkages. You should be able to perform your magic on the C code. Just use functions and pass a struct or something as args.

[And functional, even more so.]

Agreed. I wish more mainstream languages were functional.

One reason why it is tough to break functions up like the "work-day" example is that often there are rather complex conditionals. Whether those conditionals are high-level or low-level is hard to tell and may just be arbitrary (see above about fuzzy boundary between low-level and high-level).

Learn to use GuardClauses

Second, it is kind of hard to share variables common to all of those without passing a bunch of parameters. Lots of parameters is a code-smell in itself IMO.

Use objects or structs, don't pass individual parameters.

Using the work-day example, if one wakes up sick or can't find the car keys, then some conditionals are going to have to tell later functions not to execute.

Use guard clauses to exit early on that condition.
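To make the last few replies concrete, here is a minimal sketch of the work-day example with the "wakes up sick" and "can't find the car keys" cases handled by guard clauses, and with the shared state held in an object rather than passed as parameters. The class Morning, its fields and the return strings are all invented for illustration; nobody in this discussion posted this code.

```java
// Hypothetical sketch; Morning and all of its members are invented for illustration.
final class Morning {
    // State shared by the steps lives in the object, not in parameter lists.
    boolean sick;
    boolean hasCarKeys = true;
    final StringBuilder log = new StringBuilder();

    // Top-level routine: each guard clause returns early, so the steps
    // after it never have to re-check that condition.
    String goToWork() {
        wakeUp();
        if (sick) return "stayed home sick";      // guard: too ill to continue
        shower();
        getDressed();
        if (!hasCarKeys) return "took the bus";   // guard: cannot drive today
        driveCarToOffice();
        return "at work by car";
    }

    private void wakeUp()           { log.append("wakeUp;"); }
    private void shower()           { log.append("shower;"); }
    private void getDressed()       { log.append("getDressed;"); }
    private void driveCarToOffice() { log.append("drive;"); }
}
```

Setting sick to true before calling goToWork() stops the sequence after wakeUp(); none of the later steps needs to know why. Whether the conditionals belong at this level or inside the steps is still a judgment call, which is the point the objection was making.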

Pascal made it simpler by having nested functions, but most procedural languages require module-level variables to deal with this. In other words, it is easier/cleaner to make smaller functions in Pascal-like languages. Making smaller functions in other languages often requires either too many module-level variables, or giant parameter lists.

No it doesn't, objects, structs, or closures, problem solved, next...

Also, if one makes comments stand out well, then a function can read like a newspaper where the headlines stand out (see HeadlinesTechnique).
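A rough sketch of what such "headlines" might look like, assuming the technique means banner-style comments that mark each phase of one longer routine. The InvoiceReport class, its parameters and the banner style are assumptions made up for this illustration, not taken from the HeadlinesTechnique page.

```java
// Hypothetical illustration of comment "headlines" inside one longer routine.
final class InvoiceReport {
    String render(int[] pricesInCents, int taxPercent) {
        //==================== VALIDATE INPUT ====================
        if (pricesInCents == null || pricesInCents.length == 0) {
            return "no items";
        }
        if (taxPercent < 0) {
            throw new IllegalArgumentException("taxPercent must be >= 0");
        }

        //==================== COMPUTE TOTALS ====================
        long subtotal = 0;
        for (int price : pricesInCents) {
            subtotal += price;
        }
        long tax = subtotal * taxPercent / 100;   // integer division truncates
        long total = subtotal + tax;

        //==================== FORMAT OUTPUT =====================
        return "subtotal=" + subtotal + " tax=" + tax + " total=" + total;
    }
}
```

The phases share locals such as subtotal without any parameter passing, which is the advantage claimed for this style. The cost is that nothing but convention keeps one phase from reaching into another's variables.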

An object's instance variables have a lifetime longer than function invocation while not being global. I believe such semi-globals to be the key enabler of short functions, not object orientation. Pascal had quite different semi-globals but still favored short functions. Pascal functions would be declared in the context of a parent function. The parent's formal parameters and local variables appeared as globals to the inner function. This promoted a TemplateMethod style where every function would call a series of helper functions that would be defined within its scope. It was within the context of highly factored pascal code that I first heard the need for comments criticized as what we now refer to as a code smell. -- WardCunningham

Yes. And this is the same reason that closures can help untie knotty functions, without polluting your global namespace or requiring a lot of state to be passed around. Pascal didn't have general closures, but subfunctions closed over their parent's lexical scope, by the sounds of it. With true closures, there is no need for a 'special' enclosing function, although that doesn't mean it is necessarily a bad idea.
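As a small sketch of that idea, here is a helper defined inside its parent routine that closes over the parent's locals, roughly in the spirit of the Pascal nested functions described above. WordStats and its method are hypothetical names. Note that Java lambdas can only capture effectively final locals, hence the one-element array used as a mutable cell.

```java
// Hypothetical sketch; WordStats is invented for illustration.
final class WordStats {
    // Returns "words=<n> longest=<w>" for a whitespace-separated line.
    String summarize(String line) {
        String[] words = line.trim().split("\\s+");
        int[] longest = {0};   // mutable cell: lambdas capture only effectively final locals

        // Helper defined inside the parent; it closes over `words` and `longest`,
        // so neither is passed as a parameter or promoted to a field.
        Runnable findLongest = () -> {
            for (int i = 1; i < words.length; i++) {
                if (words[i].length() > words[longest[0]].length()) {
                    longest[0] = i;
                }
            }
        };

        findLongest.run();
        return "words=" + words.length + " longest=" + words[longest[0]];
    }
}
```

The helper has a name and a single job, yet it is invisible outside summarize(), so it adds nothing to the class's interface.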

Yet still with closures and functional features, just like the OCAML example in LongFunctionExamples, the top level function definition still looks awfully long by Smalltalk standards, but of course it is more than GoodEnough for the programming culture related to OCAML. -- Costin

[That's OK, when nested functions are supported, then the inner functions count as small methods, if a big function is made from 5 or 6 inner functions and spans 50 lines, then that meets the small method standard quite easily.]

I'm reminded of the WeinbergTestForLongFunctions.

If you don't accept that you have to learn from those who know more than you do, you won't be learning anything. That's a very basic principle of learning and it applies everywhere. In chess novices learn from the examples of grandmasters, in mathematics one learns from the examples of great mathematicians, in programming one should learn from the examples of great programmers.

If you don't want to learn, fine. But even then don't be handwaving your hands with "personal experience" arguments around wiki, without expecting to be put in your place. Why one should learn from personal experience of some EricHodges (or CostinCozianu for that matter) of which we can only have some fairy tales, whereas countless practical examples from grandmasters are available for folks to study and draw conclusions for themselves?

Grandmasters disagree with each other quite often you know; you can't just say everything a grandmaster does is perfect. Even grandmasters sometimes write less than perfect code. But that doesn't mean we can't strive for better, clearer code.

[I'm no grandmaster, but I am an experienced programmer. I can't think of any good reason someone shouldn't learn from me or you. Just because grandmaster programmers wrote long functions doesn't mean long functions don't smell. If everyone did just what their predecessors did, there'd be nothing to learn. -- EricHodges]

The public code we can study, written by acknowledged grandmasters, almost unanimously includes occasional long functions. Sometimes every function in one of these works is a "long function". That public code is an objective reality that you can analyse and draw some plausible and relevant conclusions from it. Or you can accuse me of ArgumentFromAuthority or find an absolutely risible explanation that "Even grandmasters sometimes write less than perfect code". Your analysis stretches credibility, I'm sure any reasonable man can find better explanations if he didn't look at the data for some pre-determined conclusions. In other words your attitude can be summarized as: "let no facts interfere with my strongly held convictions".

[My conclusion after analyzing grandmasters' long functions is that they don't benefit from being long. Either the grandmaster who wrote them was incapable of or uninterested in communicating with average programmers, or they wrote the functions before that was a valid concern. No matter how good they were, they were writing in the past. The future can always be better. -- EricHodges]

ExtractMethod here and there clarifies the intent of these code samples.

The Java sample, for example, contains two for loops, but why? Well, you just have to guess, or read all the code and figure it out. Does the first loop collect valid rules? Let's assume it does for argument's sake; wouldn't a method called CollectValidRules() be much cleaner to read and remove any doubt as to what the loop does? Each of those loops would be much clearer in separate methods, and would make the main method easier to grok. That's only the beginning, but should suffice to prove the point. The method obviously executes a series of steps to accomplish its goal. It'd be far easier to grok if the main method only contained the series of steps as calls to other methods, and the other methods each accomplished one of those steps. A method should do one thing, or make a series of calls to other methods.

Speaking of the "One thing criteria", that method you reference performs exactly one thing, it constructs the object (it is a constructor). That it does it in several steps is obvious, lots of methods are done in several steps. At which point do you take a step from the context and extract its own method? When it is longer than 5 lines or what? At the point where EricHodges cannot comprehend it anymore?

[Yes, we give each step a name and use that name as a method. Then reading the constructor reveals only the steps, not the details of each step. We look for behavioral distinctions between the steps. Here are some that jump out at me in the Kawa Java example:]

validateNumberOfRules createRules initializeRules initializeMacro

[Each of those decomposes into further steps. When I need to change how a macro is initialized I'd much rather see that list of steps in the constructor. -- EH]
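For readers who want to see the shape being proposed, here is a hedged sketch of a constructor that reads as that list of four steps. This is not the Kawa source. SketchMacro, its fields and the bodies of the helpers are invented; only the four step names come from the comment above.

```java
// Hypothetical sketch; this is NOT the Kawa source, only the shape being proposed.
final class SketchMacro {
    private final String[] patterns;
    private final String[] rules;
    private boolean ready;

    // The constructor reads as the list of steps; details live in the helpers.
    SketchMacro(String[] patterns) {
        validateNumberOfRules(patterns);
        this.patterns = patterns.clone();
        this.rules = createRules(patterns.length);
        initializeRules();
        initializeMacro();
    }

    private void validateNumberOfRules(String[] candidates) {
        if (candidates == null || candidates.length == 0) {
            throw new IllegalArgumentException("a macro needs at least one rule");
        }
    }

    private String[] createRules(int count) {
        return new String[count];
    }

    private void initializeRules() {
        for (int i = 0; i < rules.length; i++) {
            rules[i] = "rule(" + patterns[i] + ")";
        }
    }

    private void initializeMacro() {
        ready = true;
    }

    String describe() {
        return (ready ? "ready:" : "not ready:") + String.join(",", rules);
    }
}
```

In this sketch the helpers are private and the class is final, so they cannot be overridden. They do still depend on being called in this order, since initializeRules() needs the array that createRules() produced.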

You guys are analyzing LongFunctionExamples through the unreasonable standard that you have to figure out from 10000 feet high what the code does without trying to put any effort into it. Presumably some name of the functions should make it clear to you what the algorithm is. I'm afraid that is an entirely unrealistic expectation for non-trivial algorithms. Oh yes, we don't all write steps like { WakeUp(); Shower(); Shave(); GetDressed(); LeaveHouse(); }. Why do GrandMasterProgrammers need to write code that would be pleasing to the wiki crowd? You have your toy-ish examples anyways, you know where you can find them.

[Only it isn't an unreasonable standard. I've been doing it for years and it's quite reasonable. -- EH]

Well, apparently you haven't been doing it for complex enough projects, that you think it should be a reasonable standard. It may be reasonable for your kind of projects but not for other projects. Hey ComputerScience is difficult and not all algorithms are the kind of toys you read in XP books. Effort is required, because some problems are just essentially complex. There's no refactoring or XP magic that you can wave around and make programming of complex projects accessible to the masses of programmers with not enough mathematical skills nor CS education.

[Grandmaster programmers should write code that is clear and concise because we are other programmers. Writing code the compiler can understand is easy. Writing code other programmers can understand is hard. Every piece of code should be as easy to understand as a "toy-ish" example if at all possible. -- EH]

For every piece of code written by a GrandMasterProgrammer you will find an idiot programmer that'll have trouble understanding it. The understanding that comes from reading a list of names like { validateNumberOfRules; createRules; initializeRules; initializeMacro } is but a superficial one, and this is not, nor should it be, the kind of understanding one should look for in reading code (either for maintaining it or just for learning from it). You can bring a grandmaster programmer down to the level of average programmers, or you can ask the average dude to raise his level (just a tiny little bit), or to, well, remain where he is now.

The bottom line is that you refactored my code GridBagLayout and made it significantly harder to understand for me (not a grandmaster but the original writer!!!) and harder to verify its correctness. So here you have it: what works for Eric does not work for Costin.

[The people you call idiots are the people we work with. They are the people who co-author, modify and maintain our code. There's no good reason to "push ourselves harder" than we have to. We should make reading the code as easy as possible because it has proven to pay off in the long run. Reading the list of steps I provided would help me a great deal. It narrows the scope. When I need to change the code that initializes rules, I'd much rather read 4 lines of code and browse into a method with that name than read 100 lines of code and build the abstraction in my head. Once I've figured out that a block of code is just initializing rules it's my duty to future programmers to save them from repeating that effort. -- EH]

I don't have any sympathy for people who think that reading 50 lines of code is "pushing ourselves harder than we have to". If you want to help a future maintainer you could just as well write a simple comment in the header of the constructor. You could also write up a brief paper on the algorithm and the design, so that one who first reads the paper has no problem understanding the code in the constructor. Otherwise suppose you "refactor" a constructor into other methods; the next maintainer will think he wants some flexibility and will make one of the methods virtual, and the next maintainer will override it in a subclass, and disaster ensues. This is just an example of creating methods that do not stand up on their own, but rely on call order dependencies and other unwanted properties that you add to the code just because you don't feel comfortable reading 50 lines of code. There's no such thing as a free lunch.
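The "disaster" scenario above is a real hazard in Java, and a minimal sketch shows how it can play out. Widget and FancyWidget are hypothetical classes written for this illustration. The language rule it relies on is that a superclass constructor runs before the subclass's field initializers.

```java
// Hypothetical sketch of a constructor step that a later maintainer made overridable.
class Widget {
    String label;

    Widget() {
        initializeLabel();   // extracted step; overridable because it is neither private nor final
    }

    void initializeLabel() {
        label = "widget";
    }
}

class FancyWidget extends Widget {
    private String prefix = "fancy-";   // assigned only after Widget() has returned

    @Override
    void initializeLabel() {
        label = prefix + "widget";      // runs during Widget(), while prefix is still null
    }
}
```

Constructing a FancyWidget leaves label as "nullwidget" rather than "fancy-widget", with no exception and no warning. Keeping extracted constructor steps private or final is one common way to avoid this, at the price of the flexibility the hypothetical maintainer wanted.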

[[Eric said (on RavioliCode) "I love RavioliCode. I don't care about grokking the whole thing if it works...Long ago I gave up on understanding everything and got used to understanding just enough to get by.", a sentiment I understand. However, Eric does not address the times when you do need to understand everything, whereas Costin seems concerned about this. So you two are talking about different situations, and of course you reach different conclusions.]]

[[Ideally Eric should address not just the times when understanding 1 line methods is enough, but also the times you need to understand all of the methods, in which case multiplying their number unnecessarily can hinder understanding.]]

[When I need to understand all of the methods, I understand all of the methods. Needing to do so is in itself a smell.]

[[That comment is beneath you. Project requirements aren't smells, they are what we need to do. A common example is when I rescue old code that was based on poor foundations, and it needs to be rewritten. In an ideal world I'd start over again by interviewing to find use cases etc etc. In the real world I am pointed at old code and told to go wild but don't bother anyone. So I need to go in and understand what it was attempting to do. Don't tell me what I do and don't need to do.]]

[Ideally a method should be read, understood and replaced with a concise mental abstraction, upon which other concise abstractions can be built. Understanding ravioli code is not much different from understanding long functions. My macula can only be focused on a few letters at a time. Even long words require scanning and building an abstraction in my brain. The big difference between ravioli code and long functions is the ravioli code gives me predefined abstractions (method and object names) that I can use while building my mental model of the code. The effort required to build those abstractions is conserved. -- EH]

[I believe everything I write is an opinion. I believe I've been backing it up by explaining it. Is there something in particular you're looking for? -- EH]

[[Yes, this:]]

[[Are your multiplied interfaces truly easier to understand than the original single method they replaced? Always? How do you know that's always the case, as opposed to only sometimes being the case?]]

[I know that it has always been easier for me. What else can anyone know? What would the evidence you're looking for look like? -- EH]

You can know lots of things other than your experience. Others did not necessarily agree with your conclusions every time, everywhere. -- Costin

[I haven't rushed to these judgements. It's taken me 23 years to reach them. They are founded on years of experience. I will not ignore my personal experience in deference to grandmaster programmers. -- EH]

What works for others may not work for you, and what works for you may not work for others. Please accept the validity of other people's experiences. The overwhelming evidence shows there is more than one good way to develop software.

One can preach XP all day long, as long as it is not accusing others of bad practices for not following XP.

[One can't say "long functions smell" without accusing others of bad practices. -- EH]

If XP is a collection of good practices, which I believe it is, then it makes sense that the opposite of one of those good practices can rightfully be called bad practice. There is such a thing as bad practice you know, and it is worth discussing those things. A large group of people, namely XP'ers, consider long functions bad practice, and it isn't insulting to say so. Do you not consider anything bad practice?

Eric draws a distinction between interface and implementation. While there may be times one needs to understand the entire interface at once, there aren't any times when one needs to understand the entire implementation at once. It's not that we can't understand 50 lines of code, it's that it's quicker to understand when it's properly factored, and that increases the productivity of everyone involved, both now and in the future. -- RamonLeon

[[Let's pick just the "entire interface" issues for starters. What about the times you need to understand the entire interface? Eric's hard and fast rule multiplies the number of interfaces, making it harder to understand in its entirety -- never mind implementation for a second.]]

Is the code currently on GridBagLayout your original code, or has it been changed?

Yes it is my original code. Eric's brief refactoring (just one aspect of it) is in PolymorphicGridLayoutEx.

[[A little deeper analysis would really help on this topic. Is the conclusion simply that shorter is better? ]]

The "extreme" state might not be anything to do with length, but simplicity of expression; AlanFrancis had an interesting write-up of this idea over here: http://www.twelve71.com/cgi-bin/wiki.cgi?SequenceSelectionIteration. Any given method or function would end up as "pure" sequence, iteration or alternation. This would make the WeinbergTestForLongFunctions a snap.

[[That's interesting, if you do continue assuming that shorter is better (you've pointed towards a metric for shorter, not "simplicity" versus "shortness")]]

[[A little deeper analysis would really help on this topic. Is the conclusion simply that shorter is better? Really? All the way down to zero length? No? Well, what's the counter-factor, then?]]

[[The answer to that should be inspired by the well-known fact that base e (you know, e = 2.71828...) is the most theoretically efficient number base, due to a similar tradeoff between opposing pressures.]]
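For anyone who hasn't seen the base-e result, here is a short sketch of the usual argument. It assumes the standard "radix economy" cost model, in which the cost of representing values up to N in base b is the number of digits times the number of states each digit must distinguish. That model is an assumption of this note, not something stated above.

```latex
% Assumed cost model ("radix economy"): a value up to N needs about \log_b N digits,
% and each digit position must distinguish b states.
E(b) = b \,\log_b N = \frac{b \ln N}{\ln b}
\qquad\Longrightarrow\qquad
\frac{dE}{db} = \ln N \cdot \frac{\ln b - 1}{(\ln b)^2} = 0
\;\iff\; \ln b = 1 \;\iff\; b = e \approx 2.71828
```

A small base means many digits and a large base means many states per digit, so the cost is minimized in between. The analogy to function length (few huge functions versus many tiny ones) is only an analogy; this derivation says nothing about what the optimal length actually is.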

[I know that it has always been easier for me. What else can anyone know? What would the evidence you're looking for look like? -- EH]

I can imagine several items worth study.

give X minutes to study the code written in the two styles. Ask questions about the code.
ask for changes to the code. Test for correctness.
look at the performance of the two approaches.

Any others?

[Those are tests the interested parties could apply, but they aren't evidence. The number of variables is too big to convince anyone here. -- EH]

They are evidence that it makes a difference. Each individual still must decide what they value.

[And I encourage everyone to gather all the evidence they need. The anonymous challenger above was asking me to provide evidence here. -- EH]

To ignore the flame war breaking out above: no one has answered some questions that I think are quite critical on this topic.

One key question: how short is short enough? Zero length? If you think that's an absurd question, why do you think so? How short is short enough????

Short enough (and clear enough) so that an average programmer (neither a grandmaster nor an idiot) can understand what it does at a glance. When my methods are longer than 10 lines they start to smell. I've given this answer several times. -- EH

[It has been answered several times. Short enough to be simple, generally meaning each method accomplishes one task. It could be one line, it could be 30, depending on the task, but generally it ends up being about 7 to 12 or so. If you have two loops, then that's two things; break them apart. Methods are better than whitespace or comments for communicating intent and scoping separate things. The link above, http://www.twelve71.com/cgi-bin/wiki.cgi?SequenceSelectionIteration gives a pretty good explanation with samples.]

[[No, it hasn't been answered, because you guys aren't saying what the opposing force is that prevents the ideal length from being zero. "It all depends" conflicts with "long functions should always be factored".]]

The opposing force is that a method has to do something. -- EH

But you haven't thought this through. If that's the opposing force, then a method that does "a*b + c*d" is still too big, because clearly it can be broken down further into several more methods that still do something.

Is "a*b + c*d" short enough and clear enough that an average programmer can understand what it does at a glance? If so, it is short enough. -- EH

[[Then rather than the absolute rule that "long methods smell", we have the relative statement that "methods should be short enough and clear enough that an average programmer can understand what it does at a glance."]]

[[And in some cases that may be true of a method that is, say, 100 lines long. There is no absolute threshold.]]

I'd love to see a 100 line method that an average programmer (or even a grandmaster programmer) could understand at a glance. -- EH

[[The rule that it should be broken down into mutually exclusive sets of SequenceSelectionIteration, I thought, was being offered as an idea to toss around, not as an absolute rule. Or did I misunderstand? Is that the absolute rule? You all agree on that?]]

Apparently the answer is "no" from EH.

Who said anything about absolute rules? -- EH

You did, and Costin explained quite well why that was the effect of your words.

Nope. My words are my opinions and my judgements only. They are not absolute. -- EH

[[If so then the obvious next question is why selection instead of polymorphism? SelectionSmells after all.]]

Anyone can look at any long method and see step 1, step 2, step 3, this is usually set off by a comment or white space, just break these up into methods.

[[I just think that there are larger opposing forces at work than you guys who just want to say it's merely as simple as "long methods smell" and leave it at that.]]

[[I'm not being willfully difficult, what I'm doing is attempting reductio ad absurdum on the point of view that things are as simple as you are claiming. Note that I haven't contradicted you, because I think that part of the time your approach is correct. What I'm trying to do is to get you to see that part of the time your approach is incorrect. Which I would guess is also Costin's point, too, he's just irritated and frustrated. Which is understandable, IMHO.]]

OK, when is the approach ever incorrect? I'm completely open minded to accepting that there are situations where long methods are necessary, especially in languages like C that lack certain abstractions or when optimizing and the need arises to inline for efficiency, but I have yet to see anyone actually demonstrate a necessity for a long method outside of those special circumstances. No sample has been shown that couldn't be simplified by breaking it up into smaller, clearer methods. Note, showing long samples is meaningless unless you can show why they can't or shouldn't be broken up for clarity, excluding optimizations of course. I've yet to see a scenario where short methods shouldn't be striven for.

Showing LongFunctionExamples is not meaningless at all; it goes to show that in certain circumstances that is how the initial developer felt comfortable writing the code, and in many cases those developers are GrandMasterProgrammers. And the results are quite good.

So the functions should be broken up for what? For the understanding of EricHodges? That's hardly ever a valid criterion. Yep, they might be broken up so that even Eric can be comfortable with them, however that doesn't come for free. Trying to please Eric, or the average programmer for that matter, is not a valid target in SoftwareEngineering. If it costs less to ramp up a project by taking some exceptions to LotsOfShortMethods fundamentalism, then that's how it should be done. If the GrandMasterProgrammer were to arbitrarily target 5-9 lines of code, the following side-effects may ensue:

create entities of code that do not stand on their own. This is particularly problematic in languages like Java where you don't have inner functions.
create unnecessary call order dependencies in the code, making the code more difficult to read.
create unnecessary abstractions
slow down development speed. There's a reason quasi-unanimity of the grandmaster code available contains a significant number of LongFunctions. It is the natural and the most efficient way these people write some parts of the code. If it'd be any different they'd be the one to know and they'd be the ones to change their habits.
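The "call order dependencies" item in the list above can be illustrated with a small sketch. Both classes are hypothetical. The first shows extracted methods that communicate through a field and therefore only work in one order; the second shows one possible way to make the same dependency visible in the signatures.

```java
// Hypothetical sketch; both parsers are invented for illustration.
final class OrderDependentParser {
    private String[] tokens;   // written by tokenize(), read by countTokens()

    void tokenize(String text) {
        tokens = text.trim().split("\\s+");
    }

    int countTokens() {
        return tokens.length;  // NullPointerException if tokenize() was not called first
    }
}

final class ExplicitParser {
    String[] tokenize(String text) {
        return text.trim().split("\\s+");
    }

    // The dependency is now a parameter, so the compiler enforces the order.
    int countTokens(String[] tokens) {
        return tokens.length;
    }
}
```

Nothing in the names tokenize() and countTokens() tells a reader of OrderDependentParser that one must precede the other. In the original long function that order was simply the order of the lines.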

All in all, following LotsOfShortMethods fundamentalism we'd have a net economic loss in some cases. It may be a net economic gain if EricHodges was the developer in question, but he doesn't write all the software in the world.

Now undoubtedly they'll jump on me with "it's gonna cost you a hell of a lot more in maintenance cost"! Well, that ain't necessarily true just because somebody waves his hands. First of all the YAGNI principle applies. If there's a need to refactor in short methods, then refactor when a long function actually bothers you. Again, we have GrandMasterProgrammers who wrote some code with long functions and maintained it for quite long without feeling the need to refactor. If it doesn't cost them, they shouldn't incur the cost of refactoring just to make their code Eric-level ready, it's not Eric maintaining that code!

Again, we may look at the experience of code written by a grandmaster like Donald Knuth, which has later contributions from other people. If we analyze the LiterateProgramming tools and style of source code (no code browsers, not many proper method calls, low level language, no Object Orientation, etc.), then the guys who whine here about LongFunctions would be screaming if they were to maintain it. However, other people felt comfortable enough to get acquainted with LiterateProgramming and contribute to that code. It just requires an intellectual discipline, and an upfront investment. So again the difficulty in maintaining varies with the individual.

What should be rejected altogether is the dogma that better programmers have to dumb their coding practices down to the lowest common denominator, and incur the costs associated with the lowest common denominator, for fear that any wiki wannabe will start accusing them of bad practices.

Extracting short methods from long methods isn't "dumbing down" the code. It moves useful abstractions from programmer brains into code where they can be shared. -- EricHodges

I see few, if any, people on this Wiki advocating that GrandMasterProgrammers dumb their coding practices down to the lowest common denominator. I see many people saying that LongFunctions are usually bad, a CodeSmell, and should be refactored if possible.

Is this page supposed to be advice for GrandMasterProgrammers, or advice for ordinary people? If it's the former, I don't see the point, because I doubt a GrandMasterProgrammer needs advice from the likes of us. If it's the latter, I think LotsOfShortMethods is very sound advice - for most programmers, programs become easier to understand when broken up into manageable chunks (EwDijkstra has advocated this for years), and the exceptions can be dealt with after the programmer has tried and failed to find a way to break them up.

The HeadlinesTechnique *also* breaks code into manageable chunks. It is just a less formal approach for achieving the same goal. If a routine is only called once, I often see no net advantage to making a new formal function.

Yes, and the less formal approach has too many disadvantages to be recommended. The more formal method approach has too many advantages to not be recommended.

So far your arguments all depend on psychology rather than objective universal truths. And, your psychology is very different from mine and yours is not the center of the universe.

And I take issue with the "quasi-unanimity of the grandmaster code available contains a significant number of LongFunctions". The examples on LongFunctionExamples and LongFunctionsInLisp are single functions pulled from large distributions. There is a 70-line TinyClos function on LongFunctionsInLisp. What the discussion fails to mention is that this is the longest function in the package, that a couple others are in the 30-40 line range, and that the vast majority of functions in the package are under 10 lines. That is not a "significant number of LongFunctions".

That is a "significant number of LongFunctions". If long functions were always bad, then we'd see a majority of grandmasters writing a majority of projects where long functions would not exist at all, or their number and importance in the project would be insignificant.

Short methods are easier to grok, easier to maintain, and speed up development. You may not like them, you may feel it dumbs down programming, but that doesn't change reality: shorter methods are easier to grok. It's not dogma to strive to write simple, clear, concise code, and just because some experts still write long routines doesn't mean they'd disagree with that. [Some programmers] feel insulted that someone thinks their code could be improved upon. Everyone's code can be improved, even GrandMasters'. You act like there's this enormous expense in writing small methods, but there's not, especially if you write them that way to begin with. You don't have to refactor long methods if you don't write them. As the guy said above, the vast majority of grandmaster code is usually made of short methods also; I've found this to be true, as I'm sure many others have. You pick bad samples to make your case.

[[Sigh. Sometimes things should be refactored. But I think Costin is correct some of the time; taking one 100 line method and refactoring it into 50 two-line methods is usually the wrong thing to do, because it forces the number of chunks that need to be grokked up to 50, whereas the 100 line function perhaps in some cases is a touch too large (or maybe not, depending on what's in it), but it can be digested in a linear sequence of chunk sizes chosen by the reader, rather than forced by the author.]]

People, it is subjective. Different people grok code in different ways. This is purely a psychology issue and everybody has different psychology. Let's drop it and go home or go argue about semi-colons instead to take a break from this.

It is not subjective. It is a fact that you can only hold so much in your head at once, and it is therefore a fact that shorter methods are easier to grok than longer ones. This applies to everyone, even GrandMasterProgrammers; no one is immune, no one is beyond it.

{That is why we have the HeadlinesTechnique. Large things are broken into smaller things. It is NOT a matter of long versus short chunks of code. The issue is whether to use function divisions or some other technique, such as headlines, to divide.}
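{For readers who have not seen it, here is a minimal sketch of what a headline-divided function might look like. The task, the function name and the data format are all invented for illustration; the only point is that the sections are marked visually rather than by function boundaries.}

```python
def summarize_orders(lines):
    """Hypothetical example: one function, divided by headline comments."""

    # ---- PARSE INPUT ----
    orders = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, qty, price = line.split(",")
        orders.append((name, int(qty), float(price)))

    # ---- COMPUTE TOTALS ----
    totals = {}
    for name, qty, price in orders:
        totals[name] = totals.get(name, 0.0) + qty * price

    # ---- FORMAT OUTPUT ----
    report = []
    for name in sorted(totals):
        report.append(f"{name}: {totals[name]:.2f}")
    return report
```

{Each headline marks a chunk the reader can take in on its own, while every chunk still sees the same local variables without any parameter passing.}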

We don't compare one long function with one short function, we compare one long function with the ensemble of short functions that would replace it -- which may also have objective bad code properties, like call order dependencies, and be overall harder to write the first time and harder to maintain.

Ah, but once we've broken the problem into an ensemble of smaller routines, we no longer need to think of more than one at a time. It's not a RedHerring, it's the point. Now when working at the top level routine, I only need think of the smaller abstractions by name, a far easier task now that all the little bricks are built. It's not harder to do this, it's easier. ... put that hidden knowledge into the code to help others. Programmers should attempt to make their jobs easier by knowing when to subdivide tasks into smaller tasks. Since you can only remember SevenPlusOrMinusTwo, it's only logical to learn to bite off the right size chunk of work. The right size is a far cry shorter than 50 lines, it's SevenPlusOrMinusTwo.

No procedure should need more than a few seconds of analysis to understand how it works and what it does, and a proper name usually prevents one from even having to look at the implementation. One only needs to look at Smalltalk to see this idiom in action. My last project had 16,969 lines of code, an average of 6.84 lines of code per method, and a longest method of 33 lines. It's a joy to maintain because I don't have to spend 20 minutes trying to figure out how a method works when I want to change, upgrade, or extend it, and more importantly, neither does anyone else.

If we accept the axiom that some people can hold 100 lines of code in their head at the same time more easily than others, what do we do when we work on a team made of both kinds of programmers?

I say we should write methods that are easy for everyone on the team to understand. The people who can understand long methods can also understand short methods. Reading 10 short methods to build a mental model shouldn't be a challenge to someone capable of holding 100 lines of code in their head at the same time.

Why would I write code that might confuse my team members, except for a cheap ego stroke by showing off how many lines of code I can hold in my head?

-- EricHodges

I don't know where you get your team members from, but where I get mine, they don't come seasoned. Actually they do write longer functions than I typically do, and some of those are towards 100 lines. They wouldn't be thrilled to be accused of bad practices either, and they're not grandmaster programmers; they are quite regular fellows.

I herewith declare that any programmer who automatically whines when something is more than 10 lines of code is just a lazy bum and has to do some brain exercises. I might give them a pass if they complain about functions longer than 30 LOC, but 10 is just absolutely ridiculous.

New programmers tend to write longer functions because they are unseasoned. In my experience, a less experienced programmer is unaware of the importance of the length of functions (be it short or long). I find that with increasing experience, programmers produce shorter functions. I believe this is because they start realizing the importance of such "formal concerns" as function length. Once they notice how function length affects error count, maintenance effort required, etc., they invest in writing shorter functions. After a little while, the extra effort required to get them into the mindset of writing short functions diminishes and it becomes a natural process. At that point, if the benefit they experience is sufficiently large, they tend to evolve into programmers that look onto long functions with suspicion. -- OlivierAntonis

To respond to Olivier, that is an impression that is definitely valid. More inexperienced developers write less well-factored code and consequently longer functions. However, the very experienced developers, even to the level of grandmaster, do not embrace the lotsOfShortMethods fundamentalism 100%. Their code is a mix of maybe a majority of short functions (where short can be even 20 LOC, not necessarily 10) with occasional LongFunctions where warranted. The more experienced programmers know that there are a lot more important factors that make or break the code than clinging to this abstract notion that 10 lines of code is good whereas 20 lines of code smells, 30 is bad and >50 is an abomination. -- Costin

Hello, that's what we've been saying the whole time, Costin: no one practices it 100%, but a majority should be short, with a few long ones here and there. What are you arguing about if you just agreed with what we've been saying? You need to get off the absolutes and realize we are talking about general practice; generally, short methods are preferred. If the average is 10, then there will be plenty of 30-liners just as there are plenty of 1- and 2-liners. We want an average of around 10 or so; it's not an absolute requirement.

I totally agree with that: the more experience one gets, the shorter methods tend to become. Everyone starts off with long methods; we have to begin somewhere.

I assumed Costin wasn't making that argument. If he was then it should be our duty to teach new programmers to write short methods. -- EH

Why don't you go on qmail developer mailing list and try to teach DanBernstein that he should be writing shorter functions? See how that goes and come back and tell us the experience. -- Costin

This page seems to have LotsOfShortParagraphs?. Wonder why? When is one moved to make a point in one big long paragraph, and when is one moved to break it up into smaller ones? Might we go looking for LongParagraphExamples? in classic literature, and attempt to draw conclusions from such a sample?

Again, the HeadlinesTechnique *does* have "short paragraphs". Thus, your analogy fails.

It's interesting to note that GrandMasterProgrammers refers specifically to programmers who are significantly more productive than others, rather than to programmers who write code that has positive design aspects. It could be argued that these two things go together, but I would want some proof before accepting that claim.

I don't think Costin is using the label to mean programmers who are significantly more productive. He seems to be talking about famous and published programmers. My experience is that the most productive programmers eschew long functions. -- EH

Well, short functions personally don't make me more productive. All the eye-hopping and larger function name-space slows me down. Please stop dictating to ME what makes ME more productive. I know my eyes, hands, fingers, mind, and code mistake and typo patterns better than you ever can. Plus, companies tend to hire like-minded people, so observations about "most developers" by you, me, or anyone else may be biased. -- AnonymousDonor

Notice that I said "in my experience". I'm not trying to dictate what makes you more productive, just sharing my experience. -- EH

Well, our experience differs. Let's AgreeToDisagree.

That just means you're still lacking that experience that drives one towards shorter methods, work hard, it'll come! As condescending as that sounds, it's likely true.

I am not a newbie. I came into the market in the VAX days. Maybe just before I am lowered into the casket I will finally "get" shorter functions, eh?

I understand people have different judgments. Lots of people on American Idol seem to think they can sing too; that doesn't make their judgments correct. The effect of function length on a program's maintainability is well known, and is not subjective.

I would like to read such studies.

It is bad practice, plain and simple. Just because there are successful programs that are full of long functions doesn't change that. Many things succeed in spite of themselves. The vast majority of experts recognize this, even preach about it in the books they write. While not every long method is bad, Short methods are preferred in the majority of cases.

Well, if X is not subjective, then there must be objective evidence besides ArgumentFromAuthority. Thus, where is the objective evidence? Regarding your use of the word "stubborn", someone who mistakes personal preferences for objectivity is more stubborn than someone who mistakes objectivity for subjectivity. Stubborn people tend to do the first.

If that's what you believe then fine, but the profession will march on and continue to label this an anti pattern as we move closer and closer to being true engineers. Like it or not, not everything comes down to opinion, there is a right way and a wrong way to do many things, this is the wrong way.

[[A lot of this discussion has been implicitly or explicitly about the magic number SevenPlusOrMinusTwo (which BTW has been demonstrated to be vastly oversimplified, not an absolute truth; see the modern literature). However, the proponents of this view keep rejecting the notion that it should be applied not just to individual methods, but to any resulting ensemble of methods after refactoring. The only justification I've seen for that rejection is the claim that you shouldn't ever need to look at or understand the ensemble. I think that's lame. If I need to understand a 100 line method, that need doesn't go away once you've turned it into 50 two-line methods. But now there are 50 chunks, violating 7 +/- 2, even though each one individually is trivial.]]

It doesn't work like that. If you can break a method down into 50 steps, then you can group those steps under other methods. At the highest level you don't see 50 pieces, you see 6 or 7. There is never a need to understand 50 methods all at once. If you can demonstrate otherwise, I'd love to see it.
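A small sketch of the grouping idea, on an invented order-summary task (all names here are hypothetical): the top-level routine reads as a handful of named steps, and the detail lives one level down.

```python
def parse_orders(lines):
    """Turn 'name,qty,price' text lines into tuples, skipping blanks."""
    orders = []
    for line in lines:
        line = line.strip()
        if line:
            name, qty, price = line.split(",")
            orders.append((name, int(qty), float(price)))
    return orders


def total_by_name(orders):
    """Sum qty * price per item name."""
    totals = {}
    for name, qty, price in orders:
        totals[name] = totals.get(name, 0.0) + qty * price
    return totals


def format_report(totals):
    """Render one 'name: amount' line per item, sorted by name."""
    return [f"{name}: {totals[name]:.2f}" for name in sorted(totals)]


def summarize_orders(lines):
    # Top level: three named steps, nothing else to hold in mind here.
    orders = parse_orders(lines)
    totals = total_by_name(orders)
    return format_report(totals)
```

Whether this reads better than the same logic written inline is the very thing being argued on this page; the sketch only shows the shape of the grouped version.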

One still has to understand the 50 pieces on their own, but all of them nevertheless, and the contracts between them, and the call order and other inter-dependencies that were created in the process, in order to convince yourself of the correctness of that code.

One has to understand the contracts, order and dependencies regardless of whether the code is organized in one long method or many short methods. This has nothing to do with XP. -- EH

One also doesn't need to understand all the implementation of all of them when they are encapsulated under named abstractions, which is the entire point of using methods in the first place. If there are call order dependencies between three methods, write a fourth that calls them in the correct order and just use the fourth. Keep building abstractions until the problem can be solved simply without requiring lots of long methods.
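As a rough sketch of the "write a fourth that calls them in the correct order" suggestion (the class and method names are made up for illustration), the three order-dependent steps can be kept private by convention and exposed only through one entry point:

```python
class ReportJob:
    """Hypothetical job with three steps that must run in a fixed order."""

    def __init__(self):
        self.log = []

    # The leading underscore marks these as internal by Python convention;
    # the language does not enforce it.
    def _load(self):
        self.log.append("load")

    def _validate(self):
        self.log.append("validate")

    def _publish(self):
        self.log.append("publish")

    def run(self):
        # The only intended entry point: callers cannot get the order wrong
        # as long as they go through run().
        self._load()
        self._validate()
        self._publish()
        return self.log
```

The dependency has not disappeared; it has been written down once, in `run`, instead of being repeated by every caller.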

Having a bunch of embedded functions isn't any clearer than headlines.

Those methods have scope, can be given names, can be passed around for reuse, and allow for the creation of closures. That's a huge difference from just having inline code; I can't believe you'd even say it's similar.

You don't need to have a name to create closures.
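For instance, in a language with anonymous functions a closure can be created without ever naming the inner function. A minimal Python sketch (the names `make_scaler` and `factor` are invented for illustration):

```python
def make_scaler(factor):
    # The returned lambda has no name of its own, yet it closes over `factor`.
    return lambda x: x * factor


double = make_scaler(2)
triple = make_scaler(3)
```

Here `double` and `triple` are just variables holding the anonymous closures; each one remembers its own `factor`.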

Regarding "Long is a Typical Newbie Mistake"

I disagree with that. Some of my college listings I rediscovered had too many short functions. I now generally let OnceAndOnlyOnce be the guide, and I think it produces better code, or at least not worse. Sure, some programmers write horrid long code, but that is because they are bad/unskilled programmers, not because long is automatically bad. If you forced them to shorten their functions, the result would probably still be a mess, just a different mess. -- AnonymousDonor

By breaking a function into lots of small functions you create more contracts and more dependencies. You create other positive effects as well, but they are conflicting forces that need to be balanced. You never acknowledged that there are competing forces at play. Claiming that this balance is around 10 lines of code for everybody everywhere is just hot air.

One doesn't need to understand the implementation only if one can delegate the implementation to somebody else who will understand it.

How does breaking a long function into smaller functions create more contracts and dependencies? I think the same contracts and dependencies exist in the long function, but instead of being between methods they are between lines. They may be made more explicit by smaller functions, but they aren't created. -- EH

To quote Costin's favorite guy... EwDijkstra

"Separate Concerns" (SeparationOfConcerns)
"The competent programmer is fully aware of the strictly limited size of his own skull; therefore he approaches the programming task in full humility, and among other things he avoids clever tricks like the plague."

Small methods allow us to accomplish this!

But not the only way. -- Anonymous

And certainly not automatically!!! Breaking a 30-line function into three 10-line functions accomplishes none of the above automatically. There's plenty of OO code out there in the wild with LotsOfShortMethods that breaks those things. Especially given the limited size of the skull, one should be aware he can't keep an inordinate number of abstractions, contracts and dependencies in his head. -- Costin

Which is why you write them down and call them methods, so they aren't just in your head. Big functions keep all that in the author's head; small functions share it with the other programmers.

So you want me to keep more method names in my head, more call order dependencies in my head, more contracts in my head. And do you want me to embody the contract in the method name, as well?

I can't speak for all long-is-okay proponents, but I use the HeadlinesTechnique, and thus don't have to keep sections "in my head". They are clearly visually marked.
- TreatCommentsWithSuspicion. If some piece of code is worthy of a comment, it's usually worthy of a method. Comments lie because programmers tend to forget to update them.

More frequently than they forget to update method names when methods change ?

We have been over this already. My observation was that comment abuse level was about the same as function name abuse level, and you felt they were different. We have different experience. Let's AgreeToDisagree.

Executable comments (methods) are preferred over non-executable comments, which often don't get updated since they have no impact on behavior. Non-executable comments should be used to express things that the language itself is incapable of, such as why you did something a particular way. Anything that can be expressed directly in the language itself should be. Headlines can be directly expressed via method names, and should be.
So you think the name of the method has an impact on its behaviour? There's a reason why DonaldKnuth puts in so many comments. First of all, he is a gifted writer. Second, comments are expressed in an unconstrained language and are therefore much more precise than the actual programming language. There are good comments and there are bad comments. If you don't know how to write substantial comments, you can always learn from DonaldKnuth. But the way to actually strive for the mastery of programming that DonaldKnuth has is to follow in his footsteps, not come out with your own prejudices and think that you know better. You are at liberty to do the latter only when and if you have proven yourself to program better than him.

As far as I can tell, DonaldKnuth is a brilliant and famous procedural programmer. I can't knock him, but I'm not a procedural programmer. Kent's examples aren't toys, and I use his techniques on a daily basis. He's an OO programmer I respect, and his style is far more applicable to my work than Knuth's.

This topic is mostly about procedural programming. There are other "short method" topics. Perhaps what works for OO does not necessarily apply to procedural.

In languages like Java, where functions cannot be embedded into larger functions, the dependencies are multiplied by the fact that they do not share the context automatically; the proper context has to be passed into them by the caller, so there are more contracts. While a block of code just sits where it is, and is evaluated in only one way with regards to the surrounding code, a function has the potential to be called from many places. For example, a later maintainer can see a new private function and get the idea "wouldn't it be wonderful if I called it from here?". A function name does not spell out "I can only be called from that particular place, and only after the three other functions are called". And it also doesn't say in its name, "future maintainer don't make me virtual because I need to be called from inside a constructor".

Some of these forces are balanced by other forces , but breaking the code into small pieces does not always come for free.

Nothing comes for free, and I do agree that without lexical scope, things can get hairy, but that's what objects give you in Java: lexical scope to the object's variables. Would it be cleaner in a language that supported anonymous inner methods? Certainly. As far as I'm concerned, Java's a brain-dead language; you won't get any argument from me there. Languages like Java force you to work around their weaknesses and don't always allow pretty code, or short code; we don't disagree there. But things like call order dependencies can be guarded with asserts on preconditions, which is a good practice anyway. "future maintainer don't make me virtual" would make a great comment, as it says something that the code can't, but it's not a reason to avoid writing the method in the first place. Yes, it takes much more than small methods to make good code, but then no one ever said otherwise; we just said small methods are something to strive for... "among other things" was taken for granted.
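A minimal sketch of guarding a call order dependency with an assert on a precondition. The `Connection` class and its methods are hypothetical, and the sketch is in Python rather than Java:

```python
class Connection:
    """Hypothetical connection whose fill() is only valid after open()."""

    def __init__(self):
        self._is_open = False

    def open(self):
        self._is_open = True

    def fill(self, rows):
        # Precondition guard. Note that Python strips assert statements
        # when run with -O, so this catches bugs in development only.
        assert self._is_open, "open() must be called before fill()"
        return list(rows)
```

A caller that forgets `open()` now fails loudly at the point of misuse instead of misbehaving later, at the cost of one extra line in `fill`.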

So things like call order dependencies are not desirable but we can mitigate them by writing guards that were not needed to begin with (even more entities put in the code). However long functions cannot be mitigated by anything else (if nothing else just economical reasons are good enough, but there's more). Do you at least agree that call order dependencies are bad?

No, I don't. I can't call aConnection.Fill(...) before I call aConnection.Open(...); call order dependencies are a natural part of any contract, and you can't avoid them. Truth is, it's the caller's responsibility to ensure preconditions are properly met. You shouldn't have to check for them, so you assert them if you are trying to catch bugs, but you don't have to, nor should you.

Whoah, there. There are ways to automate call order so that such dependencies cannot be gotten wrong, and there are other ways to automatically (not manually) check and throw if they were violated. They are not just a fact of life that you have to live with.

I'm listening...

Whoah. So call order dependencies are not bad ... interesting things one learns every day. Now that we went into SemanticOverprecision? mode, I should ask you: do you agree that a code structure with more call order dependencies between parts is worse than a code structure with fewer call order dependencies, all other factors being equal?

All other things being equal... including method length... yes, I agree, as long as they aren't intentional dependencies, i.e. part of the api. And 50 LOC is a smell, not the end of the world.

And to add to that, does it somehow happen that "intentional" call order dependencies are typically an interface design smell, and most of the time can be avoided? For example, if you need to call Connection.Open(...) before you can call Connection.Fill(...), the provider of the connection could have had: ConnectionProvider.Open(...) -> Connection, so that you couldn't perform operations before the actual connection is validly open?
The bottom line is that 5 call order dependencies are at least as smelly as 50 lines of code (all other things being equal).
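A sketch of that interface shape, with hypothetical names and written in Python: the only way to obtain a `Connection` is from the provider's `open`, so `fill` cannot be reached on a connection that was never opened.

```python
class Connection:
    """Hypothetical connection: by the time one exists, it is already open."""

    def __init__(self, target):
        # Intended to be constructed only by ConnectionProvider.open();
        # Python does not enforce this, so it is a convention here.
        self._target = target
        self._rows = []

    def fill(self, rows):
        self._rows.extend(rows)
        return len(self._rows)


class ConnectionProvider:
    @staticmethod
    def open(target):
        # Opening and obtaining the object are one step, so there is no
        # separate open() for the caller to forget.
        return Connection(target)


conn = ConnectionProvider.open("example-db")
count = conn.fill(["row1", "row2"])
```

The call order dependency is now carried by the data flow (you need the return value of `open` to call `fill`) rather than by a rule the caller has to remember.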

Possibly.

Almost every argument for or against long functions here appears to depend on psychology. Are there any that don't? There is a slight difference in lines-of-code, but I don't think anybody expects such to carry their entire argument.

I don't see comments on this page concerning the most basic and important measures of code quality: CouplingAndCohesion.

If a long function is broken up into a number of small methods, this could be either good or bad in terms of coupling and cohesion, depending on what the resulting pieces are like.

If the refactored pieces have very little communication between them, so that coupling is low, then that measure is good. I think this has been assumed by advocates.

But of course, if each method only does one thing, it's unlikely they are coupled to the other methods. No matter the problem to be solved, it's easier when you separate the algorithm from the implementation of each of its steps. That's what I said, the advocates assume this (and I'm not arguing it, here). It's a big "if", which is why it's an assumption, but we can grant it for the moment.

On the other hand, if the refactored pieces have less cohesion, because highly cohesive elements of the original function were split apart, then this could be bad. It would depend on what the elements were.

CouplingAndCohesion analyses usually just degenerate into PerceptionOfChange HolyWars.

No, they don't. Point to at least one and preferably several examples to justify that "usually". I don't think that CouplingAndCohesion gets discussed much; "usually" there's just a passing reference to the notions and the conversation moves on.

And even if the conversation did usually degenerate, here's your chance to avoid being a degenerate by addressing the cohesion issue head-on. :-)

A fuzzily defined and subjective measure cannot be pumped up to the level of "most basic and (most) important", to begin with. I prefer to talk about InformationHiding. Furthermore, these are measures that talk about modules. Functions are typically not modules for all intents and purposes. It is much less relevant to measure coupling and cohesion for units of code smaller than modules.

I don't follow this. Which alternative is less "fuzzily defined and subjective"? CouplingAndCohesion have been reduced to actual quantitative metrics, whereas other things have not...your preferred "InformationHiding", for instance, I believe has never had a scalar metric associated with it. Granted, there aren't any software metrics that aren't controversial, but it can be better to have one than not to have one.

"Cohesion is the degree to which the responsibilities of a single component form a meaningful unit." Well, that's very fuzzy in my book. And given this fuzziness, it's shocking to be claimed as part of the "most basic and most important".

As for saying that functions aren't modules, so coupling and cohesion just don't apply, I am actually shocked to hear you say something so clearly incorrect; your track record is such that I didn't expect that. The concepts arose out of Structured Programming examination of nothing *but* functions; go refresh your memory.

Yep. I just have SoftwareFundamentals on my desk, so I checked again, just as a favor. Functions are by default not modules, certainly not the kind of functions we are talking about in here (5-50 max. 100 lines of code), and not for the useful definition of module, that is to say DavidParnas' definition. That there are exceptional functions that may be considered a module does not invalidate the argument.

Or are you just saying that you have a private notion about CohesionAndCoupling that you think applies better than everyone else's notion, which does apply to functions? I suppose that could be interesting to see...over on the relevant page.

I'm saying that coupling and cohesion is a largely subjective measurement, especially the cohesion part. Coupling represents the upside-down of InformationHiding and therefore is kind of unnecessary, if something abides by InformationHiding principle there's no need to investigate the coupling, it'll be down to minimum necessary, and speaking of which coupling has been touched on this page as call order dependencies (and other dependencies that can be introduced while splitting a large function).

But the principle of InformationHiding does not usefully apply everywhere; there's the famous point of diminishing returns. For example, it is almost useless to apply it to units as small as blocks of code (for those we have different and objective measurements). DavidParnas said that it should be applied to modules. And he also defined a module as what constitutes a work assignment for a programmer on the project. Certainly on my projects, a work assignment is more than a function; it's a handful of them. And furthermore, InformationHiding applies to the contract (interface) exposed by a module, not necessarily to its implementation. Still shocked ?

Hmm. Well...I suppose this does ring a bell, where me being shocked has often been a result of me being wrong. :-) So perhaps I'm all wet here. Assume so while I think about it.

Some long functions are bad.

Some long functions are good.

Some short functions are bad.

Some short functions are good.

Feh.

Is it worth trying to identify particular kinds of long and short functions that are bad? For instance, I would personally say that a GrandCentralStation is fairly obviously bad. As long as we're arguing about abstract guidelines, it'll be more heat than light.

Someone once wrote on this page: "(Note that Mozilla displays too many blank lines in this example.)". Did you report it to BugZilla, or do I need to use the RemoteStrangulationProtocol ?

{I have not identified yet whether the fault is with Mozilla or Wiki or a combo. Anyhow, I would appreciate if you put the warning back because visual spacing is a key element of this debate.}

Someone once wrote on this page: Maybe somebody here needs to inform DonaldKnuth of the discovery of the new fact finding mission on MARS. The future of software engineering on the red planet!

How is that relevant ?

The Issue Is Complexity Not Function Length

I think the problem is that "Function Length" is sometimes used as a surrogate for "Software Complexity." I tend to find that reducing complexity often results in shorter functions, but shortening functions does not necessarily result in reduced complexity. There are times where I combine (lengthen) functions to reduce complexity.

Unfortunately, complexity is a pretty subjective term. It is easy to count lines in source code and define a cut-off value for "too long" and quite easy to pick some code, even at random, and separate it into multiple functions each within the desired size limit. It is far more difficult to look at a function or set of functions and objectively measure the complexity, and once one has determined that the code is too complex, it can be quite challenging to reduce the level of complexity, though it is usually worth the effort.

Focus on code clarity and reducing complexity. Do not be concerned with the length of the functions, the length will be what it needs to be to optimize clarity.

--WayneMack
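To make the distinction between length and complexity concrete, here is a rough sketch that reports both for each function in a piece of Python source. The branch count is a crude stand-in for a cyclomatic-style measure (one plus the number of branching constructs found), not a standard tool, and the sample functions are invented for illustration.

```python
import ast

# Constructs counted as decision points in this rough approximation.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp)


def rough_metrics(source):
    """Return {function_name: (line_count, branch_score)} for `source`."""
    metrics = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            lines = node.end_lineno - node.lineno + 1
            branches = sum(isinstance(n, BRANCH_NODES) for n in ast.walk(node))
            metrics[node.name] = (lines, branches + 1)
    return metrics


SAMPLE = '''
def long_but_flat(x):
    a = x + 1
    b = a * 2
    c = b - 3
    d = c * c
    return d

def short_but_branchy(x, y):
    if x and y:
        return 1 if x > y else 2
    return 0
'''

metrics = rough_metrics(SAMPLE)
```

In this sample the longer function gets the lower branch score and the shorter one the higher, which is the sense in which line count alone can point the wrong way.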

I agree, however, I find that Long Functions are the damp logs under which the newt of complexity often lives. So, disliking a function because it's long is unwise; suspecting complexity lurks in a long function seems to be useful.

Blocks is Blocks

I view code as a bunch of blocks. Some blocks are better off named (like functions/methods) and some don't need to be named to do their job (IF/CASE, etc.). I see no reason to put names on every block, only on those that are better for it.

To me, the most disturbing thing about this page is the celebrity fetishism. Who cares what famous programmers do? EricHodges is right--long functions/methods are a code smell, and should be dealt with using Extract Function/Method refactorings. I recommend MartinFowler's RefactoringBook, and KentBeck's writings, for inexperienced or ancient programmers who need to learn these modern software practices.

Oh, the fans of MartinFowler and company, complaining of celebrity fetishism. Isn't that ironic ?

But anyways, as long as one lives one learns. Maybe people chose the wrong models to follow. So where can we study a GrandMasterProgrammer chef d'oeuvre from the said "modern software practices" luminaries ?

Try reading MartinFowler's RefactoringBook. The value of the book does not follow in any way from MartinFowler's fame, such as it is (I'd never heard of him when I read the book in early 2000). It follows from the content of the book. That's where reading is required.

If there were a headline tomorrow--MartinFowler Reverses Position, Supporting 100-Line Functions--long functions would still be a CodeSmell, celebrities notwithstanding. Those famous people who think otherwise are wrong about this particular point.

Books are cheap -- especially some type of books with lots of verbiage, little to no mathematics and trivial coding samples. Code is harder. I wanted you to refer to some preferably open-source projects on the magnitude and success of those from where LongFunctionExamples are drawn.

If your salary allows you to consider the Fowler book cheap, then I recommend you purchase it. And read it, of course. It will help you grow as a programmer. And you're right--the book has little to no mathematics. But there's a lot of great stuff about computer programming.

Let's get back to the subject of code, shall we ? And by the way, since you are so boastful of your book recommendation, I'd like you to tell me how recently -- if at all -- have you read any of DisciplineOfProgramming, TheScienceOfProgramming, SiCp. Maybe ConceptsTechniquesAndModelsOfComputerProgramming. Not exactly night time reading like the refactoring book, but given the eagerness with which you push it, I want to know at what level we'll have this discussion. Or if it's worth having a discussion at all.

Also, "following models" is part of learning a handicraft or skill--like programming, cooking, turntablism, tennis, chess, etc. But *blindly* following models is *not* how you learn these things. That's worshipping experts, not learning from them.

Hey, do you want to have a discussion or to lecture ? I guess we've had enough of this BS. Go sell that book to your junior programmer colleagues and come back after you have learnt to articulate a point of view, and have fixed that crystal ball of yours that is supposed to tell you who has or has not read what book.

I also disagree with the implication that the purpose of programming is to demonstrate your intelligence. That is the attitude that leads to so much convoluted legacy code. The purpose of programming is to create or modify working software.

CategoryDiscussion