Long Function Heresy

The LongFunctionHeresy is my own considered opinion about how long a function should be.

A function should be as long as it takes to contain the code that expresses its purpose. In particular:

No function should be split just to make it shorter. What I find is that when reading such code, having to jump into a function to find out what it does (and I can never trust its name or documentation, particularly when I'm bughunting) is an extra effort. Drilling through a train of a dozen of these, for every line of code in the top function (and it often come to that) is <some bad thing I can't think of right now>.

No function should be written that is called in exactly one location in the code. My experience is that even if a block of code looks like a well defined action, and would make good sense to be a function, I don't know exactly where to make the cuts, or what are the parameters and what are the constants, until I have a second caller.

No function should be written that doesn't do a whole task (or several whole tasks, barring other refactoring rules).

An example (in C++) from recent experience:

Heightfield::BuildContours(activeArea, minimumHeight, maximumDelta)

//// the Heightfield class contains an array of height information, our task is to generate contour data for pathing around the cliffs and holes

// 8 lines of determining the interesting part of the field, often much less than the total field

// 25 lines iterating through the interesting part of the field, marking each node for three bits of interest

// 20 lines iterating again through the field, combining the interesting bits into actual edges

// 400 lines iterating through the field once again, combining the edges into useful groups

//// the bulk of the 400 lines is a switch on the six directions of edges, each 50 lines long

Each of the four iterations could easily be made a function, but why? None of them do anything useful by themselves. Ditto the six cases of the giant switch statement. Actually, I tried and failed to make the six cases be the same code, but they are unfortunately similar, but not similar enough.

That is my heresy. Shall I be shunned?

CraigEwert

No shunning.

Your rules sound decent to me. I've got a few little quibbles with them, but by and large they sound reasonable; my own personal rules about function length don't really contradict them in any significant way.

I find, though, that when I follow my rules, I never ever end up with 400-line functions. I wonder why.

(But then, I'm not working in C++.)

-- AdamSpitz

One word: testability. Consider:

 HeightfieldTest? {
     testDetermineInterestingArea() {
        assertEquals(aHeightField->determineActiveArea(), expectedInterestingArea);
     }

     testMarkNodes() {
         assertEquals(aHeightField->markNodes(), expectedNodes);
     }

     testCombineEdges() {
         assertEquals(aHeightField->combineEdges(), expectedEdges);
     }

     testGroupEdges() {
         for (direction = 0; direction < 6; direction++) {
             assertEquals(aHeightField->groupEdges(direction), expectedGroups[direction]);
         }
     }
 }

Now you are much more likely to be able to ensure that your code is correct, narrow down the areas where a bug would be hiding if your results are wrong, and you in a much better place to make changes later and still be able to ensure the functioning of the component. See MartinFowler's book RefactoringImprovingTheDesignOfExistingCode for more examples. -- StevenNewton

Interesting. I got hooked on XP-style testing a few years ago. When I write code, I'm not explicitly thinking, "I should make this function short so that it'll be testable." Rather, I just write the tests, and the functions somehow turn out short. Neato. :) -- AdamSpitz

Excellent! If you CodeUnitTestFirst you're golden, but if you have existing code that has no tests, it helps to find the hidden methods. One good rule of thumb (pattern?) is to look for any comments in a long method. If you see a comment like "// find the set of interesting nodes", there's a very good chance that creating a method findInterestingNodes and moving the commented code there is a big win. -- StevenNewton

Another word: maintainability.

My heart sinks when I see a 700-line function. Invariably over time you'll get local variables being reused in different contexts within the function body, making the whole thing difficult to understand.

By cutting stuff into smaller functions, you get guaranteed scope limitations that otherwise have to be manually checked by reading (and understanding) the entire larger function.

In other words, perhaps you can divide the large function up into smaller functions which themselves have their own atomic purpose. -- BrentNewhall

Another word: GetaBetterEditor?

Browsing into functions should be as easy as a key press. If they are named well you often don't have to browse into them at all, or only have to browse into the ones that you're curious about at a particular time. Long functions have a lot of short functions inside them trying to get out. -- EricHodges

I'll disagree with the idea of avoiding creating a function that is only called once, because as I see it, that's a bet that it will never be called in the future. If there's a good reason to split out code into a function, even if it's only called once, the new function may become useful later.

This has happened to me, particularly with text formatting code; I've moved code out into a function, only to realize how useful it is and genericized it so it can be used in multiple places. -- BrentNewhall

Even if the new function never becomes useful later, a private function with a MeaningfulName produces SelfDocumentingCode. If I feel I have to include a comment to explain the purpose of a block of code, I ask myself if I could do so with a function name instead.

If I see a "natural" split in a large function, I will probably divide it even if called only once. However, if I don't see one, I may just leave it long rather than divide it in order to follow some limit rule.

It would be nice if more languages supported nested functions so that we don't have to pass huge parameter lists. The child function would automatically "inherit" the parent's scope. I have only seen 2 languages that supported this, but it can be a great time saver and parameter list simplifier.

Acceptable function length also depends on what the function does. I find that when I write a function that does some kind of analytical work (e.g. image processing functions, statistical analysis) I often end up with larger functions than when I write GUI, database access or other support code.

I consider any kind of rule that placed an upper limit on function length evil. -- AndrewQueisser

Those of you who are arguing for longish functions ought to take a look at Fowler's Refactoring. It'll change your viewpoint on a lot of these matters, and it will provide you with some techniques facilitating shorter functions. It's a fantastic book.

Yes, it is. But there are cases where I've refactored into longer functions to improve readability and maintainability.

Secondly, I'm seeing arguments stating that functional elements that "naturally" belong together should be in the same function. This "naturally" word almost always implies a judgement of some kind. I'm reminded of the classic How many kinds of people are there in the world? jokes. The point is that it depends on the criteria you use, and in the case of making functions smaller, it pays to use whatever criteria result in more groups of fewer elements.

That certainly can be your judgement. But separating code just to separate it is not the simplest thing and smacks of fundamentalism.

I'm concerned by the author's statement: (and I can never trust its name or documentation, particularly when I'm bughunting). Of course function names and documentation are sometimes wrong (usually an excellent indicator of the presence of a bug), but they should on the whole be far more helpful than unhelpful.

Based on your scenario described above, I'm imagining coming across the following function:

 BuildContours?(activeArea, minimumHeight, maxiumuDelta) {
     usefulPart = determineInterestingPart();
     markNodes(usefulPart);
     edges = determineEdges(usefulPart);
     edgeGroups = new_empty_list();
     for edge in edges {
         switch( edge.getDirection() ) {
             case(0): edgeGroups.add( combineEdgesInDirection0(edges) );
             case(1): edgeGroups.add( combineEdgesInDirection2(edges) );
             case(2): edgeGroups.add( combineEdgesInDirection3(edges) );
             case(3): edgeGroups.add( combineEdgesInDirection4(edges) );
             case(4): edgeGroups.add( combineEdgesInDirection5(edges) );
             case(5): edgeGroups.add( combineEdgesInDirection6(edges) );
         }
     }
     return edgeGroups;
 }

Now if my bug is that some spurious edges are being created, I can tell I'd better check out markNodes() and determineEdges() first -- they're FAR more likely to be the source of the problem. If I were confronted with a 400 line function, I wouldn't have the faintest idea where to look. And this is far more readable. As someone else said, if scrolling around in the file is such a pain that you don't want code as readable as the above, you should GetaBetterEditor?. -- MichaelChermside

Ick, a switch statement. Give Direction some smarts:

 BuildContours?(activeArea, minimumHeight, maxiumuDelta) {
     usefulPart = determineInterestingPart();
     markNodes(usefulPart);
     edges = determineEdges(usefulPart);
     edgeGroups = new_empty_list();
     for edge in edges {
         edgeGroups.add( edge.getDirection().combine(edges) );
     }
     return edgeGroups;
 }

-- EricHodges

I like that, let's see what we can change to preserve the LawOfDemeter:

     for edge in edges {
         edgeGroups.add(combine(edge.getDirection(), edges));
     }

... later ...

     combine(direction, edges) {
         direction.combine(edges)
     }

-- StevenNewton

Moved from LawOfDemeter

In truth, I'm not sure I'd make combine a method, depending on what it does. From the original example on LongFunctionHeresy, it seemed to be rather complex, taking some 400 lines of code in a 6-case switch statement to completely implement. That would tend to make me ask the code "Is there a class that does edge combining, and what would it look like? Then I'd start playing around with a new class, EdgeCombiner, and give it appropriate behavior. Roughly:

class EdgeCombiner {
EdgeCombiner(edgeGroups) { }

combine(edge) {
// whatever combine does with edges and an edge
// varying on edge.direction();
}

edgesFor(direction) {
// return the edgeGroup for the given direction
}
}

Now I have a useful object that can behave appropriately given the direction and edges passed in the constructor. I'm not familiar enough with the original problem domain to know if this would make sense, but I can play with it a while and if it turns out that EdgeCombiner doesn't work as a class, I can delete it and fold the behavior back into the original class. I'm not intending to reintroduce the switch, it would depend on the internals of combine(), and I'm out of my depth in the problem domain. Not knowing the domain, I don't understand why or how a Direction would know how to combine Edges, but if it makes sense in the context of this problem, then go for it.

-- StevenNewton

One problem with that approach may be that it reintroduces the switch statement. From the snippet on LongFunctionHeresy it seems that each direction has a unique combination algorithm. So EdgeCombiner.combine() is going to have to sort out that mess all over again. If the combining algorithm varies by direction, why not put it on a Direction class? Then an Edge knows its Direction and a Direction knows how to combine Edges. I may be way off base because I have only a passing acquaintance with this problem domain, but I know I prefer polymorphism to switch blocks. Also, making Direction a class gets away from the arbitrary numbering scheme in the first example. -- EricHodges

One thing that sometimes encourages me to have long functions is that there may be a lot of working variables that I don't want to pass around as a bunch of parameters. If languages allow sub-functions, like Pascal, then this may be less of an issue. Sadly, this feature is rare.

Some people suggest putting them all into an associative array or object to make parameter passing easier, but then referencing them requires more syntax. I don't like long-winded code. Plus, you may have to convert a lot of variables into array/object entries in preparation. It would just be easier to have sub-functions.

I'd argue that the first thing to do when you have a long function with many working variables is to try and simplify the algorithm so you have fewer working variables (and hopefully a shorter function). Our brains can only hold so much state at once (the commonly accepted figure is 7 items, +- 2). If you have 400 lines and 30 local variables, you ain't going to be able to hold the whole thing in (human) short term memory anyway. Why not make things easy on yourself and make the mental divisions explicit?

There may be some cases where this is not possible (though I've never run across any in a problem I've worked on - usually once I've actually thought about what I'm doing, I can get most methods under 10 lines, with the biggest being about a screenful and a half). But I'd at least try splitting the algorithm up - usually, not all parts require all the state, and the less I have to reason about, the better. I'd not worry too much about syntax bloat, not if the tradeoff is semantic bloat and holding more context in my head. For me, the bottleneck isn't reading or writing code, it's understanding code, and anything that makes that easier makes me go a lot faster. And until I can read code as fast as I read [/understand] the latest HarryPotter novel, I suspect that'll remain the case.

-- JonathanTang

Re: For me, the bottleneck isn't reading or writing code, it's understanding code

These are intertwined in my opinion. I have actually taken copies of long-winded code, and deleted non-essential items to make it more concise to read. It was not meant to be runnable, of course. HowImportantIsLeanCode

Long functions should be modularized as time goes on. Using dual monitors to try and view a function as one piece is a sign the function is way too long, for example. But it all depends on the situation. Case statements can especially be very long. Sometimes humans write longer functions at the beginning (for immediate readability and 'speed of thought coming out' ). One then goes on to chop them down if one needs to re-use parts of the function several times, or if one can name the function easily and describe what it does.

If one never has to touch the block of code again or test it, it could be better to write the function once and long. However, it seems throw away programs and code snippets end up being reused more often than we think.

Of course, another way of looking at it is this: if the Wiki was one long drawn out web page, would that be as useful as if it were re-factored into several sections or even pages? Of course. It does depend though. Sometimes it's useful to have it all in one spot, sometimes it's better to section it. There is no right or wrong answer, but definitely different options which work better in different situations.

Many levels of nesting make a function difficult to read, where nesting if when you have if, for, while, and similar constructs inside each other. I try to avoid having more than about three levels of nesting.

Overlaps with LongFunctions