Internal Loop Exits Are Ok

This page exists to summarize http://www.cis.temple.edu/~ingargio/cis71/software/roberts/documents/loopexit.txt.

As everyone knows, GoTo is the bane of StructuredProgramming, and this fact was first laid down in a paper named GotoConsideredHarmful. Since then the entire topic of what constructs are and are not properly usable has been a subject of ReligiousDebate?. Some people say that you should have one entrance and one exit to every loop. Some say that you should have one entrance but many exits are OK. Some say that you should AvoidExceptionsWheneverPossible. Others claim that you should UseExceptionsInsteadOfErrorValues. (I distrust people who tell me what I should do, but let us leave that.)

How can rational people come to a reasonable opinion on topics like this?

A good start is to read the original paper at http://www.acm.org/classics/oct95/ and understand it and then compare your understanding to other people's. My understanding is that Dijkstra's point of view starts with ProgramsRepresentMentalModels. Which immediately implies that you want a good match between how people can understand what is really going on, and how it is represented. Therefore the question of what programming practices are acceptable is fundamentally a question about human psychology and cognition - what do people find natural? He then argues that constructs that people wind up writing with GoTo do not map well onto how we think. By contrast what we write with a variety of structured constructs tends to be comprehensible to us, therefore by using those we make it easier to verify that the program represents the mental model we want and make it easier to figure out what mental model it represents.

Let us apply this principle to the subject of internal loop exits. The paper I referred to above asks the reasonable question, Are we better off with or without internal loop exits? It does not try to argue either way based on first principles - indeed arguing on abstract principles misses the point that this is a question about human cognition. Rather it answers it by asking three questions:

Do people find it easier to produce correct solutions to common problems with or without internal loop exits? Yes! Even if it means using a feature they have no experience with!
In which case do others find those solutions easier to verify? The one with internal loop exits.
When they have it available, do students tend to abuse the feature in practice? It appears not.

The specific type of common problem they researched is TheLoopAndaHalfProblem?, which is the case where you have a loop in which some processing must precede the test. For instance you may have a sequential where you need to read data before you can proceed in test. Here is pseudo-code for two different natural strategies:

loop
read in a value;
if value = Sentinel then exit;
process the value
endloop;

read in a value;
while value != Sentinel do begin
process the value;
read in a value
end;

The former, of course requires an internal loop exit. The latter requires repetition of code. Many dislike internal loop exits. Any decent programmer dislikes repeating code. Which is the lesser of 2 evils?

Their answers were as I indicated above. Internal loop exits win in practice for real people. Lines of evidence for their conclusion include their teaching experience, literature on controlled tests of student performance with and without the feature, length of correct code in Pascal with and without an addition allowing this strategy, and the teaching strategies that textbook authors have had to resort to. Feel free to read the paper for details.

The evidence presented was more than enough to convince me that InternalLoopExitsAreOk.

-- BenTilly

Remember that the above loop could be written as:

while ((read in a value) != Sentinel) {
process the value;
}

Or if you don't like that you could try an ExternalIterator or InternalIterator that skips values equal to Sentinel. Really any time you start to think the your case is exceptional, you're probably missing an object.

-- DavidSalamon

I would write that using the exit, because I am strongly from Python background (and because it would just look weird with indentations, we don''t get do...while loops). However, if I *HAD* to not use them:

    cond = 0;
    until (cond) {
        value = ReadInAValue();
        if (value != Sentinel)
        process(value);
    else cond = 1;}

...or something like that. Anyway, it would be VERY EVIL, and I would shudder at it. If you think that internal loop exits should not be used, then please explain why they shouldn't be. Don't just say that you can do it another way. Yes, I know several other ways. Turing guarantees that you can always do it another way. Explain why it is BETTER to do it another way. "Right" and "wrong" here are not axioms, they are based on what people have found works and what doesn't.

Alternately tell me why you think that my reasons for saying that this is a good way to do it are mistaken.

As for saying that all problems should be solved with another object, I have seen this philosophy lead to disasters. Yes, adding an abstraction layer can solve any problem (other than the one of having too many abstraction layers). And adding them is fun. But there is a cost to introducing yet another object, yet another class, and yet another layer. Remember that YouArentGonnaNeedIt. If there is a natural direct solution, take that first to avoid PrematureGeneralization.

-- BenTilly

Are you suggesting that using an explicit loop-and-test (internal exit or not) is intrinsically more natural and direct that a solution using an internal or external iterator? If so, why? It seems as if this "religious debate" is intimately tied up with only being able to use 70's era imperative, procedural programming constructs. Personally, I find writing loops of any kind irksome, I find it much more natural to think (and therefore program) using internal iterators and other structures.

It is my opinion that while(true) is extremely unexpressive. The standard format is to have the conditional logic at the start of the loop, and just as I wouldn't put a window's close button on the middle left side of the window, I'm not about to put the important logic in a hard to find place.

Also note that in the case of InternalIterators you don't introduce another object, just another method. (Standard libraries will usually have this implemented though, as in SmalltalkLanguage and RubyLanguage)

-- DavidSalamon

It seems to me that the only problem with the second loop is that it requires a priming read. I'm not sure that really qualifies as "duplicate code" but I suppose it could. I prefer the second option because it loops no matter what. The first option forces me to think about whether or not the end condition is met.

If the read takes several lines, it does qualify as duplicate code. You can reduce the duplication with a function call.

As for preferences, read the paper. The difference in the ability of students to come up with a working answer when they are and are not allowed to use internal loop exits is extreme. (100% with versus under 20% without, sample size a few dozen people in each group.) Given my druthers I prefer techniques that real people reliably find correct answers with.

-- BenTilly

Note that the techniques that are easy for beginners to learn are not always the same techniques that experienced people find easy to use. Also, code written by beginners is not necessarily the most maintainable code. I agree that we should prefer techniques that "real people" can use reliably, but I think the "real people" we should worry about are professional programmers, not students. (FWIW, I use internal loop exits, and don't think there is anything wrong with them. I just don't think that this experiment with students is very enlightening.) -- KrisJohnson

Hey, this is a LanguageSmell. In a functional language, where loops are TailRecursion and if expressions return values, the solution is clean and free of dogma. E.g. in SchemeLanguage:

  (let loop ((var init))
(if (equal? var sentinel)
'done
(loop (next-value))))

The hardest dogma to see is always our own. Translate loops with recursion or iterators and you have transformed the problem, not removed it. (Note that TailRecursion only is special in that in some languages it is automatically optimized.) If you tried to complete your code you would realize that within next-value you run into an end of data condition. The question then raises itself as AreExceptionsOK? Incidentally, I think that exceptions are less OK than internal loops because even though the control implications are roughly equivalent, there is more ActionAtaDistance? involved. -- BenTilly

Within next-value you run into an end of data condition. Why so? The following seems to work pretty well (in MzScheme), without an exception in sight.

 (define (next-value)
(random 10))
 (define sentinel 5)
 (define (first-value)
(random sentinel))


 (let loop ((var (first-value)))
(display var)
(if (equal? var sentinel)
'done
(loop (next-value))))

A few sample runs:

174828666271378210776122240036765 done

16715 done

Hah, hah. Yes, that iterator will always have a next value. Now try it scanning a fixed array or file. -- BenTilly

: I think the point of the sentinel is that it denotes the last value, like iterating over a 'C' string with the '\0' character being the sentinel. -- AlanDavies

Ok, so I missed a case out (because I was trying to use if so more people would understand it). The correct code is:

  (let loop ((var init))
(cond ((equal? var sentinel) 'done)
  ((equal? var last-value) 'done)
  (else (loop (next-value)))))

Notice the original imperative pseudo-code doesn't check for loop termination either.

I did notice. Alternately you can say that an exit is embedded in the reading section. Either way the fact that you have 2 exit conditions is part of why it is natural in an iterative solution to have one of them be an internal loop exit. An ugly feature of this kind of control problem is that end of data is returned as data, which either means that you have a language special value to return, or else you introduce a SemiPredicateProblem?. -- BenTilly

What is important is writing clear code. In all the examples given above, the code is clear both with and without internal exits, due to its brevity. The prohibition against internal exits comes from the good old days when people routinely wrote 50-line functions with several levels of nesting. Having a internal exit buried deep inside such a function is harmful.

With more modern stylistic rules (shorter functions, iterators, exceptions, functional programming style, etc.), prohibitions against goto and other "non-structured" constructs are not as necessary. BadProgrammerConsideredHarmful? is the modern principle.

The important thing to remember about rules like "Never use GOTO", "Never use internal exits", etc., is that TheyreJustRules. You don't have to follow them blindly. If you don't like them, then ignore them or amend them to suit you.

Agreed. Except that I find it dangerous to break rules without a good understanding of why the rule became a rule in the first place. -- BenTilly

I would like to argue in favor of GotoConsideredHarmful. This is the same search from the paper in C using multiple returns:

  int search( int [] list, int n, int key ) 
  {
for ( int i = 0; i < n; i++ ) 
if ( list[i] == key ) return i + 1;
return 0;
  }

And the same search from the paper in C without using multiple returns:

  int search( int [] list, int n, int key )
  {
bool found = FALSE;
for ( int i = 0; i < n && !found; i++ ) 
if ( list[i] == key ) found = TRUE;
return found ? i + 1: 0;
  }

It has a possible bug, since i can be incremented or not when it goes out the loop... The following is possibly better:

  int search( int [] list, int n, int key )
  {
int ret = 0;
for ( int i = 0; i < n && ret != 0; i++ ) 
if ( list[i] == key ) ret = i;
return ret == 0 ? 0 : ret + 1;
  }

Further evidence that InternalLoopExitsAreOk - the above version also has a bug: if list[0] == key, it will skip it and keep looking. Maybe this is better:

  int search( int [] list, int n, int key )
  {
int ret = 0;
for ( int i = 0; i < n && ret != 0; i++ ) 
if ( list[i] == key ) ret = i + 1;
return ret;
  }

The paper states that having multiple returns is a requisite for clearer and less buggy code. As you can see, the paper is biased against GotoConsideredHarmful.

The first version was indeed clearer, and keeping track of an extra variable is a potentially bug-causing hassle (in more complicated code). IMO, the basic rule is: people think linearly, anything else involves mental contortion and thus causes bugs or at least slows you down.

I don't understand why the paper uses this example to draw this conclusion. I think this is the clearer version

  int search( int [] list, int n, int key ) {
int i=0;
while(i<n && list[i] != key) {
i++;
}
return i==n ? 0 : i+1; 
  }

Perhaps the Answer is None of the Above?

[9/9/04 - These are some thoughts that came to me after reading this page. I have not used this in actual code to really evaluate how I might feel about it, but I thought it might lead to some interesting discussion.]

Aren't all of the above really different ways to implement internal loop exits? What if we eliminated early exit from the loop completely? Perhaps all we have is a premature optimization, a commonly used one, but is it worth the cost?

The proposal is that we use a loop to capture all possible matches in the original collection and return a subset collection. The loop always runs to its conclusion and we only add matches to the subset collection. Subjective advantages include:

The return does not have overloaded meaning. The return does not mean either "No Match Was Found" or "The Value Of The Match Is." Instead, the size of the returned collection indicates whether any matches were found.
There is no need to try and interpret the loop variable outside of the loop; the variable stays within the scope of the loop and does not imply success nor failure.
There is no implicit assumption of only one match. If multiple matches are found, the caller can decide what to do, including working with all found matches, working with the first found match, working with the last found match, or throwing an error.
Successful completion of the loop implies successful completion of the operation, while early exit from the loop implies an error condition. This is the opposite interpretation than what is implied by "traditional" early exit loops.

The disadvantage to this approach is that we unnecessarily spin through the original collection after a match has been found. Yeah, this one still bugs me, but intellectually, I wonder whether the extra time spinning through the loop really makes any difference in the program response time. Remember, in the case of no match, the execution time of this approach and all three previously proposed alternatives will probably be almost identical.

I haven't tried this approach out in code yet, but I am curious as to how others might react to it, so go at it!

-- WayneMack

There are examples where spinning through the entire collection will take years, but an early exit reduces the waiting time to days.

Yes, I agree there are times when performance optimization is necessary, particularly when the loop is running on a non-memory resource such as a file, a database, or perhaps a list of remote computers. I guess my question is really two-fold, "Does the number of cases where the performance improvement is not necessary outweigh the number of cases where it is necessary?" and "Is the code resulting from this proposal really less 'ugly' than the more common alternatives?" Would we benefit from using a non-early exit as a standard approach and only use an early exit approach when performance gains are necessary? It's probably just going to be a subjective evaluation. -- WayneMack

There is also the deeply embedded programming world, where the cost of building and returning a list of matches is too high. --DougKing

This is actually the type of blind assertion I am questioning. The cost, in terms of CPU cycles, to complete processing of an array after the first match found relative the cost of finding (or not finding) the first match is very data and application specific. The maximum CPU cycle cost is unchanged, it is only the average cost that is being minimized. There is also a memory cost in returning a buffer rather than a pointer. There is no doubt that there is run time cost associated with this approach; that, however, does not imply the cost should be hand optimized out in every (or even most) instances.

The longer term implication of this approach is that programming languages could begin to define a set of primitive operations for collections and, except where performance optimization is found to be necessary, programmers would not have to write another for loop again. In my experience in embedded coding, I've noticed there are lots of times where performance is traded off for ease of coding.

If we could eliminate the discussions and concerns with starting at 0 or 1, indices versus iterators, whether to use flags, return statements, or exceptions to exit a loop, etc., would the performance cost be that significant? I find that most of the loops I write are highly similar and I often copy and paste with an edit of only part of one line of code. Why can't a language and compiler take care of this repetitive code for me? Can we just make the for loop go away like linked lists, and only rely on it in limited circumstances? -- WayneMack

Depending on the system, the performance cost could easily be that significant. In the embedded systems I work on, maintaining a buffer (not to mention using the buffer in the calling code) would usually be more expensive and more complicated than the search itself.

It also seems that you are solving a fundamentally different problem from the rest of the page, where the caller only wants one match. In many cases, there can only be 0 or 1 matches, and using a buffer in these cases is not helpful and is in fact an unnecessary complication.

This approach solves the same problem, it merely separates the concerns of find matches from selecting the first match found. While I will agree there is a higher cost in this approach, I am not certain that implies additional complexity.

[This solution increases algorithmic complexity, code size, memory usage, and reduces performance in the name of purity of a contended principle. I don't like it. Now, a search function that did behave this way is sometimes useful (and I've written and used them), but certainly it should not be your off the shelf solution. You'd use it when you need to search a set and return all the duplicates. In addition, code complexity in many languages is significantly increased (for example, in C/C++ you need to worry about memory management for the returned array, callback functions for cleanup, etc, etc, etc). --ChrisMellon?]

While I don't have a problem with mid-loop exits, I do have a problem with multiple points of return.

  Why?

I'm not Garry, but I suspect it has to do with resource de-allocation. It's far too easy to return from the middle of a function and forget to close a file or free a chunk of memory (in those languages where the programmer gets to manage the FreeStore?)
Yeah, things like resource handling, flags/semaphores, state restoration, and all that. I've been burned here like (I imagine) everyone else. The cost is low, the bennies are worth the cost. -- gh

My functions have one entry, one exit (return), while loops therein may have breaks/exits within loops - even multiple breaks within loops. But when the loop breaks, it always falls through to the (single) return for that function.

The number of times (even in embedded stuff) that I have to break this rule is one-hand-countable. The penalty in execution time has never been a problem, and the code is far more maintainable.

-- GarryHamilton

You might try the FunctionWrapper pattern

Yes, I use this pattern. Still, I avoid multiple returns. It is as much a maintainability thing as anything else. I even do this in AssemblyLanguage, though wrappers are quite useful in low-level stuff.

I agree with most of the above comments. I have a few common tricks I use for the problems brought up when using multiple returns and function wrapper pattern. I like multiple returns because it gives me a nice way to avoid Indentatiosis when I have many if's that all involve necessary conditions:

if (condA) {

  if (condB) {
    doSomething(); /* running out of space in complex nesting */
  }

}

Just invert the condition and think of it like a filter-chain.

if (!condA)

  return;

if (!condB)

  return;

doSomething();

I actually enjoy working with these more than ifs so I use them when I can. I'm not stingy with the whitespace on these.

For the resource deallocation bit, I usually make a

cleanup:

label inside a function with complex deallocation needs and then use a few gotos up above. It's clear enough for tough situations; I don't consider it a sin to use gotos this way, as everybody recognizes the "two threads flow apart then back together before returning" motif as a classic. You see the same thing with error cases a lot, and the idea is simple to state more generally: A goto is meritorious if all gotos in the same function definition share the same target. This simple star pattern is easy for everybody to grasp and so deserves to be used more often. In more normal situations I do this kind of thing:

struct Holder { void *ptr; } void freeBlock(struct Holder *h) {

  assert(h && "Error tried to freeBlock for NULL Holder");
  if (h->ptr) { free(h->ptr); h->ptr=NULL; }

}

The point here is that the pointers are always NULL'd to ensure that deallocation happens only once. So adherence to this discipline with all deallocation leads to a capability to return early and still do (a potentially incomplete sequence of) deallocations in another function without complex interleaved logic chains. This pattern actually makes it pretty easy to convert most any pointer into a singleton or a buffer ring, which is another useful move to have easily available for the future. I think the best reason I like it is because it simplifies buggy crashes (less complex behavior with less non-zero state) and lets you catch "already freed" type errors with simple assert(ptr != NULL) type things much of the time. If you combine this technique with a ReligiousAdherenceToCallocOverMalloc?, then you get the capability to detect "not yet allocated" type errors with that same assert. Another pragmatic trick I use (last resort) when that fails is atexit() for this same problem.

-- RudiCilibrasi

[Discussion moved from RefactorMatchLoopToUsage]

Use the right type of loop. For loops with Exit constructs are not for loops at all. In this example, the programmer does not want to execute the loop for all numbers between 1 and 10, but only while something is not found. The second loop communicates much better.

from...

   For i = 1 to 10
      If isFound(i) Then Exit For
   Next i

to...

   ysnFound = False
   i = 0
   While (not ysnFound) And (i < 10)
        i = i + 1
        ysnFound = isFound(i)
   Wend

(you could do this with Do .. Loop Until as well)

I think the first loop is easier to read and communicates more clearly. (In particular, the second loop looks at first glance as if it's looping from 0 (inclusive) to 10 (exclusive), but it isn't. This isn't the only reason why the second form is less clear to me.) It's a tradeoff, of course: the programmer, contrary to the assertion above, wants to do something for values in a particular range, until a condition is met. You can choose to prioritize one or the other; in this instance, making the for iteration the primary one results -- in my opinion -- in code that's clearer as well as shorter.

Since what's being done in that particular loop is a common idiom, it would be even better if you could say something like

   i = findFirst(1..10, isFound)

instead. This is not practical in VisualBasic, for a variety of reasons: so much the worse for VB. (There are a bunch of not-quite-trivial design issues associated with a function like the one I propose. What to return if no matching element is found? Return the element or its index? What about having multiple sequences and a single predicate applied to elements from all the sequences together somehow? Should there be a findLast too, or a way of making findFirst work backwards? And so on. I commend to the reader's attention the FIND-IF and POSITION-IF functions in CommonLisp, and their associates.)

Oh, and while I'm causing gnashing of teeth among those to whom all mention of Lisp is anathema, let me mention that CommonLisp offers a looping construct in which you can avoid privileging either the for or the until aspect:

   (loop for i from 1 to 10
         until (isFound i)
         finally (return i))

I think that's rather elegant. -- GarethMcCaughan

In AlgolLanguage you could say

   do for i := 1 to 10 while not isFound(i)

but this syntax was regarded as "overblown" at the time. Pity.

CategoryCodingConventions CategoryCodingIssues CategoryLoops