Tail Call Optimization

[Refactored to/from TailRecursion]

Tail-call optimization (or tail-call merging or tail-call elimination) is a generalization of TailRecursion: If the last thing a routine does before it returns is call another routine, rather than doing a jump-and-add-stack-frame immediately followed by a pop-stack-frame-and-return-to-caller, it should be safe to simply jump to the start of the second routine, letting it re-use the first routine's stack frame (environment).

For example, in CeeLanguage:

 int bar(int a)
 {
     printf("bar called with arg %d\n", a);
     return a * a;
 }

 int foo(int b)
 {
     return bar(b * b);
 }

Most compilers will generate suboptimal code for foo, calling bar and then returning immediately. A "smarter" compiler would simply multiply and GoTo the start of bar.

However, TailCallOptimization has some drawbacks. The runtime environment will have a confusing stack during execution of tail-called routines, which can make debugging difficult ("How did I get here? baz() never even calls foo()!"). Also, CeeLanguage-style automatic variables may not be safe for TailCallOptimization:

 int bar(int *b)
 {
     return *b * 10;
 }
 int foo()
 {
     int a = 5;
     return bar(&a);
 }

If bar() steals foo()'s stack frame, then a has no valid storage associated with it anymore and bar() may act unpredictably. On the other hand, bar() may do something like tuck the pointer away for later use, so using an automatic variable this way may not be safe anyhow.

See http://lambda-the-ultimate.org/classic/message1532.html for a discussion of issues that make tail call merging generally difficult in CeeCeePlusPlus. HenryBaker's paper CheneyOnTheMta describes a technique to efficiently imlement tail call merging when compiling SchemeLanguage code into the CeeLanguage.

[From TailRecursionElimination:]

TailRecursion elimination is a special case of TailCallOptimization where the tail call is to the function itself.

TailRecursion is the property of a method (or function) that has recursion as its final operation before returning. In other words, the last thing the method does is call itself.

Broken example:

 int factorial( int n )
 {
return( n < 2 ? 1 : n * factorial( n-1 ) );
 }

This is not TailRecursion, because the multiplication happens after the recursive call to factorial.

Fixed example:

 int factorial_accumulate( int n, int accum )
 {
        return ( n < 2 ? accum : factorial_accumulate( n - 1, n * accum ) );  // tail recursion
 }
 int factorial( int n )
 {
return factorial_accumulate( n, 1 );
 }

TailRecursion elimination is something a compiler can do to transform a TailRecursive method into a non-recursive, iterative method:

 int factorial_accumulate( int n, int accum )
 {
while ( n >= 2 )
accum *= n--;
return accum;
 }

Or further optimized:

 int factorial( int n )
 {
int accum = 1;
while ( n >= 2 )
accum *= n--;
return accum;
 }

When I first learned about recursion, it seemed useless; the loop method worked just as well, and was less confusing. But that was before I learned about tree-organized DataStructures (TreeStructure ?). When a subroutine calls itself twice (once for the left branch, once for the right branch), it's far more difficult and confusing to convert it to a looping version.

So I find tree structures better examples for learning recursion.

Is it possible to do tail recursion elimination when a subroutine calls itself twice? -- AnonymousDonor

Not entirely. You can only do tail recursion elimination when the call occurs in a tail position, and by definition, you can't have two calls in tail positions (well, you can in conditionals, but then it's one tail-call per branch). But you can do tail recursion elimination on the second call, so you only have to recurse down the left branch and can iterate down the right. In many LispLanguage programs, the right branch is often much deeper than the left, so this can be a big win. -- JonathanTang

I'm a little confused: theoretically couldn't you do this "tail call optimization" on any operation of the form "return f(x)", regardless of tail position or not? This is, of course, only relevant in languages that support multiple returns. But after all, if the last thing the function does before returning is call "f", then couldn't it always just pop the current stack (somehow saving the argument struct x - I'm not an expert) and then jump to f? Is the problem that one can only save one argument struct, and you'd have to know at the start of the function which struct to save? (yes, I know that the arguments aren't literally a struct, but it's conceptually easier to think that way, and the word "Tuple" is horrifically overloaded)

When you invoke "f(x)" with "return f(x)", it is in a tail position. Tail position doesn't mean it's at the end of the function definition. It means that it will be the last thing the function does. So if it is part of a "return" statement, then it is by definition the last thing the function does.

Many compilers convert

  ...
  call AppleSubroutine
  call BananaSubroutine
  return

  ...
  call AppleSubroutine
  jump BananaSubroutine

If this is *inside* BananaSubroutine, it's the same as TailRecursion elimination, right ? -- AnonymousDonor

It's the more general TailCallOptimization. If this is in BananaSubroutine, then yes, it's TailRecursionElimination. -- jt

Discussion:

This is a bit confusing to me. Isn't it the case that any recursive procedure can be re-written in tail-normal form? And if a procedure is recursive, then it evaluates itself, not some other arbitrary procedure. Not sure I agree with the "converts ... into loops" either. Tail call optimization does make recursive procedures evaluate in constant space, however. And shouldn't it be "environment", not "stack frame"?

It isn't true that any recursive procedure can have all its recursion reduced to tail recursion, unless you do it by use of ExternalizeTheStack (or use ContinuationPassingStyle which more or less amounts to the same thing).
Tail-recursion is when the last thing a procedure does is to call itself. Tail-call is more general: it's when the last thing a procedure does is to call something (which might or might not be itself). You can do pretty much the same with general tail-call as you can with tail-recursion.
The "converts into loops" remark is perfectly reasonable. Tail call equals variable renaming plus goto, after all. There are language implementations that will produce the exact same machine code for a tail-recursive "iteration" as for one explicitly written out using looping constructs.

One other remark: if your language implementation does TCO nicely, then a very nice way to implement a state machine is as a bunch of mutually tail-recursive procedures. (Languages that let you do this sort of thing usually also allow those procedures to be defined in a non-trivial lexical environment, so they can share state without it needing to be global.)

ColorForth guarantees TailCallOptimization. ColorForth simply converts "CALL sub; RET" to "JMP sub". Like Scheme, ColorForth encourages the use of TailRecursion to implement iteration.

It is (or can be) a bit more complicated than that though; consider a procedure consisting of only a two way branch to two different procedures. The calls aren't guaranteed to both be the last statement of the procedure (they could share the same 'RET' by use of a jump), but tail-call optimizations can and should still apply. It's close enough to work in nearly all situations however.

ColorForth gets around this problem by eliminating the two way branch. :) The normal Forth word:

 : max ( a b -- max ) over over < if nip else drop then ;

is written in ColorForth as:

 : max ( ab-m ) over over - drop -if nip ; then drop ; \ see the notes below.

which in C would be:

 int max(int a, int b) { if (a<b) return b; return a; }

It is true that the ColorForth programmer must write the tail call as "word ;" in order to be optimized. "word noop ;" is specifically not optimized.

(Note 1: ColorForth's IF and -IF are non-destructive, using the CPU zero and sign flags instead of top-of-stack, hence the "drop -if".)
(Note 2: ColorForth uses color transition instead of ';' to end a definition. ';' compiles to a literal RET or the aforementioned TailCallOptimization.)

See TailRecursion

CategoryOptimization