Smalltalk Blocks Are Thunks In Disguise

One of the (maybe not so) dirty secrets of SmalltalkLanguage is that its block construct is a cleverly-disguised, OO friendly version of a thunk. WhatIsaThunk? In general terms, it's a mini-procedure (or a pointer to same, generally with no arguments) which is passed to a function; when the function wants to retrieve the argument in question it evaluates the thunk and uses the returned value as the argument. Thunks were introduced in AlgolSixty as the implementation technique for CallByName--a nowadays-controversial (and often ConsideredHarmful) parameter passing mode that Algol-60 included in order to allow functions to simulate macros. As macros (especially the simplistic, text-substitution macros of that era, and of certain languages invented by the phone company) by their nature have NormalOrderEvaluation (meaning that arguments are not evaluated until used; and re-evaluated multiple times if used multiple times), thunks were the only reasonable implementation mechanism.

AlgolSixty, whose CallByName "feature" was notoriously difficult and error-prone to use; perhaps have given thunks a bad name. CallByThunk, after all, is the most general parameter passing mode there is--it can do anything that CallByValue and CallByReference can do; and it can do things--like LazyEvaluation and NormalOrderEvaluation--that the other two modes cannot do (at least not without explicitly passing in functions to perform the deferred evaluation--in other words, CallByThunk). CallByThunk is much slower of course; you have the overhead of two jumps for each argument handled in this manner (one into the thunk, one back out)--but it's the most general. That said, most languages are designed to avoid thunks; and they are used as a last resort.

Which brings us to Smalltalk. Smalltalk is unusual among programming languages in that its control structures are implemented as messages (in other words, functions) rather than as compiler built-ins. Consider the two "minimal" control structures need for StructuredProgramming--selection (if/then/elif/else) and looping (for, while, etc.) The former requires LazyEvaluation to be semantically correct--an if statement that evaluated both the true and false cases (even if returning only the "correct" value) would be virtually useless. The latter requires NormalOrderEvaluation, as the body of the loop may be executed n times, where n is a non-negative integer.

SmalltalkLanguage, like most modern languages (excluding explicitly non-strict FunctionalProgrammingLanguages like HaskellLanguage) features StrictEvaluation of function arguments--each argument is evaluated exactly once, before the body of the function. The ifTrue: ifFalse: method strictly evalutes both arguments (the true and false part) before the body is executed. Likewise for the timesRepeat: method on an integer; its argument is executed once before the body.

Which violates the required semantics for selection and looping constructs. So what to do?

Enter the Smalltalk block. By enclosing a bunch of expressions in [], you create an object which responds to the "value" message; it is a pointer to this block object that gets passed and evaluated. Evaluating the pointer does nothing. The body of the true ifTrue: ifFalse: then sends the value massage to the true side; false ifTrue: ifFalse: does the opposite. timesRepeat: repeatedly sends value to its argument.

By this arrangement--of sending a message (calling a function) to cause the data/expression referred to by the block object to be generated--Smalltalk is essentially simulating CallByThunk. Unlike manual thunk-writing in most procedural/OO languages, which is a pain (it's easier in FunctionalProgrammingLanguages, where a thunk can be easily made with a lambda), blocks in Smalltalk make it convenient and simple.

(um.. Blocks are lambda's)

[Blocks are more than just lambdas; they are full-fledged objects. Of course, where you draw that particular line is an interesting question; lambdas with variable capture are largely equivalent to objects; object systems formalize a bunch of additional stuff. Most OO languages use blocks, or other sorts of FunctorObjects, to model lambdas; some even subsume all functions under the "object" label. An AbstractionInversion, to be sure; but one which doesn't bother me at all...]

Exactly, functional programmers like to say their functions are first class objects, which makes perfect sense to me. Lambda's are basically full fledged objects too, they have private variables. They are just anonymous, classless objects. You don't need a class to have an object. Objects are just a list of lambda's, either way you look at it, it makes me happy.

SmalltalkBlocks? have two other features necessary for them to be used in this manner. First, they have LexicalScoping, and can do VariableCapture? on their surrounding scope(s); functions do not normally nest in Smalltalk. Second, a return from a block returns from the enclosing function, rather than returning from the value: message itself (transferring control back up to the function that sent the value message to the block)--this is necessary to simulate early exit in loops, or if (x) { return; } constructs.

In short, Smalltalk blocks are a clever and OO-friendly way of implementing thunks. Which isn't a bad thing, of course... but if such thunkery weren't in place and so convenient; Smalltalk would need to do like most other languages do--and implement its low-level constructs as special forms. While that would most likely be faster, it wouldn't have the flexibility that Smalltalkers love about their favorite language.