Moved from ImplementingLisp.
One of the most fundamental implementation issues is that of encoding type in each cell, so that each member of a pair is self-describing. Typically a few bits (usually 1 to 4) are devoted to type, so that e.g. 00 means small integer, 01 means pointer, etc. Floating point, strings, and other non-primitive value types are often "boxed": represented by a pointer into a special allocation area, where the pointer itself encodes the type (a float pointer satisfies LOW_FLOAT_ADDRESS <= float_pointer <= HIGH_FLOAT_ADDRESS).
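For concreteness, here is a minimal sketch of such a low-tag scheme in C, assuming heap objects are aligned so that the low two bits of a pointer are free; all of the names below are invented for illustration, not taken from any particular Lisp.

 #include <stdint.h>

 typedef uintptr_t lispval;

 #define TAG_MASK   0x3u
 #define TAG_FIXNUM 0x0u     /* 00: small integer stored in the upper bits */
 #define TAG_PAIR   0x1u     /* 01: pointer to a cons cell                 */
 #define TAG_BOXED  0x2u     /* 10: pointer to a boxed float, string, ...  */

 #define MAKE_FIXNUM(n)  ((lispval)(n) << 2)
 #define FIXNUM_VALUE(v) ((intptr_t)(v) >> 2)
 #define IS_FIXNUM(v)    (((v) & TAG_MASK) == TAG_FIXNUM)
 #define IS_PAIR(v)      (((v) & TAG_MASK) == TAG_PAIR)
 #define UNTAG(v)        ((void *)((v) & ~(lispval)TAG_MASK))

 /* Boxed floats could instead be recognized by address range, as noted above:
    LOW_FLOAT_ADDRESS <= p && p <= HIGH_FLOAT_ADDRESS                          */

One pleasant property of giving the all-zero tag to small integers is that fixnum addition can work directly on the tagged representation: the sum of two tag-00 values is again a tag-00 value.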
The exact encoding method used is typically architecture dependent, since it must deal with little-endian versus big-endian layouts, whether the low bits of pointers are guaranteed to be zero (they aren't on word-addressed machines), etc., and requires some creativity.
BoxingConversions are elegant from one point of view but lead to efficiency concerns with floating-point-intensive code.
Also see DavidGudeman?'s review paper "Representing Type Information in Dynamically Typed Languages" (ftp://ftp.cs.indiana.edu/pub/scheme-repository/doc/pubs/typeinfo.ps.gz).
It is interesting and of non-trivial importance that Lisp's original implementation, and its most popular 1970s implementation on DEC 10s, ran on machines in which exactly 2 pointers fit in each memory word.
Lisp these days works quite well on machines where 1 pointer fits in each memory word, of course, but this is less of a perfect fit.
CeeLanguage programmers, by contrast, have always tended to assume that sizeof(pointer) == sizeof(int) == sizeof(word in RAM); that assumption wasn't forced by C, but many kinds of programming are more awkward on machines of other sorts, and require more careful C programming to be portable.
The implementation of lambda has varied wildly in different Lisps over the decades. The simpler approaches represent it the same way that any function is represented, the only difference being whether some name refers to it or not. It is possible to implement it as a sexpr that exactly reflects the way it was specified. Implementations stray from that due to issues concerning the binding of the named parameters to activation-record values, amongst other concerns.
CactusStacks tend to be natural in order to implement LexicalScoping, if nothing else.
There are many resources about how to implement Lisp. I would recommend reading ChristianQueinnec's book 'LispInSmallPieces'. http://www-spi.lip6.fr/~queinnec/WWW/LiSP.html
Also: 'An Introduction to Scheme and its Implementation': ftp://ftp.cs.utexas.edu/pub/garbage/cs345/schintro-v14/schintro_toc.html
I highly recommend the out-of-print classic "AnatomyOfLisp?", 1979, JohnAllen?. I can't say exactly how far down in a list of must-read books it should be placed; there are many I haven't read. But everyone seems to still agree that it is unique. -- dm
'Anatomy of Lisp was once a classic. Now it is mostly outdated. It describes an ancient dialect of Lisp (with dynamic binding), and modern implementation techniques are a bit different, to say the least.'
That's not accurate. It is certainly outdated on some topics, so it definitely is not a modern "how to implement lisp" book. However, it is not "mostly" outdated; Lisp's fundamental nature has not changed. For instance, studying dynamic scoping techniques is a positive thing, not a negative, despite the modern emphasis on lexical scoping.
'Actually a lot of Lisp's fundamental nature has changed since the time Anatomy of Lisp was written. Many of the modern Lisp trends started with newer Common Lisp systems, and before that with developments in the Scheme world.'
Also, I am not aware of any up-to-date book that serves as an outright replacement for Anatomy. Please let me know if you do know of one, because of course I want to read it.
As far as I know, the list of must-read foundational Lisp books is all too short to start with.
And as Jonathan says below about JohnMcCarthy's original paper, Anatomy is good for historical context even in the areas where it is outdated. It is still a classic.
It irritates me no end that people lose interest in everything but the cutting edge; there is tremendous value in history. "Those who cannot remember the past are condemned to repeat it." (George Santayana)
-- DougMerritt
Yes, but we'll repeat it much more efficiently, because we have TailCallOptimization. -- GeorgePaci
Did you see the references above? The modern replacement is 'LispInSmallPieces'. Anatomy of Lisp is hard to get (a reference to an out-of-print book is of little benefit to the potential reader -- I would really advise reading something that is either online or buyable, with added material such as source code to study; see Lisp in Small Pieces), hard to read (yes, I have a copy), and outdated. Sure, you can study old (very old) Lisp implementation techniques, but the newer books and online resources are preferable. As a first reference I would not give Anatomy of Lisp. Maybe to get a historical perspective, but that's about it.
Considering I said "I can't say exactly how far down in a list of must-read books it should be placed", I don't see why there should be any objections whatsoever. I also said "I am not aware of any up to date book that serves as an outright replacement", and I haven't heard that contradicted, since I think I was clear that the older techniques would ideally be covered as well as matters of newer importance.
That lexical scoping is more difficult to implement, or that it would matter if it were, is a myth. Proof: tons of small Scheme implementations.
I really don't think it's a myth, having personally implemented both kinds. Dynamic scoping is trivial. Lexical scoping at one point was actually considered impossible/impractical in Lisp, did you know that? Given all the research and implementations since then, it is now no longer anywhere near as difficult...but it remains noticeably more complex. Hugely complex? No, nor did I say that, so "tons of small Scheme implementations" doesn't "prove" anything one way or the other.
And it is quite common to hear elisp fans say that its dynamic scoping just isn't the big issue in practice that other Lisp folks say that it is -- I prefer lexical scoping, don't get me wrong, but dynamic scoping isn't a dead issue. Like I said, it is interesting for pedagogical implementations, for especially tiny languages, etc.
I'm sorry that you're disappointed in it, and I apologize if my own previous recommendation gave a misleading impression, but just because it isn't what you personally were hoping for doesn't mean that it isn't a classic.
I do agree that parts of it are difficult to read.
P.S. as for out of print, I did attempt to do my part on that: I have talked to both JohnMcCarthy and to John Allen in recent years about getting a revised second edition back in print. Doesn't sound like it's going to happen. :-)
-- DougMerritt
It occurs to me that, perhaps Anatomy would be a much less controversial suggestion if it were in a separate list of historically significant classics that any really serious implementor should eventually read, along with JohnMcCarthy's 1960 paper and the Lambda the Ultimate series? But if even that doesn't satisfy, then sorry, I'll just have to disagree. -- dm
'No, that's correct. Anatomy of Lisp is a classic. The Lambda the Ultimate series is also worthwhile to read. Other books would be 'The Architecture of Symbolic Computers' (Kogge), 'Topics in Advanced Language Implementation' (Peter Lee), 'The Lambda Calculus' ;-), the various conference proceedings of Lisp and Functional Programming, Smalltalk-80: The Language and its Implementation, The Art of the Metaobject Protocol, various papers on macros, Efficient Method Dispatch in PCL, Rabbit -- a compiler for Scheme, Orbit -- an optimizing compiler for Scheme (PhD thesis), Internal Design of CMU Common Lisp (technical report), the Lisp Machine manual, a multitude of papers on GC, and so on... Here are some more pointers: http://library.readscheme.org/page8.html'
Absolutely... although, as page 8 of a series of important references, this is something of an embarrassment of riches, and I may never catch up on reading all of them. I doubt if I've read even 20 of the papers listed on those pages.
You've given a nice list, but I wonder if it could be separated and prioritized, especially now that we're distinguishing between historical and modern, core/critical versus "it would be nice", etc.
Another classic is "The Semantics of Destructive Lisp", although this is very focused/specialized, and not of interest to everyone by any means.
There's also the 1996 book that surveys the subject of garbage collection, "Garbage Collection -- Algorithms for Automatic Dynamic Memory Management", by Richard Jones and Rafael Lins (see GarbageCollectionBook). Although research papers continue to be written about GC, I'm not aware of any other book of this sort, so I would consider it a definite must-read for a serious implementor.
Contrary to some claims, it really isn't the case that there is just one single GC algorithm that is always best. It depends on the system (e.g. expected average free memory) and the goals.
"Advanced Programming Language Design" by Raphael A. Finkel is, unfortunately, apparently a one-of-a-kind ('though not Lisp-specific), and usually well-thought of by people designing new languages.
-- DougMerritt
CMUCL, SBCL, and CLISP are all open-source implementations. CLISP is bytecode-interpreted. SBCL is supposed to be an easier-to-read-and-maintain version of CMUCL. The CormanLisp? compiler also comes with source that you can browse through, though it's not open source in the FreeSoftware sense.
Also see JohnMcCarthy's original paper (http://www-formal.stanford.edu/jmc/recursive.html). This shows how the original 1959 Lisp was implemented. It's probably not at all relevant to current techniques, but it's good for historical context.
I'd also like more information on this, as I'm close to beginning implementation of a Lisp-like language that has many of the same challenges as Lisp. Here are some things I've learned from various papers (Orbit, LispMachine), newsgroup postings (Allegro), and implementations (GwydionDylan, Chicken Scheme, Gambit Scheme, and CormanLisp?).
Parsing is as simple as you'd imagine. Top-down recursive descent: call READ whenever you see an open paren, recursively read symbols. CommonLisp also allows reader macros, which let you execute a bit of code whenever you see a certain character. That's how ' and #' are implemented.
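A minimal sketch of such a reader in C (the object representation and all the helper names here are made up for illustration, not any particular Lisp's):

 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <ctype.h>

 typedef struct obj { struct obj *car, *cdr; char *atom; } obj;

 static obj *cons(obj *car, obj *cdr) {
     obj *o = calloc(1, sizeof *o);
     o->car = car; o->cdr = cdr;
     return o;
 }

 static obj *make_atom(const char *s) {
     obj *o = calloc(1, sizeof *o);
     o->atom = malloc(strlen(s) + 1);
     strcpy(o->atom, s);
     return o;
 }

 static int next_nonspace(FILE *in) {
     int c;
     do { c = getc(in); } while (c != EOF && isspace(c));
     return c;
 }

 static obj *lisp_read(FILE *in);

 static obj *read_list(FILE *in) {
     int c = next_nonspace(in);
     if (c == ')' || c == EOF) return NULL;      /* NULL stands in for nil */
     ungetc(c, in);
     obj *head = lisp_read(in);
     return cons(head, read_list(in));
 }

 static obj *lisp_read(FILE *in) {
     int c = next_nonspace(in);
     if (c == '(') return read_list(in);         /* recurse on open paren */
     if (c == '\'')                              /* 'x reads as (quote x) */
         return cons(make_atom("quote"), cons(lisp_read(in), NULL));
     char buf[128]; int i = 0;                   /* otherwise: one atom   */
     while (c != EOF && !isspace(c) && c != '(' && c != ')' && i < 127) {
         buf[i++] = c;
         c = getc(in);
     }
     if (c != EOF) ungetc(c, in);
     buf[i] = '\0';
     return make_atom(buf);                      /* symbol or number, untyped here */
 }

The ' branch above is essentially a hard-wired reader macro: it turns 'x into (quote x) as it reads.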
I assume you're asking about LexicalClosures when you say lambdas, because a simple lambda is just a function pointer to a compiler-named function. A closure is a data structure that contains both the function pointer and a pointer to an environment. The environment doesn't always have to be implemented on the heap, CactusStack-style. Highly-optimizing compilers (Orbit comes to mind) will try to statically analyze the closure to determine its extent, and if it doesn't escape, they'll just put a pointer to the base of the enclosing stack frame. This is the same mechanism that Algol and Pascal use, and appears in most compiler textbooks.
If the closure does escape, the compiler often doesn't need to store a full environment frame on the heap. Instead, it can analyze the free variables in the closure, and just store a data structure containing them. The enclosing function will also need to point into this structure (instead of allocating the variables on its own stack frame).
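For concreteness, a hedged sketch of both representations in C (every type and function name here is invented for illustration): first the plain closure, a code pointer plus a pointer to its environment frame, then the flat variant that stores only the captured free variables.

 #include <stdint.h>
 #include <stdlib.h>

 typedef uintptr_t lispval;                    /* tagged value, as sketched earlier */

 /* Plain closure: code pointer plus a link to the environment frame it was
    created in (heap- or stack-allocated, per the escape analysis above).    */
 struct env_frame {
     struct env_frame *parent;                 /* enclosing scope                   */
     lispval           slots[];                /* this frame's variable bindings    */
 };
 struct closure {
     lispval (*code)(struct closure *self, lispval arg);
     struct env_frame *env;
 };

 /* Flat closure: no frame pointer at all, just the captured free variables
    (or pointers to heap "boxes", if they are assignable and must be shared). */
 struct flat_closure {
     lispval (*code)(struct flat_closure *self, lispval arg);
     int      nfree;
     lispval  free[];
 };

 struct flat_closure *make_flat_closure(lispval (*code)(struct flat_closure *, lispval),
                                        int nfree) {
     struct flat_closure *c = malloc(sizeof *c + nfree * sizeof(lispval));
     c->code  = code;
     c->nfree = nfree;
     return c;                                 /* caller fills in c->free[i]        */
 }

Invoking such a closure is then just c->code(c, arg), with the free variables reached through self.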
There's a third technique, called "LambdaLifting", that's often used in more purely-functional languages. In this, the function takes all its free variables as parameters instead of taking a pointer to an environment. This won't work if the variables are assignable, because lambda invocations won't share closed-over variables like they do in normal lambda semantics. But it saves an indirection and is as fast as normal function-calling, so you can see why it's useful in pure-functional languages. Read the papers on this for more details (I think SimonPeytonJones has done a bit of work on it); I'm probably butchering it.
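A toy illustration of the idea, with the caveat above that the free variable must not be assigned to (the function names are invented): the inner lambda's free variable becomes an explicit extra parameter, so no closure object is allocated at all.

 /* Conceptually: (define (scale-all v scale) (map (lambda (x) (* x scale)) v))
    After lifting, the lambda becomes a top-level function that takes `scale`. */
 static int scaled(int x, int scale) {         /* `scale` is now a parameter */
     return x * scale;
 }

 void scale_all(int *v, int n, int scale) {
     for (int i = 0; i < n; i++)
         v[i] = scaled(v[i], scale);           /* plain call, no environment */
 }

Compare with the flat-closure sketch above: here nothing is heap-allocated and the call is an ordinary function call.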
Lists are not always lists! The LispMachine used a technique called "CDR-compression", where a list that's allocated all at once (as opposed to being consed step-by-step) would occupy a contiguous area of memory, like a vector. CDR then just moves to the next memory cell instead of following a pointer. There was a special tag bit on the list header that indicated a compressed list, and if you did a rplaca on the list, it'd be broken out into the normal format. I don't know of any implementation since that has done that; checking the tag may be prohibitive without the special hardware.
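A rough sketch of the idea (the representation details here are invented): each cell carries a two-bit cdr code, and taking the cdr dispatches on that code rather than always following a pointer.

 #include <stdint.h>

 enum { CDR_NORMAL = 0,      /* the following word holds an ordinary cdr pointer */
        CDR_NEXT   = 1,      /* the cdr is simply the next cell in memory        */
        CDR_NIL    = 2 };    /* the cdr is nil: end of list                      */

 typedef struct cell { uintptr_t word; } cell; /* car pointer + 2-bit cdr code    */

 #define CDR_CODE(c) ((c)->word & 0x3u)
 #define CELL_CAR(c) ((c)->word & ~(uintptr_t)0x3u)

 uintptr_t cell_cdr(cell *c) {
     switch (CDR_CODE(c)) {
     case CDR_NEXT: return (uintptr_t)(c + 1); /* contiguous, vector-like list    */
     case CDR_NIL:  return 0;                  /* 0 standing in for nil here      */
     default:       return (c + 1)->word;      /* follow the stored cdr pointer   */
     }
 }

Every cdr now pays for a tag check, which is exactly the overhead that may be prohibitive without the LispMachine's hardware support.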
Many high-performance Lisps (CMUCL, GwydionDylan, and SBCL, at least) also do type-inference as an optimization. With suitable type declarations, they can often pin down the precise type used in a function, which lets them get away without tagging or boxing of values. This can make a big difference in numeric code, and is one reason why CMUCL can often be faster than C/C++ for numerics.
-- JonathanTang
It would make an interesting contest to see the shortest C implementation of a simplified Lisp dialect. Ignore speed for the initial contest. The winner would receive something like an autographed photo of JohnMcCarthy.
http://www0.us.ioccc.org/years-spoiler.html#1989_jar.2
If you're interested, I have an old non-standard lisp-in-java implementation that I can exhume (haven't seen it in three years) and post up on the wiki as a catalyst for explanation and discussion. I seem to recall it being quite modular and easy to explain. It would end up spanning a rather large number of pages, though, for sufficient explanation to justify putting it up at all, and perhaps the interest doesn't justify the pollution? -- AdamBerger
I've been kicking around an idea of using CeeAsAnIntermediateLanguage without having to resort to something like CheneyOnTheMta by using a few bits of assembly that work in the GNU assembler on all platforms, and thereby avoiding the overhead of C function calls entirely. The basic idea looks something like this:
 #define LABEL(X) asm(".globl " #X ";" #X ":")
 #define GOTO(X)  do { extern void *X; goto *(&X); } while(0)
Functions would look like this:
 void funcname() {
     LABEL(funcname_start);
     /* ... */
     GOTO(elsewhere_start);
 }
I've already discovered that any optimization in gcc breaks this (it sees no need to pop arguments to functions off of the stack after calling them when it is about to return and restore the previous value of the stack pointer anyway). It would be necessary to avoid doing any gotos from blocks with automatic variables (this means no autos in functions), but that's easy in auto-generated code. I wonder if the costs of losing optimization would outweigh the benefits of avoiding unneeded function call overhead, or could that be fixed by optimizing before compiling to C?
There is still a little overhead above a straight assembly version since on x86 the goto compiles to:
     movl $label,%eax
     jmp *%eax
instead of:
     jmp $label
but I'm thinking that the cost of an extra, non-memory-referencing instruction is tiny compared to the cost of a function call.
Has anyone else done this before? Is there any conventional wisdom on this sort of thing?
Not sure, but it sounds like it might ultimately be more trouble than CheneyOnTheMta.
You may not care, but I would think that "goto *(&X);" is allowed only as a GNU extension (computed goto), and wouldn't work on other compilers.
You may also be interested in taking a look at CeeMinusMinus - a C dialect intended as a high-level assembly language (rather than being called that in a derogatory way).
Doesn't GCC already optimize tail calls nowadays?
Anybody want to volunteer a relational schema of parsed Lisp? Don't worry about field types or lengths at this point. -- top
Please STOP changing "--" to "-". This is simply wrong. The "--" represents a "dash", which is different from a hyphen. When I say "it is dead-wrong", I correctly use a hyphen. When I say, instead, "it is wrong -- you are pessimizing pages", I correctly use a dash.
It's not "wrong", just a matter of what is chosen to represent a dash; "dead wrong" is usually two words.