Lisp Implementations Written In Lisp

[Moved from CommercialLispApplications]

[A lot of excellent commentary appeared here; I have removed my original controversial claim and my defenses of it, in preference to this other material, and did some other minor refactoring. Some further cleanup would probably help, feel free. -- DougMerritt]


Most Common Lisp implementations are written in Common Lisp (with perhaps 10% in C and a little assembler, for things like OS interface), e.g. CMUCL and SBCL.

CMUCL has 23,000 lines of C in src/lisp, including 2K lines for a garbage collector.

OpenMCL for example can be bootstrapped with just a C compiler and then it will build an initial Lisp. This will be used to create the full Common Lisp. (The wording here is confusing.)

I believe this means that OpenMCL is built as follows:

Some of OpenMCLs internals and build process are described here: http://openmcl.clozure.com/FTP/mcldoc/mcldoc.pdf , http://openmcl.clozure.com/Doc/kernel-build.html , http://openmcl.clozure.com/Doc/lisp-build.html

SBCL needs another Lisp (this could be an existing SBCL some other Common Lisp like Clisp or OpenMCL) to compile itself.

Usually only a few things are written in C and most of the language including the (native) code compilers are written in Common Lisp.

Usually not just the "library" is written in Lisp. But most of the core data structures, garbage collection, the compiler(s) etc. are written in Lisp.

Often some glue to the underlying OS (which is usually written in some C) is written in C. C is not used because it is , but because it is the natural choice, since the OS is written in C and you would not write the glue code in Visual Basic or Ada.

The hosting operating system's memory management is completely different from that of Lisp, and that has a lot of consequences. Data is typed. The differences start here. Now you have a different low level structure of simple data structures like numbers, characters and strings. These are not easily exchangeable with the OS, which usually does not provide typed data (only if you have a Lisp OS). Second there is memory organization in generations for the generational GC (which is often used). Third there is the GC itself. Then Lisp provides a different exception system, different pathnames, ...

Because of all this, it is often considered expedient to write a layer between the OS and the Lisp implementation in C, to buffer that mismatch. There needs to be a runtime component that provides all these infrastructure. Some parts of this runtime is best written in C, since C is supported on essentially all systems and deals pretty well with low level issues like raw memory cells.

But it does not matter at all which language the OS is written in. The system call interface is ultimately a very low level matter of interrupts and individual machine architecture, too low level to be addressed in C. This stuff is always written in assembly.

For desktop/server OS's; that is almost always true. For systems lacking memory protection, so that the kernel shares the same address space with the application, e.g. embedded OS's like VxWorks, the interface into the kernel is simply a function call, using the standard ApplicationBinaryInterface? of the underlying processor. In other words, a CeeLanguage function. For what it's worth...

Note that the typical Common Lisp implementations have a very different implementation from GNU Emacs. They tend to be implemented in Lisp as much as possible. GNU Emacs implements a large core (for example a byte code interpreter) and some higher-level stuff in C. GNU Emacs does not use a native code compiler (like SBCL, OpenMCL, LispWorks, ACL, MCL, ...).

In the version of SBCL I have here (which marginally predates the most recent 0.8.9 release), there are 21280 lines of C, and 217407 lines of Lisp - approximately ten times as much. This includes some implementation-dependent code so not all of this is actually compiled on all architectures.

The C code does the following things

Given that the Lisp part of the system includes the native code compiler (yes, all of it, even the code generator) and assembler, there's no reason all this C couldn't be written in Lisp - or some form of non-consing proto-Lisp, in the case of the GC - and we'd expect it to run about as fast, too, but really, why bother? SBCL runs on a fairly large number of Unixlike systems, and we (the maintainers) have better things to do than keeping a list of system call numbers and ActivationRecord layouts for signal handlers for each of them. -- DanBarlow, SBCL hacker, but speaking for myself

One source of confusion is that in many books and courses about Lisp, one is shown how easy to write a Lisp evaluator in Lisp. It takes about a page of code; most undergraduate texts on programming languages will have such a beast somewhere in the text.

It is often implied that this code is an example of a full Lisp interpreter written in Lisp. Only partially true. While the function (which does the same thing as the "eval" special form; and whose discovery made early Lisp interpreters easy to write - before the function was discovered many Lisp programs were hand-compiled into the machine language of the target, a painful process - it isn't a full Lisp implementation.

As mentioned above, it doesn't perform GarbageCollection. It also depends on the existence of other special forms defined by the language - if it sees a (cons x y), it recursively evals x and y and then calls the special form cons to make a ConsNode? as appropriate. Nowhere in such "Lisp evaluator" functions are any reference to the low-level bit-frobbing that any Lisp implementation must do. Nay, the underlying runtime (which ultimately does handle the special forms) takes care of all of these things.

GarbageCollection and the other dirty details handled by a Lisp implementation certainly do not spring forth from the ether. Someone has to write that stuff.

There's also a ton of other stuff, like handling numerics and strings, that soaks up a lot of core implementation effort/LOC, that is rarely mentioned. A true Lisp in Lisp would include that grungy stuff, making it a lot longer than the classic 20 line eval - to no real pedagogical point, of course, but it leads people to underestimate things.

Now it certainly is possible to implement much more of a Lisp runtime in Lisp; "pure" Lisp implementations are known to exist. However, writing a high-performance GarbageCollector (or many other portions of a high-performance Lisp runtime) in Lisp strikes me as a dubious proposition - unless a whole bunch of non-standard features (to allow low-level bit-frobbing to be expressed in Lisp) are included.

The T dialect of Lisp (a Schemish, pre-CommonLisp implementation) had a garbage collector written in itself. It later became the Orbit Scheme compiler, which had very highly regarded performance. -- JonathanTang

It shouldn't be considered shameful to have a Lisp implementation whose core is implemented in C, for example. C is a systems programming language, that's what they are for. Even a C runtime cannot be written in 100% pure C; you will have to delve into assembly code at some point (for things like traps into the operating system, arithmetics operations on some processors, etc.) No big deal. And besides - unless you are running Lisp on a LispMachine - your read loop will include (eventually) calls into the host OS to do I/O, and that host OS is likely implemented in C/C++. (All versions of Unix and Windows certainly are...)

Lisp is a wonderful high-level language. Adding the necessary bit-frobbing to make it a systems programming language would turn it into C++ with EssExpressions and GarbageCollection. (And CommonLisp is already monstrous enough, the last thing it needs is PointerArithmetic and the like tossed in...)

The typical Lisp systems (say, like ACL, CMUCL, MCL, ...) are more than a core plus a library. Parts of the 'core' are typically written in some mixture of C, Assembler and Lisp. Sometimes with the assembler being written in Lisp. Some Lisp implementations are using a low-level Lisp dialect for much of the core. How much there is to a core depends on the Lisp implementation.

CMUCL for example has/d a byte code machine with a byte-code code generator for the compiler. I would consider the byte-code machine to be part of the core. You won't need that byte code machine, if you use the interpreter or you have a native code backend for the platform you are using.

On top of something like a core (basic data structures, basic memory management, basic interface to the OS, basic handling of threads, ...) more complex versions of that can be implemented - typically in Lisp. Like CLOS-based object-oriented I/O (streams), Lisp Machine style thread interface, CLOS-based exception handling (conditions), more sophisticated GCs. Then comes the library of all the data structures of Common Lisp.

But even some stuff on top of the core tends to include some C and assembler. Typically more extensive OS interfaces use C header files. Sometimes assembler is used to interface to special CPU features (like the G4 Altivec instructions).

I would say that some of the serious Lisp implementations are written in the core in a mix of assembler, C and quite a lot Lisp. On top of that much is written in Lisp.

Somebody in another forum pointed out that my description of SBCL as "10% C by volume" although true is not really the whole story: if you've been involved in Lisp implementations you'll know that the additional expressive power of lisp (pick your own estimate from 6x, 10x, 100x depending on what point you're trying to make) means that considerably less then 10% of the actual functionality is in C. I mention that mostly in passing.

If you define the core of a language implementation as "whatever part you have to write in the non-target language before you can actually bootstrap in the target language" then it's trivially true that it will be written in something other than the target language.

I prefer to take the view that the core of a language implementation is the compiler or interpreter that takes expressions written in that language and arranges for them to be run by the target system. Otherwise you're in the rather odd situation of having to claim that the core of gcc is written in assembler: while it's true that you couldn't actually write a program without the equivalent of crt1.S and perhaps ld.so or similar, it's neither the bulk of the work or the part of the system about which all the rest hangs together. Most gcc developers, I feel sure, would regard this as glue code to interface to the environment, not as essential to gcc.

SBCL is written mostly in Common Lisp with less than 10% (by linecount) C or assembler. Most of this C/assembler exists for the purpose of gluing it to the operating system (whose interfaces are usually written to be called from C) and has nothing to do with speed: in fact, you can try recompiling the C portions of SBCL with -O0 and -O3 and watch how it makes approximately no difference whatsoever. I really wouldn't describe it as the core of SBCL any more than I'd describe LILO as the core of Linux. -- DanBarlow

Well, I can't speak for all lisp implementations, of course. However, there is an important reason to not write your Lisp interpreter entirely in Lisp. Why? Because it's a hassle to new users. Everyone has a C compiler. It's ubiquitous, and all the OS tricks are accessible via C. The garbage collector in most Lisp implementations also uses these tricks. One of the major barriers to porting Lisp implementations to Windows is the lack of mmap (which makes the garbage collector schemes that Lisp is best suited for easier to write). If you try to build lisp from the ground up, having the initial core in C means that more people will be able to build it, and then you can provide an image. -- DaveFayram


It should be pointed out that most C compilers use DomainSpecificLanguages like lex or yacc to implement the scanner and parser. While these things can be done in native C code, it's a RoyalPain. (Of course, both of the tools mentioned above produce C code as output).

It should be further pointed out (though I expect this is obvious to many parties) that implementing a compiler (i.e. a program which translates programs from one language to another; in this context, the compilers of interest translate high-level languages to the instruction set of some microprocessor or VirtualMachine) in a HighLevelLanguage poses no technical difficulty - it's just data processing. In fact, higher-level languages like Lisp are far more suited to the task than are things like C; which is why virtually all C compilers use tools like lex and yacc. For Lisp, no need to bother; writing a RecursiveDescent parser to parse EssExpressions is easy to do. (C grammar is sufficiently nasty that a RecursiveDescent parser is not a good choice; better to use a LALR(1) parser instead. Which means yacc or a tool like that; implementing these parser by hand is notoriously difficult).

The only advantages that C has for a compiler implementation language are a) ubiquity, and b) speed. (Which aren't unimportant, by the way).

The hard part is implementing the language's RuntimeSystem?; this is almost always a systems programming chore, and high-level languages have lots of trouble doing it efficiently.


EditText of this page (last edited April 9, 2004) or FindPage with title or text search