The simplest optimizing compilers I've seen are in PeterNorvig's ParadigmsOfArtificialIntelligenceProgramming and ChristianQueinnec's LispInSmallPieces. They share several traits:
Furthermore, I suspect that any of these compilers could be built using XP and a bit of domain knowledge. Even better, you could improve them further without radical re-design--techniques are known for doing native code generation straight from an AbstractSyntaxTree.
But what about a modern production compiler?
Modern production compilers generally don't rely on ASTs. Typically, they use something like (say) N-address code, possibly in StaticSingleAssignmentForm. They use lots of funky algorithms. For an idea of how hairy it can get, compare the algorithms in StevenMuchnick?'s AdvancedCompilerDesignAndImplementation, or even TheDragonBook (older), to the books above.
I don't think you could build one of these beasts without some prior design work. I don't think you could start out with one of the simple compilers above, and transform it into one of these without breaking a lot of test suites for several programmer-weeks. I could be wrong.
Or does XP classify this kind of broad-sweeping, hard-won, architectural knowledge as a SystemMetaphor, as was suggested by RalphJohnson? -- EricKidd?
AssemblyLanguage fits here in 2 different ways:
To get really fast applications, the compiler will, of course, have to compile all the way to MachineCode. (To make it easier to test if it's working right, many compilers have a compiler option to save a ".cod" (MachineCode) file, which shows the raw hex MachineCode bytes, the AssemblyLanguage, and the high-level source.).
If I get to pick the source language, the simplest possible compiler for me to write is an assembler for AssemblyLanguage. If you get to pick the source language and don't have to optimize, the simplest possible compiler is the null compiler that performs an identity transformation. Let the source language be the target runtime format. Done.
The simplest optimizing compiler I've written was 200 lines of C code, compiling ForthLanguage to powerpc MachineCode. Optimizations were limited, but included TailCallOptimization, ConstantFolding?, inlining, and a generic PeepholeOptimization? mechanism that used something akin to sed's s/foo/bar/ to rewrite short segments of code. Oh, and it was hard RealTime in the absense of ForthMacros (although, in Forth, macros are rarely absent); the amount of work for a character in the source code was bounded. -- AdamBerger
ForthSimplicity has the complete source code for a (non-optimizing) compiler.
ColorForth also has a very small compiler, with limited optimizations. ForthLanguage lends itself to simple compilers. It is more of a macro assembler for a virtual machine than a high-level language.
I remember that the compiler (to 68k) for the FalseLanguage had only 1K size (written in AssemblyLanguage). -- GunnarZarncke
The C compiler I wrote for the BbcMicro while at Acornsoft ran in 20K - code and data. It barely deserved the title "optimizing", though. Paul Fellows also wrote a Pascal compiler for the same machine that also ran in 20K. -- PaulHudson