Rethinking Compiler Design


A radical rethink of optimizing compiler design


It is worth noting that several features also imply properties of language design.

Coded as a caching interpreter: Reversal on the CeeLanguage compilation strategy that provides simplicity and dynamism associated with scripting languages. A build operates on exactly one page (see below). In at least one readily accessible shell-mode or interpreter-mode, one should have access to an REP loop allowing one to execute commands, inspect values, and essentially extend a page. Taking advantage of pre-parsed and precompiled units is accepted (e.g. to reduce latency), but should be hidden behind the scenes (no explicit preparation of build dependencies). Likely Effects:

AlternateHardAndSoftLayers: Instead of a sharp break between the HighLevelLanguage and assembler, the transition will be smooth. An application program will act as an extension to the language; Functions add to the API; Syntax rules add to the grammar. This might be achievable via some sort of language neutrality with several languages parsing to a common set of core semantics (akin to .Net or JavaVM). Or it might be achieved all in one language. Either way, one can essentially AlternateHardAndSoftLayers from one programming environment. Optimizations factored from compiler: Optimizations will be externalized. This is analogous to having external libraries that provide <stdio> or <maths>, except with an AspectOrientedProgramming flavor. It may help to have an official form of 'hot comment' and some ability to manipulate the post-processor in order to select and configure optimizations at a sub-program level and extend the language with new optimizations.

Compiler internals cleanly exposed: This has far reaching consequences. You can get at lists of all symbols. You can call a UniversalIterator? that will walk through any data that has a defined structure, whether the data was defined by you or the compiler, without having to write the iterator yourself. You can add new types on exactly the same footing as internal types. You have precise control over placement of data, if you want it. Any kludgy design within the inner recesses of the compiler is exposed (thereby encouraging the designer to clean it up :-) ). (AdvantagesOfExposingRunTimeEngine)

Abstract Compiler Definition: Much of the compiler would be written in an abstract high-level way as a GraphReWritingCompiler or DeclarativeMetaprogramming compiler (related DeclarativeDeviceDriver, ConstraintLogicProgramming). It would be also useful to have a data-driven compiler that separates various policies and data (e.g. machine services, costs, optimization policies, resource management policies, and security policies) and somehow combines these for the target machine. However, such features may be better reached through RefactorMercilessly while favoring a separation of concerns.

Bells and Whistles

Constancy as an attribute: Whether a variable is variable or constant can change at 'run time'. See discussion below under ConstancyAsAnAttribute?.

Stack/calling-format modifiers: Makes coding of co-routines easy (use multiple stacks). Makes non-standard calling conventions easy - e.g. modifier indicates pass data in registers, in floating point coprocessor, or (for booleans) by calling alternative instances of the same code.

Op-Codes first class types: Type system extended beyond that of C so that opcodes can be defined as structured types, using union/struct/bitfield etc. For example, the type system will accommodate self-relative pointers, that is pointers that give an offset from their location rather than an absolute address. One consequence of op-codes becoming first class types is that the development environment's built in system for displaying data also serves as a disassembler.

Cache Specification Layer/Library: Custom caches can be generated from the function specs, to encourage much wider use of cached results.

Machine Architecture using ArchitectureDiscoveryTool?: Rather than the error prone and laborious process of entering machine architectural details by hand, these could be generated by an ArchitectureDiscoveryTool? (see Particularly valuable if multiple architectures are being targeted. A middle-man database of architecture details would also be a smart idea, as it would allow hand-entry of details that such a tool fails to obtain and would be better applied towards cross-compiles.

MetaDebugInterface: Debugging could be provided on multiple abstraction levels of the transformed code by conforming to a MetaDebugInterface.

Implementation Strategy (Take 1)

So where to start? There needs to be something more regular than a typical processor instruction set or typical high-level language at the conceptual core.

The AbstractSyntaxTree approach (proposed by GunnarZarncke) shows that the starting point is only a concrete initial representation of a more abstract approach. A decision to use a particular data representation in the initial implementation does not lock you in to that long term. The compiler/interpreter's functions will in the course of development gain sufficient power that you can introspect the internal workings of the compiler and change, for example, the way that trees are represented internally without a radical re-write.

The suggested implementation route is to implement an interpreter and add optimizations to it that make it behave like a compiler. The observation is that a constant-folding interpreter can cache its results and make them available for later. Rather than writing both an interpreter and a compiler we instead get the interpreter to interpret an instance of itself interpreting the code being compiled. To the top level interpreter, the second level interpreter and the code being compiled are all constants. Constant folding optimizations optimize away the calculations of the second level interpreter, optimizing away the interpretation overheads and leaving just the useful instructions, or that's the theory anyway.

Con: A considerable body of code is needed before a useful system emerges.

This sounds like the SecondFutamuraProjection?, which is sort of explored in PyPy.

It is. But in PyPy you cannot go down to the metal without an external tool (like in Tak 2 below). So you have neither of the advantages. The idea is to have a primitive DynamicCodeGeneration? facility, call it StructuredSelfModifyingCode?. It's to simple self-modifying code as loops are to goto.

Implementation Strategy (Take 2)

An alternative implementation strategy is to initially forget about native object code generation! Instead focus on the introspection and the high-level side of optimizations. In this guise the games compiler becomes a code generator. Let it parse C to its internal representation, process that with customizable re-write rules, and then generate C once again rather than object code. Much of the work on re-write rules and on universal iterators can be done and debugged, and the ideas debugged, without going near object code.

Some uses for the code generator:

This kind of thing is sufficiently useful in itself that the new 'compiler' code could start being used for non-toy projects.



The page TheSimplestPossibleCompiler is discussing an implementation strategy that is close in spirit to 'implementation strategy 1'. The general consensus seems to be that a lot of up-front design work, or at the very least extensive domain knowledge, is needed if the initial take is to have a hope of evolving smoothly into an industrial strength (gcc or better) optimizing compiler. The strongest argument backing up that opinion is around differences in datastructures, but to this reader's eyes amounts to no more than moving from one specific structure to another which has the first one as a special case.

It is correct, that simple AST transformations alone will not suffice to do e.g. interprocedural ConstantFolding? and multi-variate specialization. Further data-structures/algorithms will be needed to support the basic GraphRewritingCompiler?:

These (or like ones) are needed for an industrial strength compiler no doubt. In the first step one would use simplified algorithms and data-structures and keep the information in attributes attached to the AST nodes. In the long run one might consider a hybrid structure, where the AST nodes are updated automatically with dependency information etc. -- GunnarZarncke

Related On Wiki

EditHint: Perhaps this section can all move to CategoryCompilers? Or just be deleted?

Look for contribs on compilers and optimization by:

Or go straight to GrokTheCompiler describes some of the issues facing games developers. For definitions and some off-site links, try:

Other sources

Possibilities for borrowing and stealing ideas from other systems?

Q: Why CategoryGameProgramming, and why the original title "GamesCompiler" [now renamed]??? It looks more like MetaCompilerLanguage? or SelfCompilerLanguage? or even CompilerOptimizedLanguage?. -- GunnarZarncke

A: You are correct. However there seems to be more active interest in control over optimization amongst game developers than elsewhere, and more understanding of what is required there than elsewhere. My own interest comes from DNA sequence analysis, where some analysis programs can run for months. -- JamesCrook

How to break the project in to small enough stages:

There are too many tasks that cannot easily be broken down into term rewriting tasks. E.g. the low-level parts of code-generation like register-assignment. Many of these tasks can be fitted into fixpoint iteration over equations on vectors over program points of lattice elements representing knowledge over the points (i.e. the 'attributes' calculated above). But I'm not yet clear, how these equations can interact cleanly with the term rewrite rules. My idea was to use a work-list approach, just evaluate what it knows at each point and postpone everything else - inefficient initially, but ripe for higher-level optimization later. Converting the well-knowns equations for typical problems like constant propagation into the framework should be straight forward, but other points of the integration like memory-management and OS-calls are still unclear (as of Feb 2005).

The basic goal is to reach the BreakEven (see there) as early as possible. Would like to ensure that even if a large body of code is required to get a reasonable compiler, that code will be in the target language nearly from the beginning - though that's not an absolute requirement - converting later/semi-automatically is a workable option.

See BootStrap, SuperCompiler, TheSimplestPossibleCompiler


This discussion section may stay quite short, as I'd like to, or for you to, refactor discussion into DocumentMode rapidly. -- JamesCrook

We want something more regular than a typical processor instruction set or typical high-level language, as the conceptual core. Bootstrapping is a problem with this objective, because efficient compiled code is a strong requirement.

The following ingredients are needed:

These can be implemented in plain C code. I already have a parser and parts of the rewriting engine as well as an idea what the rewrite rules should look like. This simple approach cannot guarantee convergence or confluence of the rewriting (reflection is obviously '?' too strong for that), but it should be enough to bootstrap a higher-level version of the engine (e.g. one with backtracking capabilities and explicit state).


Q: Why is this important? -- NickShaffner?

A: Because of constant folding. If parameters or a 3D model is read in from an external file, and then remains unchanged thereafter, certain 'constant folding' optimizations at run time become possible.

I think that this way may lead to consistency problems. It is not 'well-defined', in the sense that it cannot be checked or enforced. I'd rather prefer to do PartialEvaluation (or in this context: lazy compilation) on the expression using the 'constancy' in question. This is the way used in e.g. TickCee and SynthesisOs. And it would be quite easy to do with a compiler, that is already able to compile itself. -- GunnarZarncke

A cleaner solution, semantically, is to distinguish immutable values from mutable state at a language level. One loads the 3D model as a value which is 'constant' immediately. Values tend to be big (e.g. full collections, records, tuples, and trees) and variables contain a reference to the value. This approach is very good for many optimizations as well as various orthogonal properties (transparent distribution, transparent persistence, SoftwareTransactionalMemory, versioning and FirstClassUndo, etc.). For PartialEvaluation, it also helps if functions are distinguished from procedures (i.e. reject 'impure' functional programming; favor interleaved procedural/pure-functional instead. See notes in IdealProgrammingLanguage).

My notation for this (in the context of MinimalParsing) has been to use two prefix operators ('\expr' and '#expr') to specify late compilation. '\expr' delays compilation of 'expr' until run-time (run-time of the containing expression, which may very well be during compilation). '#expr' specifies evaluation of 'expr' during compilation of the containing expression.

Example: 'for (i=1; i<10000; i++)\{a += #sin(b)}' This will replace 'sin(b)' with the actual value during the loop. Of course, this reduction of strength might be automated by a state-of-the-art compiler, but this explicit way has some advantages:

-- GunnarZarncke

I wrote a related page called CompileTimeResolution. I'll note that I disfavor your exact approach for a number of reasons.

Q: Would the resulting language be one of the HomoiconicLanguages?

A: Not necessarily. If compilation occurs at the AST level, then it is likely to be Homoiconic. If it occurs after semantics are applied, then it needn't be Homoiconic but it does require FirstClass decomposable functions and types and such. If occurs prior to the AST, then it needn't be homoiconic either. That said, since we LanguageDesigners on WikiWiki tend to like SymmetryOfLanguage, being homoiconic is probable. It simply isn't necessary.

Comment:homoiconicity is a consilience between a general data structure and the syntax of a language. A tree is a general phrase data structure, and also capable of storing an AST specifically. People are used to the idea that homoiconic languages are syntacticalyally rather simple, but that need not be the case.

Admirable. How do you plan to implement non-local optimizations - lets say the old boring register allocation? If the intermediate representation is written in a continuum of 'languages', how can an optimization understand enough context to be useful? -- cristipp

My plan is, that this is not done directly on the symbolic level (even though it might be possible). The idea is to provide a directed flow equation framework, that supports fixpoint calculation in the presence of incremental updates. This framework is connected to the nodes of the symbolic term graph in the following way:

-- GunnarZarncke

See also StepsTowardTheReinventionOfProgramming

CategoryCompilers CategoryOptimization CategoryGameProgramming

EditText of this page (last edited June 14, 2014) or FindPage with title or text search