Objectives
A radical rethink of optimizing compiler design
Features
It is worth noting that several features also imply properties of language design.
Coded as a caching interpreter: A reversal of the CeeLanguage compilation strategy, providing the simplicity and dynamism associated with scripting languages. A build operates on exactly one page (see below). In at least one readily accessible shell-mode or interpreter-mode, one should have access to a REP loop allowing one to execute commands, inspect values, and essentially extend a page. Taking advantage of pre-parsed and precompiled units is acceptable (e.g. to reduce latency), but should be hidden behind the scenes (no explicit preparation of build dependencies). Likely effects:
- The compiler itself operates on FirstClass values in the language, such as functions or records, instead of on text. This liberates the compiler from the task of locating pages of text, parsing them, and linking everything together... and thus massively simplifies the compiler (and provides some degree of syntax neutrality). There is no need for a fixed representation of the intermediate value form that could lose information (e.g. function bytecode), though there may be a standard text representation for serialization. Focusing compilation at the value granularity supports JustInTime compilation, continuous or speculative optimization and usage profiling, and more flexible and dynamic policies for choosing how much up-front compilation must occur, all of which allow for more dynamism and greater optimization.
- Cross Compiling: Sometimes cross-compilations will still be necessary, as when performing a compile for embedded systems (or porting the compiler/interpreter system itself if it is written in the same language it interprets). To keep things as simple as possible, this should be performed by use of a function provided from within the interpreter that operates (as above) on values rather than pages. For example, one might be able to cross-compile a function accepting a list of strings into a POSIX console application. Or one might cross-compile a record of functions into a sharable C library. The internal compiler should probably be an instance of a cross-compiler (to reduce duplication), except curried with default security and optimization policies. Whether or not a particular internal function has been compiled, however, should be invisible to the user of the function.
- RE: A build operates on exactly one page: A page is a unit of code that can be named for imports. While RethinkingCompilerDesign, avoiding the filesystem rut might help us think differently... perhaps a WikiIde or a database or so on. A page could control which words it exports into the API. Or not. That's really up to the language designer, though being one myself I favor controlling exports to resist accidental coupling and provide better documentation. One might call the build target the root page of a build. A build consists of parsing that page and linking/loading page dependencies, possibly performing CompileTimeResolution (e.g. executing load-time commands), and explicitly excludes the compilation task. Compilation happens after build, at least semantically. (A sketch of a build in this style follows this list.)
- pages vs. builds: Given certain language features, especially those intended to resolve CrossCuttingConcerns, the distinction between importing a page and building a page then importing its build-value may be a relevant one. If features described in a page can override or modify components described in imported pages, the distinction will need to be made when it comes time to perform advanced forms of unit-testing within the language. Support for CrossCuttingConcerns? can also have impacts on the cost of the REP loop (e.g. if you modify how logging is performed everywhere), but figuring out how to keep these costs low is a problem for the language designer and implementor.
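As a rough illustration of the build-on-one-page idea above, here is a minimal sketch (Python for brevity; the page syntax, the PAGES store, and build() are all hypothetical stand-ins, not a real design). Building the root page links its imports, runs load-time commands, and yields a record of exported values; compilation is deliberately absent and could happen later, per value, behind the scenes.

 PAGES = {                                  # stand-in for a WikiIde / database of pages
     'Math': "export double; double = lambda x: x + x",
     'Root': "import Math; export four; four = Math['double'](2)",
 }

 _build_cache = {}

 def build(page_name):
     """Build one page: link/load its imports, run load-time commands,
     and return the record of exported values.  No compilation here."""
     if page_name in _build_cache:          # each page is built at most once
         return _build_cache[page_name]
     env, exports = {}, []
     for stmt in PAGES[page_name].split(';'):
         stmt = stmt.strip()
         if stmt.startswith('import '):
             env[stmt.split()[1]] = build(stmt.split()[1])   # dependency's build-value
         elif stmt.startswith('export '):
             exports.append(stmt.split()[1])
         else:
             exec(stmt, {}, env)            # load-time command (CompileTimeResolution)
     record = {name: env[name] for name in exports}
     _build_cache[page_name] = record
     return record

 print(build('Root'))                       # {'four': 4}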
AlternateHardAndSoftLayers: Instead of a sharp break between the HighLevelLanguage and assembler, the transition will be smooth. An application program will act as an extension to the language; functions add to the API; syntax rules add to the grammar. This might be achievable via some sort of language neutrality, with several languages parsing to a common set of core semantics (akin to .Net or JavaVM). Or it might be achieved all in one language. Either way, one can essentially AlternateHardAndSoftLayers from one programming environment.
- You might envision a shell-style programming environment where, instead of loading a modal application that you then talk to, you instead import a page and suddenly have a bunch of new functions (and possibly syntax) available to you. (Related: DontModeMeIn, NoApplication). Extending this to GUI shells, one would essentially have an ObjectBrowser over a database of values.
Optimizations factored from compiler: Optimizations will be externalized. This is analogous to having external libraries that provide <stdio.h> or <math.h>, except with an AspectOrientedProgramming flavor. It may help to have an official form of 'hot comment' and some ability to manipulate the post-processor, in order to select and configure optimizations at a sub-program level and to extend the language with new optimizations.
Compiler internals cleanly exposed: This has far-reaching consequences. You can get at lists of all symbols. You can call a UniversalIterator? that will walk through any data that has a defined structure, whether the data was defined by you or the compiler, without having to write the iterator yourself. You can add new types on exactly the same footing as internal types. You have precise control over placement of data, if you want it. Any kludgy design within the inner recesses of the compiler is exposed (thereby encouraging the designer to clean it up :-) ). (AdvantagesOfExposingRunTimeEngine)
- Sounds fun, but might be a bad idea. Having debug-access to internals is fine. Coupling one's code to the implementation details of any system, including the compiler, however, is bad. Doing so makes the compiler rigid and inflexible, unable to change for fear of breaking the code that people have coupled to its exposed structure. The above promise of a less 'kludgy' compiler rests on the unlikely assumption that one could get it right the first time. One should restrict coupling to just the compiler's interface and the data one passes to it and receives from it.
- That said, the ability to specify configuration details for cross-compiling, even down to placement of symbols, wouldn't be a bad thing. It just needs to be provided through the interface instead of via exposure to internals.
Abstract Compiler Definition: Much of the compiler would be written in an abstract, high-level way as a GraphReWritingCompiler or a DeclarativeMetaprogramming compiler (related: DeclarativeDeviceDriver, ConstraintLogicProgramming). It would also be useful to have a data-driven compiler that separates various policies and data (e.g. machine services, costs, optimization policies, resource management policies, and security policies) and somehow combines these for the target machine. However, such features may be better reached through RefactorMercilessly while favoring a separation of concerns.
Bells and Whistles
Constancy as an attribute: Whether a variable is variable or constant can change at 'run time'. See discussion below under ConstancyAsAnAttribute?.
Stack/calling-format modifiers: Makes coding of co-routines easy (use multiple stacks). Makes non-standard calling conventions easy - e.g. a modifier indicates passing data in registers, in the floating-point coprocessor, or (for booleans) by calling alternative instances of the same code.
Op-codes as first-class types: The type system is extended beyond that of C so that opcodes can be defined as structured types, using union/struct/bitfield etc. For example, the type system will accommodate self-relative pointers, that is, pointers that give an offset from their own location rather than an absolute address. One consequence of op-codes becoming first-class types is that the development environment's built-in system for displaying data also serves as a disassembler. (A sketch appears after this list.)
Cache Specification Layer/Library: Custom caches can be generated from the function specs, to encourage much wider use of cached results.
Machine Architecture using ArchitectureDiscoveryTool?: Rather than the error-prone and laborious process of entering machine architectural details by hand, these could be generated by an ArchitectureDiscoveryTool? (see http://portal.acm.org/citation.cfm?id=567100). This is particularly valuable if multiple architectures are being targeted. A middle-man database of architecture details would also be a smart idea, as it would allow hand-entry of details that such a tool fails to obtain and would be more readily applied to cross-compiles.
MetaDebugInterface: Debugging could be provided on multiple abstraction levels of the transformed code by conforming to a MetaDebugInterface.
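As a sketch of the 'op-codes as first-class types' item above (Python for brevity; the instruction format and decode() are invented for illustration, not any real ISA): once an instruction format is declared as named bit-fields, the generic record-display machinery can double as a disassembler.

 ADD_FORMAT = [('opcode', 6), ('rd', 5), ('rs1', 5), ('rs2', 5), ('unused', 11)]

 def decode(word, fields, width=32):
     """Decode a machine word into a record, field by field (MSB first)."""
     out, pos = {}, width
     for name, size in fields:
         pos -= size
         out[name] = (word >> pos) & ((1 << size) - 1)
     return out

 word = 0b000001_00010_00011_00100_00000000000     # hypothetical 'add r2, r3, r4'
 print(decode(word, ADD_FORMAT))
 # {'opcode': 1, 'rd': 2, 'rs1': 3, 'rs2': 4, 'unused': 0}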
Implementation Strategy (Take 1)
So where to start? There needs to be something more regular than a typical processor instruction set or typical high-level language at the conceptual core.
- The original plan was to use the ForthLanguage as a basis, and a very primitive implementation of FORTH at that. The simplicity and regularity of FORTH makes introspection by the program, where it examines its own workings, more straightforward. Introspection is essential. LISP, a simple 'virtual machine' and TeX(!) have also been suggested as starting points.
- A higher-level approach is to regard AbstractSyntaxTrees as the conceptual core. Choose a concrete representation for these trees, and you have a starting point for coding in 'C' or your language of choice.
- An even higher-level approach is to wait until you have complete values prior to compilation. ASTs and such are the stuff of parsers. Go ahead and get all the parsing and dependencies and denotational semantics handled before compiling, and support compiling of new FirstClass functions at runtime. E.g. one can compile a function accepting a list of strings into some equivalent of 'main'. One can compile a record of functions into a DLL. Or one can keep it internal and JIT whatever needs compiling at the time, without reverting to the AST. This solution results in the smoothest transition from interpreter to compiler. This would suggest starting with any interpreted language as a basis, implementing a 'cross-compiler' in that language, then integrating some particular instance of the cross-compiler as a JIT. The trick is choosing a language you want to use... IdealProgrammingLanguage is not a problem solved by compiler design, and there are many language design decisions that can make writing an optimizing compiler easier or more difficult.
The AbstractSyntaxTree approach (proposed by GunnarZarncke) shows that the starting point is only a concrete initial representation of a more abstract approach. A decision to use a particular data representation in the initial implementation does not lock you in to that long term. The compiler/interpreter's functions will, in the course of development, gain sufficient power that you can introspect the internal workings of the compiler and change, for example, the way that trees are represented internally, without a radical re-write.
The suggested implementation route is to implement an interpreter and add optimizations to it that make it behave like a compiler. The observation is that a constant-folding interpreter can cache its results and make them available for later. Rather than writing both an interpreter and a compiler, we instead get the interpreter to interpret an instance of itself interpreting the code being compiled. To the top-level interpreter, the second-level interpreter and the code being compiled are all constants. Constant-folding optimizations optimize away the calculations of the second-level interpreter, removing the interpretation overhead and leaving just the useful instructions - or that's the theory, anyway.
Con: A considerable body of code is needed before a useful system emerges.
This sounds like the SecondFutamuraProjection?, which is sort of explored in PyPy.
It is. But in PyPy you cannot go down to the metal without an external tool (as in Take 2 below). So you get neither of the advantages. The idea is to have a primitive DynamicCodeGeneration? facility, call it StructuredSelfModifyingCode?. It is to simple self-modifying code as loops are to goto.
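A miniature of the constant-folding idea above (Python for brevity; the tuple encoding, interpret() and specialize() are hypothetical): a toy interpreter plus a specializer that folds away whatever is constant. Pointing such a specializer at the interpreter itself, with the code being compiled held constant, is the move the strategy depends on; only the folding half is shown here.

 # Expressions: ('lit', n) | ('var', name) | ('add', a, b) | ('mul', a, b)

 def interpret(expr, env):
     tag = expr[0]
     if tag == 'lit': return expr[1]
     if tag == 'var': return env[expr[1]]
     if tag == 'add': return interpret(expr[1], env) + interpret(expr[2], env)
     if tag == 'mul': return interpret(expr[1], env) * interpret(expr[2], env)

 def specialize(expr, known):
     """Partially evaluate 'expr' given the statically known variables."""
     tag = expr[0]
     if tag == 'lit': return expr
     if tag == 'var':
         return ('lit', known[expr[1]]) if expr[1] in known else expr
     a, b = specialize(expr[1], known), specialize(expr[2], known)
     if a[0] == 'lit' and b[0] == 'lit':        # constant folding
         return ('lit', interpret((tag, a, b), {}))
     return (tag, a, b)

 prog = ('add', ('mul', ('lit', 6), ('lit', 7)), ('var', 'x'))   # 6*7 + x
 print(specialize(prog, {}))       # ('add', ('lit', 42), ('var', 'x'))
 print(interpret(prog, {'x': 1}))  # 43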
Implementation Strategy (Take 2)
An alternative implementation strategy is to initially forget about native object code generation! Instead, focus on introspection and the high-level side of optimizations. In this guise the games compiler becomes a code generator. Let it parse C to its internal representation, process that with customizable re-write rules, and then generate C once again rather than object code. Much of the work on re-write rules and on universal iterators, and the ideas behind them, can be done and debugged without going near object code.
Some uses for the code generator:
- Auto-generate efficient C code for working with different formats of bitmaps, from a high-level description.
- Add profiling code.
- Add serialization code.
- Provide data-relocation code.
- Provide data packing/unpacking/endian-conversion code.
This kind of thing is sufficiently useful in itself that the new 'compiler' code could start being used for non-toy projects. A sketch of the idea follows.
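A minimal sketch of the code-generator style, touching the last two list items above (Python emitting C for brevity; the record description, gen_struct(), gen_byteswap(), and the GCC-style __builtin_bswap* calls are assumptions for illustration): generate a struct plus its endian-conversion routine from a high-level description rather than writing either by hand.

 FIELDS = [('id', 'uint32_t'), ('flags', 'uint16_t'), ('checksum', 'uint16_t')]
 SWAP = {'uint16_t': '__builtin_bswap16', 'uint32_t': '__builtin_bswap32'}

 def gen_struct(name, fields):
     lines = ['typedef struct {']
     lines += ['    %s %s;' % (ctype, fname) for fname, ctype in fields]
     return lines + ['} %s;' % name, '']

 def gen_byteswap(name, fields):
     lines = ['void %s_byteswap(%s *p) {' % (name, name)]
     lines += ['    p->%s = %s(p->%s);' % (f, SWAP[t], f) for f, t in fields]
     return lines + ['}']

 print('\n'.join(gen_struct('Packet', FIELDS) + gen_byteswap('Packet', FIELDS)))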
Con:
- Full control over the lowest parts of the compiler (i.e. CodeGeneration) will not initially be possible with this approach. Therefore the necessary high-performance code might never be generated.
- To actually compile anything an external program (here: a CeeCompiler?) is required.
Pro:
- Although BreakEven requires more code than with the first strategy, it may be quicker and easier to reach, because of the maturity of the existing C development tools (better debugger and IDE). But see MetaDebugInterface.
The page TheSimplestPossibleCompiler discusses an implementation strategy that is close in spirit to 'implementation strategy 1'. The general consensus seems to be that a lot of up-front design work, or at the very least extensive domain knowledge, is needed if the initial take is to have a hope of evolving smoothly into an industrial-strength (gcc or better) optimizing compiler. The strongest argument backing that opinion concerns differences in data structures, but to this reader's eyes it amounts to no more than moving from one specific structure to another that has the first one as a special case.
It is correct that simple AST transformations alone will not suffice for, e.g., interprocedural ConstantFolding? and multi-variate specialization. Further data structures and algorithms will be needed to support the basic GraphRewritingCompiler?:
- StaticSingleAssignmentForm and ControlFlowGraphs to capture control and data dependencies between expressions.
- a LoopNestingForrest?, one that can handle irreducible ControlFlowGraphs?, to perform loop fusion and other loop optimizations efficiently. This is often used to guide source-to-source transformations in C and Fortran code, followed by compilation with code-motion optimizations deactivated. Its use therefore results in AST transformations for our approach.
- an efficient algorithm for RegisterAllocation (the actual code generation can be done by term-rewriting as approaches like BEG show - see http://www.hei.biz/beg/).
These (or ones like them) are needed for an industrial-strength compiler, no doubt. In the first step one would use simplified algorithms and data structures and keep the information in attributes attached to the AST nodes. In the long run one might consider a hybrid structure, where the AST nodes are updated automatically with dependency information etc. -- GunnarZarncke
Related On Wiki
EditHint: Perhaps this section can all move to CategoryCompilers? Or just be deleted?
Look for contribs on compilers and optimization by:
Or go straight to GrokTheCompiler, which describes some of the issues facing games developers.
For definitions and some off-site links, try:
Other sources
Possibilities for borrowing and stealing ideas from other systems?
Q: Why CategoryGameProgramming, and why the original title "GamesCompiler" [now renamed]??? It looks more like MetaCompilerLanguage? or SelfCompilerLanguage? or even CompilerOptimizedLanguage?. -- GunnarZarncke
A: You are correct. However, there seems to be more active interest in control over optimization, and more understanding of what is required, amongst game developers than elsewhere. My own interest comes from DNA sequence analysis, where some analysis programs can run for months. -- JamesCrook
How to break the project into small enough stages:
There are too many tasks that cannot easily be broken down into term-rewriting tasks, e.g. the low-level parts of code generation like register assignment. Many of these tasks can be fitted into fixpoint iteration over equations on vectors of lattice elements, one per program point, representing knowledge at those points (i.e. the 'attributes' calculated above). But I'm not yet clear how these equations can interact cleanly with the term rewrite rules. My idea was to use a work-list approach: just evaluate what is known at each point and postpone everything else - inefficient initially, but ripe for higher-level optimization later. Converting the well-known equations for typical problems like constant propagation into the framework should be straightforward, but other points of the integration, like memory management and OS calls, are still unclear (as of Feb 2005).
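To make the work-list idea concrete, here is a minimal sketch (Python for brevity; the program points, lattice encoding, and propagate() are hypothetical) of constant propagation as fixpoint iteration over per-point knowledge, revisiting only the points affected by a change.

 UNKNOWN, VARYING = 'unknown', 'varying'

 def join(a, b):                              # lattice join per variable
     if a == UNKNOWN: return b
     if b == UNKNOWN: return a
     return a if a == b else VARYING

 # program point -> (assignments, predecessors); an assignment maps a variable
 # either to an int literal or to another variable (a copy).
 POINTS = {
     'p0': ([('x', 1), ('y', 2)], []),
     'p1': ([('y', 3)],           ['p0']),        # one branch redefines y
     'p2': ([],                   ['p0']),        # the other does not
     'p3': ([('z', 'y')],         ['p1', 'p2']),  # merge point; z copies y
 }

 def propagate(points):
     out = {p: {} for p in points}
     worklist = list(points)
     while worklist:                              # evaluate what is known; postpone the rest
         p = worklist.pop(0)
         assigns, preds = points[p]
         env = {}
         for pre in preds:                        # join the facts flowing in
             for v, val in out[pre].items():
                 env[v] = join(env.get(v, UNKNOWN), val)
         for var, rhs in assigns:                 # transfer function
             env[var] = rhs if isinstance(rhs, int) else env.get(rhs, UNKNOWN)
         if env != out[p]:                        # changed: revisit the successors
             out[p] = env
             worklist += [q for q, (_, ps) in points.items() if p in ps]
     return out

 print(propagate(POINTS)['p3'])   # {'x': 1, 'y': 'varying', 'z': 'varying'}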
The basic goal is to reach the BreakEven (see there) as early as possible. We would like to ensure that even if a large body of code is required to get a reasonable compiler, that code will be in the target language nearly from the beginning - though that's not an absolute requirement; converting later/semi-automatically is a workable option.
See BootStrap, SuperCompiler, TheSimplestPossibleCompiler
Discussion
This discussion section may stay quite short, as I'd like to, or for you to, refactor discussion into DocumentMode rapidly. -- JamesCrook
We want something more regular than a typical processor instruction set or typical high-level language, as the conceptual core. Bootstrapping is a problem with this objective, because efficient compiled code is a strong requirement.
The following ingredients are needed:
- A simple and regular parser for expressions with a rich set of operators and precedences (MinimalParsing is easy).
- A TermRewriting or graph-rewriting system that can synthesize rule-valued attributes (to allow reflection and model types) and can delay rewriting (lazy execution or a worklist). (A sketch of such an engine follows this list.)
- An initial set of primitive rewrite rules for creating rules and for executing expressions as machine code.
- Rules/Expressions describing formation of machine code of the target machine.
These can be implemented in plain C code. I already have a parser and parts of the rewriting engine, as well as an idea of what the rewrite rules should look like. This simple approach cannot guarantee convergence or confluence of the rewriting (reflection is obviously too strong for that), but it should be enough to bootstrap a higher-level version of the engine (e.g. one with backtracking capabilities and explicit state).
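A minimal sketch of the second ingredient, kept deliberately naive (Python for brevity; match(), rewrite(), and the rule format are hypothetical, and rule-valued attributes and delayed rewriting are not modelled): terms are nested tuples, rules are (pattern, builder) pairs, '^name' binds a pattern variable, and rewriting runs innermost-first to a normal form.

 def match(pattern, term, bindings):
     """Match 'term' against 'pattern'; '^name' binds a variable."""
     if isinstance(pattern, str) and pattern.startswith('^'):
         bindings[pattern[1:]] = term
         return True
     if isinstance(pattern, tuple) and isinstance(term, tuple) \
             and len(pattern) == len(term):
         return all(match(p, t, bindings) for p, t in zip(pattern, term))
     return pattern == term

 def rewrite(term, rules):
     if isinstance(term, tuple):                  # rewrite children first
         term = tuple(rewrite(t, rules) for t in term)
     for pattern, build in rules:
         b = {}
         if match(pattern, term, b):
             new = build(b)
             if new != term:                      # only recurse on real progress
                 return rewrite(new, rules)
     return term

 # Primitive constant-folding rules, as a warm-up for machine-code formation rules.
 RULES = [
     (('add', '^x', '^y'),
      lambda b: b['x'] + b['y']
          if isinstance(b['x'], int) and isinstance(b['y'], int)
          else ('add', b['x'], b['y'])),
     (('mul', '^x', 1), lambda b: b['x']),
 ]

 print(rewrite(('mul', ('add', 2, 3), 1), RULES))   # 5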
ConstancyAsAnAttribute?:
Q: Why is this important? -- NickShaffner?
A: Because of constant folding. If parameters or a 3D model are read in from an external file and remain unchanged thereafter, certain 'constant folding' optimizations become possible at run time.
I think that this way may lead to consistency problems. It is not 'well-defined', in the sense that it cannot be checked or enforced. I'd rather prefer to do PartialEvaluation (or in this context: lazy compilation) on the expression using the 'constancy' in question. This is the way used in e.g. TickCee and SynthesisOs. And it would be quite easy to do with a compiler, that is already able to compile itself. -- GunnarZarncke
A cleaner solution, semantically, is to distinguish immutable values from mutable state at a language level. One loads the 3D model as a value which is 'constant' immediately. Values tend to be big (e.g. full collections, records, tuples, and trees) and variables contain a reference to the value. This approach is very good for many optimizations as well as various orthogonal properties (transparent distribution, transparent persistence, SoftwareTransactionalMemory, versioning and FirstClassUndo, etc.). For PartialEvaluation, it also helps if functions are distinguished from procedures (i.e. reject 'impure' functional programming; favor interleaved procedural/pure-functional instead. See notes in IdealProgrammingLanguage).
My notation for this (in the context of MinimalParsing) has been to use two prefix operators ('\expr' and '#expr') to control when compilation and evaluation happen. '\expr' delays compilation of 'expr' until run time (run time of the containing expression, which may very well be during compilation). '#expr' specifies evaluation of 'expr' during compilation of the containing expression.
Example: 'for (i=1; i<10000; i++)\{a += #sin(b)}'. This will replace 'sin(b)' with the actual value during the loop. Of course, this strength reduction might be automated by a state-of-the-art compiler, but this explicit way has some advantages:
- Can be used even if 'sin' has side effects.
- Can be incorporated as primitives in our GraphReWritingCompiler and thus be used to implement StrengthReduction.
- It is a cleaner mechanism than '#include', '#define' etc., but can be used in the same way: '#(a=1)' defines a constant, '#System.print("hallo world")' prints during compilation.
-- GunnarZarncke
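A rough illustration of the '#expr' half of this notation (Python for brevity; compile_time_expand(), the regex-based expansion, and the source string are hypothetical stand-ins for a real compiler pass, and '\expr' is not modelled): the marked sub-expression is evaluated once, at "compile time", and its literal value spliced into the residual code.

 import math, re

 def compile_time_expand(src, env):
     """Replace each '#name(args)' with the literal result of evaluating it now."""
     return re.sub(r'#([A-Za-z_]\w*\([^()]*\))',
                   lambda m: repr(eval(m.group(1), env)), src)

 src = "for i in range(10000): a += #sin(b)"
 print(compile_time_expand(src, {'sin': math.sin, 'b': 0.5}))
 # -> for i in range(10000): a += 0.479425538604203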
I wrote a related page called CompileTimeResolution. I'll note that I disfavor your exact approach for a number of reasons.
- First, the ability in a language to semantically delay the compilation of an expression is very harmful to cross-compilers and embedded systems programming (especially reuse of library code). This is because the program will essentially need to have a compiler installed to work with off-the-shelf libraries. Somehow controlling order of evaluation in an expression would be a better solution... for instance, in the example above you could have #sin(b) essentially mean "replace this with the value of executing sin(b) the moment 'sin' and 'b' are fully resolved". This would avoid the need for delayed compilation. One could still have it be lazily executed (#\sin(b)).
- Second, the need to delay compilation of a larger expression in order to express the desire to compile part of it earlier is likely counter-productive.
- Third, and importantly, I feel that a CompileTimeResolution technique needs to be conservative, such that it doesn't occur for dead code. One solution to this is similar to how Makefiles operate: each resolution returns a value, and the CompileTimeResolution occurs only if that value is deemed to be necessary. The conservative solution has better characteristics for code reuse and modularity because dead-code that contains a resolution will simply not execute it, and thus programmers can reuse code while automatically avoiding unnecessary side-effects.
Q: Would the resulting language be one of the HomoiconicLanguages?
A: Not necessarily. If compilation occurs at the AST level, then it is likely to be Homoiconic. If it occurs after semantics are applied, then it needn't be Homoiconic, but it does require FirstClass decomposable functions and types and such. If it occurs prior to the AST, then it needn't be homoiconic either. That said, since we LanguageDesigners on WikiWiki tend to like SymmetryOfLanguage, being homoiconic is probable. It simply isn't necessary.
Comment: homoiconicity is a consilience between a general data structure and the syntax of a language. A tree is a general data structure, and is also capable of storing an AST specifically. People are used to the idea that homoiconic languages are syntactically rather simple, but that need not be the case.
Admirable. How do you plan to implement non-local optimizations - let's say the old boring register allocation? If the intermediate representation is written in a continuum of 'languages', how can an optimization understand enough context to be useful? -- cristipp
My plan is that this is not done directly at the symbolic level (even though it might be possible). The idea is to provide a directed flow-equation framework that supports fixpoint calculation in the presence of incremental updates. This framework is connected to the nodes of the symbolic term graph in the following way:
- flow equations are specified for nodes that match rewrite rules (see GraphReWritingCompiler), like this: `reg(^a)` -> `reg(^a, ^solve(live(a)))`, where "solve" would be the primitive function for the incremental solver and "live" the property to solve (defined elsewhere).
- when the term is rewritten, instead of replacing terms directly (after function application), the special solve function is used to update the corresponding flow equations incrementally and return the result when it is stable (i.e. delaying if needed until other nodes provide sufficient data for the solver).
-- GunnarZarncke
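To make the 'solve(live(a))' side of this concrete, here is a minimal work-list fixpoint solver for liveness over a tiny hypothetical CFG (Python for brevity; the CFG encoding and solve_liveness() are invented, and the coupling to rewrite rules and the delayed/incremental updating are elided).

 # node -> (defs, uses, successors)
 CFG = {
     'entry': (set(),  set(),  ['b1']),
     'b1':    ({'a'},  {'x'},  ['b2']),
     'b2':    ({'y'},  {'a'},  ['b1', 'exit']),   # loop back edge
     'exit':  (set(),  {'y'},  []),
 }

 def solve_liveness(cfg):
     live_in = {n: set() for n in cfg}
     worklist = list(cfg)
     while worklist:                               # iterate to a fixpoint
         n = worklist.pop()
         defs, uses, succs = cfg[n]
         live_out = set().union(*[live_in[s] for s in succs]) if succs else set()
         new_in = uses | (live_out - defs)
         if new_in != live_in[n]:                  # changed: revisit predecessors
             live_in[n] = new_in
             worklist += [p for p, (_, _, ss) in cfg.items() if n in ss]
     return live_in

 print(solve_liveness(CFG))
 # x is live into 'entry', 'b1'; a and x are live into 'b2'; y is live into 'exit'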
See also StepsTowardTheReinventionOfProgramming
CategoryCompilers CategoryOptimization CategoryGameProgramming