Minimal Parsing

The desirable property of a ProgrammingLanguage, because it

ProgrammingLanguages with this property:


Compilers like gcc have huge grammars for the C/C++ language. Many optimizations, type compatibility checks and operator precedence rules are coded in the grammar.

An alternative is to use a simple under-specified grammar. Such a grammar allows input that is not legal C/C++ - but does produce a parse tree. Validity checks are postponed, and in effect become a form of type checking.

Type checking is the evaluation of a type comparison expression. As types are (typically) constant for the duration of execution of a program, these are evaluations of constants expressions and so can be made at compile time.

Minimal Parsing is ONLY determining an initial disambiguated tree structure from the textual source input. Beyond that point, everything else is a re-write rule.

Such a system can make syntax error reporting and recovery simpler. It also can make introspection of the parser simpler.


Or perhaps this refers to languages that only require MinimalParsing, such as ForthLanguage.


The parser I have in mind does MinimalParsing (see RethinkingCompilerDesign) -- at least that was its intention. It's a RecursiveDescent parser. The C code fits in less than 200 lines plus some tables for the description of the operators (binary, prefix and suffix unary plus the special suffix form <expr><op><expr><op> for constructions like func(x) or array[0] and for comments, which are considered code too). The syntax supported by this default-parser (I considered plugging in special parsers for other languages with a special syntax like parse<delimiter-token><customer syntax><delimiter-token>) is already quite powerful and allows constructions like "int:max(int:x,int:y) <- if(x>y) x else y";

A MinimalParser? can also be written that is table-driven. One implementation supports three grammar rule types:

This approach is handy, but note, cannot handle grammars as complex as e.g. those that YACC accepts without the same massive grammar analysis (see DragonBook?), because otherwise for many grammars there will be many conflicting rules -- when to enter states, when to leave states, etc. But of course, if you're willing to change the language (not just the grammar), that's not a problem.

A grammar for most of C can be defined in 60 rules.

Pretty small; sounds interesting, let's see it.


The opposite of MinimalParsing is also possible, which by way of contrast could be called maximal parsing: doing as much checking (especially type checking) as possible in the grammar. This can be taken pretty far, but tends to make the grammar explode in size. For instance, each arithmetic type (short, int, float, double) can be given its own set of grammar rules. For simple languages such as 4 function calculators supporting just int and float, this can be done in a reasonably straightforward way in systems like YACC.

For more complicated languages, it strains YACC to the breaking point to try to handle everything, including user-defined types, in that sort of way, but it's largely possible, with small extensions to YACC (or a more powerful parser generator) and use of some rather ugly tricks.

It's usually not really a very good idea, though.


Note that "simplifying the parsing tools" is sometimes achieved by using existing tools or grammars. For example, many languages use the C-derived style these days (Java, JavaScript, C#, PHP). This way they are not starting from scratch and most existing tools either recognize the syntax or are easier to adapt.


CategoryCompilers


EditText of this page (last edited July 31, 2010) or FindPage with title or text search