Polyanna Language

Polyanna was a fictional domain-specific minilanguage that did two special things: it could parse arbitrary input with the aid of external grammar files, and it had built-in database support.

So it was a minilanguage that you used to write even smaller languages. A wiki, for example, could have its markup implemented as a polyanna-language; you could define the output of a particular CGI/webform as a polyanna-language (polyanna had good support for CGI); or you could use it to parse templates.

These input files were parsed by polyanna itself, with the aid of the appropriate grammar file.

You could also use the parser facility to extend polyanna itself, to provide new syntax and new operators.

Polyanna existed in a dream that SunnanFenderson had while sleeping one soft March night. Polyanna was invented by a strange girl with black hair who called herself Catherine. In the dream the language was the next nirvana. In reality I don't really see the point. Anyone here think otherwise? Should it be implemented?

The 1960 movie Pollyanna (note the two L's) is playing today on the Hallmark channel, and perhaps played recently as well. Perhaps you fell asleep with the TV on?

While I don't have a television set, it's possible that, since the movie was playing recently, thoughts of the name had been floating around in the CollectiveUnconscious?.


I had a dream I ate a giant marshmallow. When I woke up, my pillow was gone.


PrologLanguage already has such features (DCG Parser Generator, operator definition). The grammar can be defined in the program itself or consulted from an external file. See for example http://www.cs.sfu.ca/~cameron/Teaching/383/DCG.html


Well, yeah, you can do it with Prolog, or Bison/Flex/C. You can also do something similar with ForthLanguage or Lisp. In the dream, polyanna was used by "the masses", though. Whatever that means. And I thought the name was beautiful.

Just as the PerlLanguage was made for report generation even though the features already existed in awk/sed, polyanna was a language made to do languages/translation, and easily so. Don't know how the database stuff comes into the picture; I'm much more interested in the parsing part.

Instead of function calls you had (possibly recursive/nested) syntax definition and/or substitution.

Thank you very much for the link, by the way. I'm reading it a few times over.


There can never be a language literally for the masses, because the average person actively dislikes nit-picking attention to detail.

But there are things you could explore to make a language along these lines more accessible, anyway, although still only to a minority. Programming by example would help.

A grammar generator, for instance, is probably impossible to make accessible except by driving it by example, and even then only by limiting it to a smaller class of languages than the typical LALR(1).

Why limit it? Limit in what way? Typical LALR(1)-capability is about the level I'm reaching for.

Limit it because writing an LALR(1) BNF grammar is difficult. For a non-trivial grammar, programmers who aren't specialists usually have quite a bit of trouble writing a conflict-free grammar, and every parser generator I've ever seen makes it very difficult to debug the grammar.

So to make it more accessible seems to me to imply choosing a subset of context free grammars that is smaller than LALR(1) to help sidestep those difficulties.

Try a GLR parser like Elkhound. You debug in terms of ambiguities, not shift-reduce conflicts (which are resolved nondeterministically). It can also handle ambiguous grammars; both parse trees are handed back to the application.

http://www.cs.berkeley.edu/~smcpeak/elkhound/

Shift-reduce and reduce-reduce conflicts ARE ambiguities, but perhaps I need to look at Elkhound to correctly interpret your intention there.

Ambiguous parsing results are describable in a user-friendly way and are immediately understandable by anyone who knows the language.

That seems dubious in general; what's an example of what you mean?

The Elkhound approach is better, because shift-reduce conflicts are a technicality of bottom-up parsing and don't always relate directly to the language, for instance when they arise from lack of sufficient lookahead, e.g., if the grammar is LALR(5) but the parser generator only supports LALR(1).

But sometimes the conflict signals a true ambiguity that isn't just a technicality of shift-reduce parsers.

Consider if/then/else, found in literally hundreds or even thousands of languages, and a construct like "if e1 then if e2 then s1 else s2". Should the "else" bind to the first "if" or the second? It's a question of what the language design intends; there is no absolute right answer. And because of that language ambiguity, shift-reduce parser generators will tell you there's a shift-reduce conflict there. That absolutely is not a mere artifact of such parsers; it is a signal of true ambiguity in the grammar. There is no technique that can "fix" this, other than redesigning such languages to avoid the ambiguity, or using a disambiguating mechanism to specify which alternative you prefer the parser to implement.
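
To make the dangling else concrete, here is a minimal grammar sketch in the same ::= notation as the examples further down (with "other" standing in for any non-conditional statement):

  stmt ::= 'if' expr 'then' stmt
  stmt ::= 'if' expr 'then' stmt 'else' stmt
  stmt ::= other

Under this grammar, "if e1 then if e2 then s1 else s2" has two parse trees, depending on whether the inner or the outer "if" absorbs the "else", and the point where the parser has just seen the "else" is exactly where a shift-reduce generator reports the conflict.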

However, typically full-fledged GLR systems do not hand back "both" parse trees; with an ambiguous grammar they hand back an arbitrarily large number of parse trees, depending on the details of the grammar and the input.

Getting handed a double-handful of possible parses IS NOT an approach that is friendly to the masses; on the contrary, the average non-specialist user of a parser generator wants it to "just work", and did not actually intend any ambiguity in the first place, but wasn't skilled enough to eliminate ambiguity from the grammar, nor even to quite understand why it was there in the first place.

Yacc and its clones also do not produce user-friendly output to assist in grammar debugging; in fact it is positively user-hostile. I've heard rumors that there may be more friendly systems out there somewhere, which at minimum might produce diagnostics that refer exclusively to productions specified by the user rather than to Yacc-synthesized states.

Maybe you mean that Elkhound is at least friendly in that sense.

-- DougMerritt

From what I've read of the Elkhound papers, a primary goal was to provide easier debugging of grammars. At least Elkhound will frame the error in terms of ambiguities (instead of shift/reduce conflicts, which are meaningless unless you understand shift/reduce parsers), and hand you back the full parse forest. You can then incrementally debug the grammar, possibly using post-parsing disambiguation at first until the grammar conforms to LALR(1).

Some ambiguities are probably better handled after parsing anyway. Elkhound was originally designed to support Elsa, a C++ parser. In C and C++, a major ambiguity comes from typedefs: "a * b;" is either a declaration of b as a pointer to type a, or an expression multiplying a by b, depending on whether a names a type. Instead of feeding symbol table info back to the lexer, Elkhound can hand both parses on to the typechecker, which then has the full symbol table available to decide whether a given symbol is a type or a variable.

-- JonathanTang


I think a PolyannaLanguage would be interesting for learning language structure: a language that could interpret grammars. Existing tools like Bison generate secondary code that then needs to be compiled. The Prolog DCG grammar is a lot easier for playing with toy languages, because it interprets the grammar "on the fly", but then you have to deal with a lot of other concepts like unification, backtracking, cut, and so forth to make a working parser. A Polyanna interpreter could be a simple command-line program (itself written in C/C++ or Java) with usage like:

  polyanna grammar.g program.ext

Grammar.g would have BNF rules describing the syntax of the target language, plus semantic functions similar to DenotationalSemantics (in a simple meta-language, say JavaScript), and program.ext would be code in the target language. It could come with example .g files as a starting point, covering subsets of more familiar languages, say expression.g, scheme.g, ceeplusplus.g (prolog.g?). Even if users have "quite a bit of trouble" making new conflict-free grammars, that would be the point: experiment, make mistakes, and thereby learn. Having examples to modify and a man page would make it easier. If you wrote a grammar for polyanna itself and invoked it thus

  polyanna polyanna.g test.pol

would that be an instance of QuineProgram?


NB: I also imagine files that contain the grammar like so:

  cat grammar.g program.ext > program.pol
would also work, just like your first example above.


Which primitives does polyanna need to have? (You write "simple meta language like JavaScript", but I imagined it being done with polyanna's own primitives, plus non-primitives derived from other grammar files.)

I'm thinking JavaScript for a prototype, because it has an eval() function, is available on everyone's computer, and is similar enough to C/Java that it should be easy to port. The JavaScript would be used to parse the grammar and execute the semantics, but the syntax of the grammar file would have to be made up (see discussion below). I think a first draft should be able to deal with simple input like "a=1;b=2;print(a+b);" as a target string. So I think the grammar would look something like:

  //exp.g
  program ::= statement | statement ';' program   {stmt($1) | stmt($2);prog($3)}
  statement ::= assignment | print                {assn($1) | prn($2)}

Where the semantics are in the curly brackets, similar to Bison, but would map branches of the parse tree created from the target string to actual JavaScript functions called stmt(), prog(), ... to do the execution. The details of those metafunctions could still be defined in the .g file, since the program could snip them off and use eval().
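
As a minimal sketch of that snipping step, suppose (purely for illustration; none of this is fixed yet) that a .g file keeps its BNF rules above a %% line and its JavaScript metafunctions below it, and that the file has already been read into a string:

  // gSource: the whole .g file as one string (hypothetical name and layout).
  var parts = gSource.split('\n%%\n');
  var rules = parts[0];   // the BNF rules, handed to the grammar interpreter
  eval(parts[1]);         // brings stmt(), prog(), assn(), prn(), ... to life
  // At run time, each action string, e.g. "assn($1)", would have $1, $2, ...
  // replaced by its children's results and then be executed with eval() too.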


Should the grammar use the same syntax as proper BNF, or something simpler? (Note: the "simplicity" of deviating from the standard often isn't so simple.)

Why not modify http://vb.wikis.com/wc.dll?Vb~JavaScriptSandbox~VB and try to build one on the fly? I.e., copy it to a new page (JavaScriptPolyanna?) and use the power of wikis: c2 for the discussion and vbwiki for the implementation, since it allows JavaScript and that page has been up for a while. Use textbox1 for the grammar string and textbox2 for output (instead of just evaluating JavaScript as it does now). Initially it could produce a RecursiveDescent parser (also in JavaScript) in textbox2. A textbox3 for the target string and a textbox4 for final output could then be added once the initial generation seems correct. Then textbox2 could be "folded" back, hidden in the execution, so the target would run directly from the grammar instead of going through an intermediate step. It would make the discussion more concrete to collaborate on actual working code, and it would be less hassle than downloading and uploading code to hard drives. Perhaps use the expression grammar on the RecursiveDescent page as a starting point.
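
For concreteness, a minimal hand-written sketch of the kind of RecursiveDescent interpreter textbox2 might eventually hold, targeting the "a=1;b=2;print(a+b);" example from above (run() and all other names here are invented for illustration):

  // Sketch only: tokenize, parse, and execute "a=1;b=2;print(a+b);"-style input.
  function run(src) {
    var toks = src.match(/[A-Za-z_]\w*|\d+|[=;()+]/g), i = 0, env = {}, out = [];
    function peek() { return toks[i]; }
    function eat(t) { if (toks[i] !== t) throw "expected " + t; i++; }
    function factor() {                    // factor ::= number | identifier
      var t = toks[i++];
      return /^\d/.test(t) ? Number(t) : env[t];
    }
    function expr() {                      // expr ::= factor ('+' factor)*
      var v = factor();
      while (peek() === '+') { eat('+'); v += factor(); }
      return v;
    }
    while (i < toks.length) {              // program ::= (statement ';')*
      if (peek() === 'print') {            // statement ::= 'print' '(' expr ')'
        eat('print'); eat('('); out.push(expr()); eat(')');
      } else {                             // statement ::= identifier '=' expr
        var name = toks[i++]; eat('='); env[name] = expr();
      }
      eat(';');
    }
    return out.join('\n');                 // run("a=1;b=2;print(a+b);") gives "3"
  }

The generator's job would then be to emit functions like expr() above mechanically from the .g rules, one per nonterminal.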

I'll check that out. I had been thinking of writing a prototype in Lisp (which also has eval) since that's the language that I'm most familiar and comfortable with. Then, if it turned out to be a good idea, maybe write a non-prototype interpreter and also (if at all possible) a compiler or a "gcc-front-end generator".

Lisp or Scheme would be interesting, too. The thing with JavaScript is that, in theory, multiple people could contribute to the code on the VbWiki. I'm interested in the concept of WikiWithProgrammableContent as well as interpreted grammars, so to me that would be KillingTwoBirdsWithOneStone?. I'm sure many people here are experts in parsing; if each person contributed a couple of functions it could probably be done in a few days. I tried to make a start at http://vb.wikis.com/wc.dll?Vb~JavaScriptPolyanna~VB, however it seems too complex for the editor there. That's not to say relatively complex JavaScript won't work; compare http://vb.wikis.com/wc.dll?Vb~TicTacToe~Wiki. The polyanna page does work locally (for what it does so far, it just passes text between textareas), and if you view source and search for

  <td valign="top" width="100%"><html><head><title>Polyanna</title>

then copy/paste down to the closing </td> as polyanna.html on your local drive and debug, and you can see what it does. Don't try "edit": it loses some syntax in the pasting, and if you use "edit" it gets truncated, but you can at least see the concept. Alternatively, you might be able to first paste in a JavaScript Scheme interpreter like http://www.bluetail.com/~luke/jscm/repl.html (a basic Prolog interpreter works too, although without the DCG features; see http://fox.wikis.com/wc.dll?Wiki~PrologInterpreter~VFP); then it could be done in Scheme, but still collaboratively. Interpreters within interpreters within... but that's one of the things that's interesting about learning about language structure. It might be possible to break up some of the complexity by distributing the program among pages; see an example of calling a function from one page to another at http://vb.wikis.com/wc.dll?Vb~WikiSandbox2~VB (choose "Click here to show"). Interestingly, I had tried to add Scheme as well as an imperative language as targets, just to test the generality of the eventual parser, before reading your post.


See also ftp://ftp.cs.indiana.edu/pub/scheme-repository/code/lang/lalr.tar.gz - it does not just generate a parser as output; it actually meta-interprets a grammar syntax/semantics specification and uses it to interpret strings of the grammar, similar to the Prolog example above. See lalr-test.ss, which specifies a simple expression grammar. I had to replace "(1+" with "(+ 1" in lalr.ss to get it to work, but other than that it was a short example of some of the concepts above. Most other parser generators produce code as output (CodeGeneration), which requires a second step to run.


I just discovered ThueLanguage?, which is extremely similar to what I intended Polyanna to be. --Sunnan


CategoryProgrammingLanguage DenotationalSemantics

