Syntax Follows Semantics

Separate Presentation From Meaning


Syntax isn't bad so long as you understand what it is. Syntax is an interface between the programmer and the parser and/or internal program state. Nothing more, nothing less. So, for example, when you turn on code highlighting in your browser, you've given yourself a new syntax for the language, even though nothing was added to the language itself.

Ideally, a programming language should have literally no syntax, as opposed to Lisp's next to no syntax, and be defined entirely as parse trees. [What? Parse trees are syntax. What does 'literally no syntax' even mean? Maybe you meant to say 'no specified textual representation'?] The syntax would only come in from the authoring environment, and would be designed to convey significant semantic shifts. There should even be enough flexibility to define one's own syntax.
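To make the "defined entirely as parse trees" idea concrete, here is a minimal sketch in Haskell (all names invented for illustration, not any real language's definition): the language standard would consist of something like this tree type alone, and any concrete text, Lisp-flavored or otherwise, would be a view computed by the authoring environment.

  -- The "language" is just this tree type; text is someone else's problem.
  data Expr
    = Var String        -- a variable reference
    | Lam String Expr   -- a one-argument lambda
    | App Expr Expr     -- application of a function to an argument
    | Lit Int           -- an integer literal
    deriving (Show, Eq)

  -- One possible textual view, chosen by the editor, not by the language.
  renderLisp :: Expr -> String
  renderLisp (Var x)   = x
  renderLisp (Lam x b) = "(lambda (" ++ x ++ ") " ++ renderLisp b ++ ")"
  renderLisp (App f a) = "(" ++ renderLisp f ++ " " ++ renderLisp a ++ ")"
  renderLisp (Lit n)   = show n

A renderSmalltalk over the same tree would produce the [:x | ...] examples below from the identical Expr value; nothing in the program itself would change.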

Example 1: instead of writing

  (lambda (x) (+ x 1))

one would be able to write

  [:x | x + 1]

Example 2: instead of writing

  [:x | x + 1] value: 0

one would be able to write

  ([:x | x + 1] 0)

Because, let's face it, evaluation is a significant semantic concept which Smalltalk neglects to portray properly, and lambdas are a significant semantic concept which Lisps neglect to portray properly. And since they come as a matched pair, what could possibly justify giving one of them special syntax but not the other?
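A sketch of the "matched pair" claim, reusing the Expr type from the sketch above (substitution here is deliberately naive and ignores variable capture; this is an illustration, not an implementation): an evaluator cannot give meaning to application without also handling lambda, so arguably a notation shouldn't privilege one over the other.

  eval :: Expr -> Expr
  eval (App f a) =
    case eval f of
      Lam x body -> eval (subst x (eval a) body)  -- application meets lambda
      f'         -> App f' (eval a)               -- stuck term; leave it
  eval e = e

  subst :: String -> Expr -> Expr -> Expr
  subst x v (Var y)   = if x == y then v else Var y
  subst x v (Lam y b) = if x == y then Lam y b else Lam y (subst x v b)
  subst x v (App f a) = App (subst x v f) (subst x v a)
  subst _ _ (Lit n)   = Lit n

  -- eval (App (Lam "x" (Var "x")) (Lit 0))  ==>  Lit 0
  -- i.e. the proposed ([:x | x] 0), with both halves of the pair explicit.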

Bah, if you're going in this direction, just start using HaskellLanguage. Besides, [:var | ... ] is really bad syntax for lambdas.


RE: "Ideally, a programming language should have literally no syntax, as opposed to Lisp's next to no syntax, and be defined entirely as parse trees. The syntax would only come in from the authoring environment, and would be designed to convey significant semantic shifts. There should even be enough flexibility to define one's own syntax."

I agree with this principle, but I should note that it is immediately killed in practice: another ideal language feature is portable code (write once, run anywhere). In order to make source code portable between environments, the syntax needs to be specified as part of the source, such that it is always transmitted along with the code. In order to specify a definition for one's favored syntax, one needs to start with a syntax within which to provide this specification. In order to make this syntax specification portable to other systems and compilers, this initial syntax must also be part of the language standard. Thus, a programming language needs to have a syntax (even if it's just to define a syntax). In the event one uses a preprocessor, like OCaml's Camlp4, the syntax for at least the preprocessor AND a 'standard' syntax for the language must be defined (so the preprocessor rules can be specified and can translate into the standard syntax).
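To illustrate the bootstrap argument, here is a hedged sketch (Haskell; the 'infixl' declaration form and every name are invented for illustration): the source carries its own operator-precedence declarations in a fixed initial syntax, and expressions are parsed against the table they define. That initial declaration syntax is exactly the part that must live in the language standard for the file to be portable.

  import Data.Char (isDigit, isSpace)

  type Fixity = [(String, Int)]          -- operator -> precedence

  -- the fixed "initial" syntax: lines like "infixl 6 +"
  parseFixity :: String -> Fixity
  parseFixity src =
    [ (op, read prec)
    | line <- lines src
    , ["infixl", prec, op] <- [words line] ]

  tokenize :: String -> [String]
  tokenize [] = []
  tokenize s@(c:cs)
    | isSpace c = tokenize cs
    | isDigit c = let (n, rest) = span isDigit s in n : tokenize rest
    | otherwise = let (op, rest) = break (\x -> isDigit x || isSpace x) s
                  in op : tokenize rest

  data Tree = Leaf String | Node String Tree Tree deriving Show

  -- precedence climbing against whatever table the source declared
  climb :: Fixity -> Int -> Tree -> [String] -> (Tree, [String])
  climb fix minP lhs (op : rhsTok : rest)
    | Just p <- lookup op fix, p >= minP =
        let (rhs, rest') = climb fix (p + 1) (Leaf rhsTok) rest
        in climb fix minP (Node op lhs rhs) rest'
  climb _ _ lhs toks = (lhs, toks)

  parseExpr :: Fixity -> String -> Tree
  parseExpr fix s = case tokenize s of
    (t : ts) -> fst (climb fix 0 (Leaf t) ts)
    []       -> error "empty expression"

  -- parseExpr (parseFixity "infixl 6 +\ninfixl 7 *") "1 + 2 * 3"
  --   ==> Node "+" (Leaf "1") (Node "*" (Leaf "2") (Leaf "3"))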

It's common to allow one to extend the syntax by adding to it (e.g. via macros and operator overloading). If you wish to provide the real deal on flexibility, you need to make it so users can remove syntax rules (even those found within the initial, core set) or replace the initial syntax entirely with a new one (using an expression in the initial syntax). Of course, non-monotonic syntax introduces its own set of practical difficulties. It requires either that you transform the result of parsing into something MUCH closer to the semantic structure (instead of program text, so the transformation isn't subject to later removals of syntax) or that you transform to a common language that has a well-defined semantics (e.g. as with Camlp4). Essentially, the relevance of an AbstractSyntaxTree to the actual syntax in use is diminished when "+" might mean one thing at one place and something entirely different after some extension or non-monotonic alteration of the syntax. This also requires more parse-time evaluation (possibly performed by a preprocessor). (The most significant difference between using a common preprocessor and using a language-standard mechanism for extending syntax is in the error reporting and debugging.)
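A sketch of the "lower to something closer to the semantic structure" option, reusing the core Expr type from the first sketch (the surface form and rules are invented): because each form is lowered to core at parse time, removing a rule afterwards cannot disturb code that was already lowered; no surface text survives the lowering.

  data SExp = Atom String | List [SExp]     -- raw surface form

  type Rules = [(String, [Expr] -> Expr)]   -- surface keyword -> lowering

  lower :: Rules -> SExp -> Expr
  lower rules (List (Atom k : args))
    | Just f <- lookup k rules = f (map (lower rules) args)
  lower rules (List [f, a]) = App (lower rules f) (lower rules a)
  lower _     (Atom s)      = Var s
  lower _     _             = error "unrecognized surface form"

  removeRule :: String -> Rules -> Rules    -- the non-monotonic part
  removeRule k = filter ((/= k) . fst)

  -- With ("twice", \[f, x] -> App f (App f x)) in the rules,
  -- (twice f x) lowers to App (Var "f") (App (Var "f") (Var "x")).
  -- After removeRule "twice" the same text fails to parse, but every
  -- previously lowered Expr is unaffected.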

Also, if you wish to allow for addition of arbitrary new tokens (as opposed to just keywords), you cannot separate 'lexing' from 'parsing'.
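A small sketch of that consequence (names invented): when the set of token spellings is a runtime value, longest-match decisions have to happen during parsing, character by character, rather than in a fixed, separate lexing pass.

  import Data.List (isPrefixOf, sortBy)
  import Data.Ord (comparing, Down(..))

  -- try the user-extensible operator set directly against the input;
  -- longest spelling wins, so "+." beats "+" when both are declared
  matchOp :: [String] -> String -> Maybe (String, String)
  matchOp ops s =
    case [ op | op <- sortBy (comparing (Down . length)) ops
              , op `isPrefixOf` s ] of
      (op : _) -> Just (op, drop (length op) s)
      []       -> Nothing

  -- matchOp ["+", "+."] "+.x"  ==>  Just ("+.", "x")
  -- adding a brand-new token is just adding a string to the list,
  -- which no pre-generated lexer table could have anticipated.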

It's worth noting that a fully generalized syntax extension system, especially with non-monotonic changes, is hell on tools like RefactoringBrowsers and automatic highlighting. Those tools will essentially need access to full parsers of the language, which suggests it would be wise to make the language's parser available as a library.


Ideally, meaning (facts) should be separated from presentation issues. This is kind of what CommonLanguageRuntime strives to achieve by making syntax less relevant. It is also one of the reasons I lean toward TableOrientedProgramming. If programming is shifted more toward filling in and managing attributes and relationships, then it is easier to change the presentation of such facts to suit your needs. It can reduce HolyWars by allowing each person to see information how they see fit. Things like use or non-use of semicolons become just customizable presentation issues.

-- top
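As a toy illustration of the semicolon point above (invented names, not any particular TableOrientedProgramming tool): if statements are stored as rows of attributes, terminator style collapses into a per-viewer rendering option.

  data Stmt = Stmt { target :: String, expr :: String }  -- one "row"

  render :: Bool -> [Stmt] -> String    -- True = semicolon style
  render semis = unlines . map line
    where line (Stmt t e) = t ++ " = " ++ e ++ if semis then ";" else ""

  -- render True  [Stmt "x" "1", Stmt "y" "x + 1"]  -- C-ish view
  -- render False [Stmt "x" "1", Stmt "y" "x + 1"]  -- script-ish view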

Aren't "tables" themselves a presentation? You can easily view a table as an XML document, a CommaSeparatedValues file, a bunch of BeeTree indexes and an associated record file, or a set of Haskell data types (HaskellDb). Similarly, you can view a CSV file, a record file, or some XML documents and Haskell data types as a table.

The separation between presentation and meaning is entirely dependent upon what software you have available to convert between different presentations. -- JonathanTang
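A minimal sketch of that dependence (invented helpers, assuming at least one row): the same rows, two presentations, and the "separation" amounts to nothing more than these conversion functions.

  import Data.List (intercalate)

  type Row = [(String, String)]          -- (column name, value) pairs

  toCsv :: [Row] -> String
  toCsv rs = unlines (header : map (intercalate "," . map snd) rs)
    where header = intercalate "," (map fst (head rs))

  toXml :: String -> [Row] -> String
  toXml tag = concatMap row
    where row r       = "<" ++ tag ++ ">" ++ concatMap cell r
                        ++ "</" ++ tag ++ ">\n"
          cell (k, v) = "<" ++ k ++ ">" ++ v ++ "</" ++ k ++ ">"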

That's the point. One can view such information any way they please. However, current tools are not very well set up to make this seamless. But MS at least seems to be tilting in that direction. (Bee trees? Ideally tables don't expose indexing technology except under cursor techniques, but that is outside of relational, I would note.) As for tables "being a presentation", I suppose ANY representation can be accused of that. But the idea is that the information is already "atomized" to some degree. Parsing at the sub-value level is not needed. One is dealing with atoms (values) and links between them instead of characters and parentheses and semicolons, etc.

If allowing someone to see information how they see fit is the true reason behind TableOrientedProgramming, then it should better be called RepresentationOrientedProgramming? or even RepresentationExpositionOrientedProgramming? or something like that. And as a consequence, I'd expect you to support just as strongly making the data structures visible as, e.g., files (attributes, filename=columnname, content=value) in directories (rows) in directories (tables) in volumes - at least where the application is data-heavy and most attributes weigh in at megabytes (as in, e.g., a CMS). All in all, this would match your critique of OOP, which features, among other things, the idea of information hiding. -- GunnarZarncke

It's worth noting that allowing someone to see information how they see fit can be significantly more difficult than the task of parsing (even keeping in mind that parsing can be TuringComplete for unrestricted (type-0) grammars); e.g. irreversible transformations can occur within macro-expansions and semantic conversions. This is part of what will hurt IntentionalProgramming.
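A tiny example of such an irreversible transformation (invented core language, echoing Lisp's when/unless macros): two distinct surface forms lower to the same core term, after which no tool can recover which one the author wrote.

  data Core = If Core Core Core | Nil | V String deriving (Show, Eq)

  expandWhen, expandUnless :: Core -> Core -> Core
  expandWhen   c body = If c body Nil     -- (when c body)
  expandUnless c body = If c Nil body     -- (unless c body)

  -- expandWhen (V "p") Nil == expandUnless (V "p") Nil  ==>  True
  -- on an empty body the two forms collapse; the expansion lost
  -- exactly the information a presentation layer would need.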

Are you talking about human-level parsing or machine parsing?

I'm talking about the presentation layer, mostly. To allow someone to see information requires creating a structure that, when parsed, represents information. To do this as a human sees fit requires creating a structure that represents information in a format and at an abstraction level described by the human. However, representing information at higher levels of abstraction (without any actual loss) requires reversing all sorts of macro-expansions and decompositions of large parts into their components. It's a much more difficult problem than parsing, whether a human OR a machine is performing it... it's essentially a disassembly and refactoring problem, carried to whatever level of abstraction the human sees fit. IntentionalProgramming is one step more difficult still, in the sense that it needs to represent information as seen fit by both the human AND the machine. Both need to be able to parse the result, such that the human can change it (e.g. do a bit of refactoring, fix a bug) and the machine can read it back with changes, all without loss of information.

Much easier is to allow someone to see information as the machine sees fit, then teach the humans to understand the machine. Humans tend to be the more adaptable of the two entities involved, and LeastFlexibleProtocolWins. Another (rather nice) approach is to allow someone to present information as they see fit, and provide a transformation layer for the machine. However, that approach (while useful) isn't generally reversible... thus it's largely a single-direction information path.

I fully encourage progress in the other direction, but the difficulty of this problem is very emphatically NOT something that should be dismissed with a little hand-waving. At the moment, the closest we've come is part way in both directions: CSS and Database mechanisms to adjust the 'view' of information, and various common syntaxes for intermediate representations... like XML or top's TableOrientedProgramming.


See also IntentionalProgramming, HiddenDatabaseSyndrome, SeparateMeaningFromPresentation


CategoryInteractionDesign

