Some Words Of Advice On Language Design

This advice has been copied from LambdaTheUltimate:

http://lambda-the-ultimate.org/node/687#comment-18074

Author: FrankAtanassow 2006-06-04 22:08

I do not know whether it is legal to copy it here, but I expect it will be ThreadModed shortly, so that it becomes a derived work (effectively citing only parts) rather than a plain copy.

Some words of advice on language design

Before you go off inventing new programming languages, please ask yourself these questions:

1. What problem does this language solve? How can I make it precise?
2. How can I show it solves this problem? How can I make it precise?
3. Is there another solution? Do other languages solve this problem? How? What are the advantages of my solution? of their solution? What are the disadvantages of my solution? of their solution?
4. How can I show that my solution cannot be expressed in some other language? That is, what is the unique property of my language which is lacking in others which enables a solution?
5. What parts of my language are essential to that unique property?

If your answer to 1 is "it's cleaner", go home. If your answer is "it has this very small core which everything is definable from," nobody cares. (Well, I might care, but you will never convince me it's interesting without some mathematics. There is already a one combinator basis for untyped computation. SKI was known decades ago. For typed languages, it's more complex but also mostly pointless.)

If your answer to 2 is, "I will write programs in it after I have a prototype," then you have not thought very carefully about it; also, your feedback cycle is too long. If your answer involves more than one buzzword, you are kidding yourself.

If your answer to 3 is, "I don't know," then you don't know enough. There is always more than one solution. (Trust me; I never write, "the solution to this problem is..." anymore, and when I read it and I don't see a proof of uniqueness, the writer invariably turns out to be full of it.) If your answer involves only languages of one paradigm, likewise. Go study Scheme and Prolog and ML and Haskell and Charity and Lucid Synchrone and OBJ and Erlang and Smalltalk. Look at Epigram or Coq or HOL or LEGO or Nuprl. Aside from Java, these are the important ones. If you are familiar with all of these, then you are in a decent position. If you have only ever programmed in C/C++/Java and Lisp and scripting languages, you have been sitting in a corner your whole life. Perl, Python, Ruby, PHP, Tcl and Lisp are all the same language. (Scheme itself is only interesting for hygienic macros and continuations.)

If you don't have an answer to 4, then your solution belongs in a library, not a language. In fact, the people on LtU are the perfect people to design good libraries, and libraries, on the average, are much more valuable than languages. Library design is also easier, and you won't waste as much time on syntax. (You will waste time on syntax. You will waste almost all your time on syntax.)

If your answer to 5 is "only this and this", rip those out of your language and add them to an existing language. ("Refactor mercilessly.") If your answer is, "almost everything contributes," I can almost guarantee you you are wrong. (If you aren't, then you are probably a researcher.)

The reason most new languages are pointless is that people rarely answer these questions honestly. That is why language design is hard, and why researchers hardly ever make new languages.

Oh, and, I know this will fall on deaf ears but: don't indulge in syntax design. Pick some other language's syntax style. Java, Lisp, Python, Haskell, it doesn't matter. Just get it out of the way and close the matter immediately. If your language's original contribution is syntactic, you are hopeless.

Think about language features in terms of asymptotic complexity — not of space or time, but of, well, complexity. (I would say semantics, but that's a dirty word.) Syntax changes can only reduce that complexity by a constant factor. A good language feature changes complexity from n-squared to n or to n log n. It increases modularity by localizing something which was global. The best features do so without compromising any other desirable properties (such as type safety).

I would also add that there are more opportunities to innovate in typed languages than untyped, and concurrent languages than sequential. (Personally, I think the only interesting thing in sequential, untyped languages would be something involving delimited continuations or narrowing or compilation.)

Here endeth the lesson...

-- FrankAtanassow at Sun, 2006-06-04 22:08

RE: Oh, and, I know this will fall on deaf ears but: don't indulge in syntax design. Pick some other language's syntax style. Java, Lisp, Python, Haskell, it doesn't matter. Just get it out of the way and close the matter immediately. If your language's original contribution is syntactic, you are hopeless.

On the contrary! Many think that Digital Mars D and C++ made the mistake of using C syntax. If one is going to keep the same syntax and reinvent the same language under a new name, then why not just contribute to the original language? For example, many critiques of Digital Mars D wonder what the point of it really is, since it uses the same syntax as C++ and looks just like C++. True, it has garbage collection, but not that many people see a useful difference! Keeping the same syntax is often wrong. Often, things like "==" and "=" (which even the C language inventor thinks was a bad choice) are criticized as syntax failures that should be changed! Often the syntax is where languages go wrong. Why didn't Python use curly braces from C? Take Algol or Ada as another example, which many consider to have overly baroque syntax. Or take C++ again, whose syntax people feel is obfuscated. Really, if you are going to pick the same syntax, is your language really all that new? It's also a bit ironic that you say one should pick the same syntax: the whole reason for existence of Python, Lisp, and Haskell, is that it is not C++ syntax. Yet you say to pick the same syntax. But if you did, you wouldn't have Python, Lisp, or Haskell!

Frank suggests picking a style or, more importantly, not dwelling too long upon syntax. You're still free to correct what you see as deficiencies. I'd make an alternative suggestion: if syntax matters so much, then design a good macro system into the language so you can fix it later. The Fortress language and many others are going in that direction. You point out 'mistakes' like "==" vs. "=", but I'd note that languages have plenty of other failures. To suggest that "syntax is where languages go wrong" is to suggest that languages are perfect, except for their syntax. To say that "the whole reason for existence of Python, Lisp, and Haskell is that it is not C++ syntax" is a magnificent delusion (C++ wasn't even around when Lisp was created).

We aren't anywhere near the point where syntax is the primary failure of language design. And, even if we were, we have very few mathematical properties and scientific observations we can use to judge one syntax 'better' than another, except for error locality (the ability to detect and localize an error, and the ability to parse in the presence of syntax errors).

I agree with Frank. All of this seems very good advice to me, and his warnings seem accurate in my own experience. I asked myself most of those questions after racking my brain over the possibilities of using Lisp, Clojure, Erlang, D, E, or Mozart rather than creating a new language for optimizable distributed programming of distributed systems. The original answer to question 1 was a desire to build something of a programmable web for distributed games without the need for the centralized 'authority' seen in MMORPGs or SecondLife today. While my vision has broadened a bit with WikiIde, I still consider such games the ultimate test of the language (requiring end-to-end optimizations for latency and bandwidth and graphics, security, etc.). Answers to 3 and 4 repeatedly came back to the lack of support (read 'enforcement') for an explicit SecurityModel, or the lack of safe concurrency semantics for network connections, or the heavy use of mixed state and functions, which has horrendous effects on optimizations in distributed systems.

We have quite a few budding LanguageDesigners on WikiWiki. I'm one of them. Many would do well to pay attention to "If your language's original contribution is syntactic, you are hopeless." Time to KillYourDarlings.

It would be interesting to study the history of successful programming languages. There was this one language that people didn't like, called Algol. Well, lots of people did like this language actually, but they hated one thing about it: its baroque syntax, verbose and complex. A programmer came along and created the C language, basing it on B, BCPL, and Algol. The main part of C is structured programming, which came from Algol. The difference between Algol and C is that C is a smaller, simpler language with different syntax. Sure, there are other differences, but really C and Pascal smell an awful lot like a mini Algol. It's as if the C inventor was reading "Structured Programming" one evening and decided it was a good idea! So he created C, and it was just Algol with different syntax. Another programmer, by the name of Niklaus Wirth, created an Algol clone which had simpler syntax (and yes, other things were changed too, not just syntax). Many successful languages were in fact trimmed-down and improved "Algols". The original contribution of CeeLanguage and Pascal over Algol was mainly syntactic. CeeLanguage was addressing the lack of syntax in B/BCPL. Without syntax (sugar) you've got assembly language and machine code. The C and Pascal inventors found some other issues to repair along the way, in addition to syntax. They may have started out with one original contribution, like syntax improvements, and then found more contributions to make: say, adding the idea of "records" or "structs" instead of just loose, individual variables with nothing tying them together. But records are syntax sugar. Or how about someone adding TutorialD into a language to improve on arcane structs and records? Most improvements involve some sort of clever but powerful syntax sugar. Python is about syntax too. You can do a lot in Python that you could do in C. But is Python useful because it is like C, and the same syntax was chosen?

It's not just syntax, but syntax is a huge factor. What you view on your screen is mostly symbols and syntax. What those symbols and syntax mean is obviously important, but what's really different about Russian and English is that they have different symbols and syntax. What's different about Python and Lisp is their syntax. When you run the program, they can both do the same thing... but how the program looks is what is different.

I know, I know, you will argue it's NOT SYNTAX, but sorry, it is. This is tongue in cheek: it's all about syntax. Well, it's not all about syntax, it depends on the sensitivity of your cheek.

And also you might want to research Lisp. Without its syntax, it wouldn't be Lisp. I'm no fan of Lisp, but the very reason Lisp is Lisp is that it leaves the semantics more or less up to the libraries and to the programmer. The syntax of Lisp allows it to be Lisp; without its syntax it wouldn't look like Lisp.

Of course what a language 'looks like' is influenced by its syntax. Nobody is denying that. And neither Frank nor I have argued syntax to be irrelevant. SyntaxMatters. But I still agree with Frank that a change in syntax, by itself, is insufficient to motivate a new language. It's a hopeless cause. At the very least, if syntax is your only significant contribution, you should consider creating a generic preprocessor like Syn or camlp4 so that it can be used with any language. Most people seriously into programming languages tend to regard one stack-based procedural imperative language with simple types and no exceptions as much the same as another with the same properties, regardless of syntax. New syntax for the same concepts is NOT the first thing they notice about a language; a change from '==' to '=' for equality tests would barely even be considered. If you ask them their opinions on syntax, they'll be forthcoming of course, because of the BikeShed problem. That is, syntax is a concern that everyone understands and has opinions about, so it's what people who have nothing more useful to contribute will tend to focus on, to no productive end.

I've never separated syntax and semantics as drastically as you do; I consider it pointless to try to separate them completely or to imply that one is more important than the other. Since the discussion of what syntax means always involves syntax, I'm not even sure what your purpose is in continually trying to patronize people about the differences between semantics and syntax. The syntax of a language is like the frame, doors, windows, and structure of your house, not the paint. If you go into a house and all the windows are welded shut with big clunky knobs on them that work only if you melt the metal, rattle four times, and do a secret handshake to finally make use of the window, then you're working with crappy baroque syntax. Although all syntaxes work in the end, it'd be much more convenient if the syntax were easy to turn and use. That's one reason why languages like TutorialDee, COBOL, and Pascal fail by the way: way too many big verbose knobs that no one enjoys using and unwelding every time. But I'm no fan of analogies.

It is common in languages with extensible syntax for programmers to develop in two parts: (a) create a library or framework that does such and such. (b) create a syntax that makes this library/framework convenient and easy to use. But if you meant that they are modifying the core language, that's dubious. A semantics for the library under the hood has, as far as the compiler, optimizer, and interpreter are concerned, no relationship to the extended syntax. It is situations like this that give me the impression that syntax is 'painted on' above a semantics. There is one language style that brings the two very close together: term rewrite systems, such as ObjLanguage and MaudeLanguage, unify syntax and semantics. You can tell they are unified because all legal optimizations and transformations are made explicit by the programmer.
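
A rough illustration of the (a)/(b) split, in Python. Python has no extensible syntax, so operator overloading stands in for it here; the names Vec, vec_add and vec_scale are invented for the example. All the semantics live in plain library functions, and the "syntax" is a thin layer that forwards to them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec:
    x: float
    y: float

    # (b) The "syntax": operators that only forward to the library.
    def __add__(self, other):
        return vec_add(self, other)

    def __mul__(self, k):
        return vec_scale(self, k)


# (a) The library: ordinary functions carrying all of the semantics.
def vec_add(a, b):
    return Vec(a.x + b.x, a.y + b.y)


def vec_scale(a, k):
    return Vec(a.x * k, a.y * k)


plain = vec_scale(vec_add(Vec(1, 2), Vec(3, 4)), 2)
sugared = (Vec(1, 2) + Vec(3, 4)) * 2
assert plain == sugared
```

Both expressions run exactly the same library code; only their appearance differs, which is the sense in which the syntax is 'painted on' here.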

Re: "That's one reason why languages like TutorialDee, COBOL, and Pascal fail by the way: way too many big verbose knobs that no one enjoys using and unwelding every time."

A large number of key-words hard-wired into the syntax is a language design smell. Most key-words should be parameters just like any other parameter (see UniversalStatement), so that "base" operations and custom-built ones are syntactically indistinguishable and swappable. Heavily typed and/or compile-centric languages generally limit dynamism on purpose in order to provide more compile-time checking, so the advantage of "meta" syntax is smaller there. This is not so much an excuse for compile-centric languages to hard-wire key-words into the syntax as an observation that the down-sides of doing so are smaller in such languages, perhaps so small that the few upsides of hard-wiring start to shine. I tend to prefer dynamic languages, but this isn't the place to re-invent the static-versus-dynamic debate. -t
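
A minimal sketch of what "key-words as parameters" might look like, in Python and with invented names (run, cmd_if, cmd_unless, cmd_emit). It is not the UniversalStatement proposal itself, only an illustration of the idea: every statement has the shape [name, arg, ...], and "if" is looked up in the same table as a user-defined "unless".

```python
def run(stmt, env):
    """Run one statement [name, arg, ...]; every name is looked up the same way."""
    name, *args = stmt
    return env["commands"][name](env, *args)


def cmd_if(env, cond, body):
    # A "base" operation, but not special to the dispatcher.
    if cond(env):
        return run(body, env)


def cmd_unless(env, cond, body):
    # A custom operation built on top of the base one.
    return cmd_if(env, lambda e: not cond(e), body)


def cmd_emit(env, text):
    env["out"].append(text)


env = {
    "commands": {"if": cmd_if, "unless": cmd_unless, "emit": cmd_emit},
    "out": [],
    "n": 3,
}

run(["if", lambda e: e["n"] > 2, ["emit", "big"]], env)
run(["unless", lambda e: e["n"] > 2, ["emit", "small"]], env)
run(["unless", lambda e: e["n"] > 5, ["emit", "not huge"]], env)
```

Because "if" is just an entry in env["commands"], it can be replaced or wrapped like any other command, which is the swappability the paragraph above asks for.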

Re: "That's one reason why languages like TutorialDee, COBOL, and Pascal fail by the way: way too many big verbose knobs that no one enjoys using and unwelding every time."

It's notable that TutorialDee and Pascal are both languages intended for pedagogy rather than production, and COBOL was originally intended to be particularly easy to learn so that it could be used by non-programmers. Are verbose languages really easier to learn?

Furthermore, it would be difficult to say that COBOL and Pascal "fail" for any definition of "fail", and TutorialDee (as of this writing) is the most active alternative to SQL going, with several implementations.

That's because it's the "relational Ada", and "satisfies" that niche. SmeQl, on the other hand, (hopefully) targets the "dynamic" crowd more, such as LAMP fans. SQL is kind of a "relational COBOL/Fortran" hybrid. I believe there is more demand for "relational Ada" than "relational Python" because the "scripty" crowd tends to shun databases and roll their own because of the perceived stodginess of existing RDBMS. It's my belief they have yet to see more flexible dynamic query/table languages and dynamic databases (such as DynamicRelational) in action. The TutorialDee crowd, on the other hand, have seen SQL and don't like some of its theoretical "impurities" and its limited column type system. C. J. Date etc. highlighted many of these issues. Nobody has yet done the same for "dynamic tabling". Other than ExBase and perhaps APL-derivative fans, few have seen it in action enough to become addicted. Experience with static RDBMS will not by itself give one a sense of the possibilities. -t

I understand every one of your words but none of their meaning. Please consider using logic, illustration and persuasion over vague terms like "scripty" and "impurities" or invented phrases like "dynamic tabling" and "relational Ada" that clearly mean something to you but not to your readers.

I think the original posting is patronising and mostly irrelevant to real life. It almost seems like an attempt to dissuade people from efforts that could really lead to innovations. He assumes for instance that a language is worthless if it solves two problems at the same time, that languages should only solve one problem. This leads to academic languages with limited applicability to real life. Also, of course any new innovative solution could be expressed in another language -- assembler for example -- but it is probably not comfortable or efficient to work in that language. Since a programming language is a means of expression, how the solution to the programming problem is expressed is what everything is about. For example, if a language's syntax is designed so that those things which are efficient are short and those things which are less efficient are more verbose, then this helps re-orient thinking in that language towards efficiency. Maybe there is no other difference to existing languages, but even that small change is significant. He says that Perl and Tcl are the same language! Yeah, right! Maybe at some distant level of abstraction, but not from a real-life programmer's point of view. Maybe the original poster is a great theoretician who has lost contact with reality, in which case I guess we may forgive him.

Perl and Tcl are the same conceptual language. Genuine differences between them -- those that matter enough to result in significant general differences in developer productivity -- are non-existent. Beware of conflating syntactic differences -- which, all too often, differ materially only in quantity and choice of keystrokes -- with semantic and paradigmatic differences that can have a dramatic impact on language efficacy.

CategoryProgrammingLanguage