Terse Language Weenies

Terse Language Weenies are those who advocate terse syntax, as found in languages such as AplLanguage, JayLanguage, RubyLanguage, CeeLanguage, PythonLanguage, PerlLanguage (and PerlGolf), and QompLanguage.

The opposite of Terse Language Weenies are VerboseLanguageWeenies.


If the instance be, 'ten and thing to be multiplied by thing less ten,' then this is the same as 'if it were said thing and ten by thing less ten. You say, therefore, thing multiplied by thing is a square positive; and ten by thing is ten things positive; and minus ten by thing is ten things negative. You now remove the positive by the negative, then there only remains a square. Minus ten multiplied by ten is a hundred, to be subtracted from the square. This, therefore, altogether, is a square less a hundred dirhems. -- (Al-Khwarizmi)

Compare above verbosity to:

  (10+x)*(x-10) = (x+10)*(x-10) = x*x + 10*x - 10*x - 10*10 = x*x - 100
Languages save us time (and, once the language is well known, confusion), no matter how many arguments there are that terseness does not help one bit. Terseness, used the right way, adds precision to our field of work, and keeps attention higher across a screen of text (which is also a reason why short, to-the-point methods or procedures should be used wherever possible, instead of long, drawn-out ones).

That is not to say terseness cannot go too far. An example is a regex: once written, it is not so easy to tweak, compared to a char-by-char parser that analyzes text snippets one by one. This has more to do with the abilities and goals of regexes, though, and is not purely the fault of the syntax and notation. Rather, the fact that a regex isn't a full programming language itself is a reason it fails to scale for many situations (in other words, don't always blame the terseness of the notation; blame the high-levelness of the tool). Regexes wrap algorithms; they would still not work well for some situations even if they were verbose English instead of character symbols - so one can't just blame terseness; one has to look at the entire picture.
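To make the tradeoff concrete, here is a minimal sketch in Python (the task and names are invented purely for illustration): the same check written as a terse regex and as an explicit char-by-char scanner, whose individual steps can be tweaked or given error messages independently.

  import re

  # Terse: match an identifier followed by '=' and digits, e.g. 'x1=42'.
  TERSE = re.compile(r'^[A-Za-z]\w*=\d+$')

  # Verbose char-by-char scanner: longer, but each step is a separate,
  # individually modifiable piece of ordinary code.
  def scan(text):
      i, n = 0, len(text)
      if i == n or not text[i].isalpha():
          return False                      # must start with a letter
      while i < n and (text[i].isalnum() or text[i] == '_'):
          i += 1                            # consume the identifier
      if i == n or text[i] != '=':
          return False                      # expect '='
      i += 1
      return i < n and text[i:].isdigit()   # expect one or more digits

  assert bool(TERSE.match('x1=42')) == scan('x1=42') == True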

  *&@$@
  Asterisk Ampersand At symbol Dollar sign At symbol
A compromise may be:

 Ast Amp At Sym Dol At Sym
But those symbols spelled out in English don't mean anything other than English - and we do not want them to mean English, since we are notating a regex (a language whose symbols mean something to our brain other than words; they almost visually represent the parsing).

Even when written in English, a regex-like syntax is still the same maintainability-wise. It's the same stuff, just in English, and the Ast Amp At Sym style of notation doesn't offer many advantages in this case.

A regex does have different goals than other types of tools (i.e. full programming languages). One problem may be that people use regexes in cases where they could use a programming language. A lot of long-term maintenance scripts seem to use regexes quite often, and the extensibility of a regex is not great compared to a char-by-char parser. The problem starts when someone chooses regexes in the beginning and sticks with them, repairing them as the project grows larger. With time, regexes become a duct tape of hacks in large source bases, and grow out of control. The WikiPedia parser and the PHP Smarty source code are demonstrations of this.

AlternativesToRegularExpressions suggests that regexes could be reworked to be a bit more mnemonic.


From a discussion in AdaLanguage:

Remember that mathematical notation evolved during a time when paper and pen were the rule, and hand strain from hired (or slave-labor) scribes proved a significant limiting factor in transcribing texts. Math notation had to be concise then. Today, it needn't be nearly as concise; the limitations of years past no longer apply. There is nothing wrong with domain-specificity. There is something wrong with overt attempts to re-invent and subjugate human alphabetical or ideographic systems in the name of "brevity."

{Imagine if procedure brackets () were instead stated as "This Is The Start Of A Group Of Parameters" and "This Is The End Of A Group Of Parameters" respectively. Symbols do have their use! Words in a programming language are something we compose - and are very powerful.}

{But the fixed symbols such as assignment, plus, minus, beginning, ending, enclosing groups of parameters (brackets), etc. should indeed be brief. Why should they be brief? Because if you use "ADD_LEFT_TO_RIGHT" instead of "+" or "OPEN_OF_PARAMETER_GROUPING" instead of "(" or "BEGIN_OF_BLOCK" and "END_OF_BLOCK", it becomes obnoxiously hideous and inelegant.}

{The "is" keyword and the requirement to end each procedure in Ada with NeedlessRepetition of the identifier word get on people's nerves.}

{. . ., especially since procedures that are kept short can be seen on one screen anyway; there is no use for much of the Ada boilerplate syntax (I'm not even sure what 'boiler plate' refers to; it sounds like a BuzzPhrase)}

{but I assume it is something along the lines of "up-front ridiculous verbosity that appears to look like a verbose English contract but is really just a lot of fluff". The contract could be shorter and more concise, and just as correct - some Ada advocates act as if the reason Ada is safe and successful is mostly its verbosity}

{which is silly. It's more that Ada doesn't encourage dangerous structures such as PChars, pointers to pointers to pointers, etc.}

I see two problems with brevity versus verbosity: {Indeed, it is even suggested that beginners see (or use) languages such as Oberon or TutorialDee as an introduction. However, programmers (beyond beginners) need not put up with beginner and newbie notation and syntax. Programmers need a migration path to a language that lets the notation do the work, while beginners need a language which is more like English. An equation in math may be explained by an essay to a child, but an adult or teenager - and most children - should be able to read a precise equation that uses notation and symbols. The symbols and notation, however, shall be consistent - and not baroque and inconsistent. Perl and APL are examples of where letting baroque and complex symbols and notation do the work has gone wrong. Related: Leibniz's Dream}

I agree. You seem to propose having different languages for learning programming: simple, explicit, redundant ones for beginners, and terse, uniform ones for experienced developers. Or do you think it would be a good idea to have one language that is good for both? Or is that impossible?

It is tough to have a WinWin TheBestOfBothWorlds setup where we can have it both ways. Having multiple languages causes NeedlessRepetition, as does having multiple syntax choices - no matter whether two separate languages are chosen or one extensible language is chosen. I am guessing you are heading toward discussing extensible programming languages or domain-specific languages. One option is to have a compiler mode system, where one can switch the learner/newbie notation and syntax off.

The only problem with this is that you still need to read source code written by relative newbies, whose source files will be in "newbie mode". The author's original beef seems to be that verbosity is undesirable from a maintenance point of view, which I contend isn't true. I make a clear distinction between languages that optimize reading (typically verbose languages, provided it's not as excessive as CobolLanguage) versus writing (as exemplified by JayLanguage and, to a lesser extent, CeeLanguage/CeePlusPlus).

In order to realize this happy medium, we need a system where source files are stored in a purely symbolic notation -- roughly approximating IntentionalProgramming, so that the user's editor can be individually configured for program representation when viewing the pre-parsed source.

Well, one can use a tokenizer to convert the syntax to newbie mode; it is definitely possible. A syntax highlighter could replace b with a BEGIN symbol on the screen, or replace { and } with BEGIN and END. This is not a dream - it is easily doable. The question is whether it is really worth doing, and to what extent? The assignment operator (:= or =) could be replaced with the words SET TO or ASSIGN by a syntax highlighter too.
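As a minimal sketch of that idea in Python (the mapping and function names are invented for illustration), a display layer could swap terse symbols for verbose words while the file on disk keeps the terse form. Note that it naively substitutes inside strings and comments too, which hints at the context-sensitivity problem raised below.

  # Hypothetical display-only substitution: the stored source keeps the
  # terse symbols; only the on-screen rendering becomes verbose.
  import re

  DISPLAY_MAP = {'{': 'BEGIN', '}': 'END', ':=': 'SET TO'}

  def render_for_newbies(source):
      # Try longer symbols first so ':=' is not split into ':' and '='.
      pattern = '|'.join(re.escape(sym) for sym in
                         sorted(DISPLAY_MAP, key=len, reverse=True))
      return re.sub(pattern, lambda m: DISPLAY_MAP[m.group(0)], source)

  print(render_for_newbies('x := 1 { y := 2 }'))
  # -> x SET TO 1 BEGIN y SET TO 2 END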

[Don't be too hasty, there. I've experimented a great deal (on paper) with the extent of what syntax highlighters can get away with. Manipulating the actual symbol stream causes problems - among them: it teaches people to inject symbols that aren't actually part of the source-code, it requires an additional context-sensitive translation layer on the input stream (so that when someone inputs '}' in a particular context it becomes 'END' in the source), and it increases probability for language-recognition errors and thus divides a developer community even further (DomainSpecificLanguages and competing libraries are bad enough already).]

[As far as assignment operators go, I've some affection for <-. But set works well, too.]

I didn't intend a 'compiler mode system', but rather a syntax that has short and long forms (in the example above, '{' and 'begin' could be used interchangeably; of course, the syntax would have to ensure that this cannot lead to ambiguities in cases like 'begin' also being a valid identifier). Then, together with either a suitable editor (like the one proposed above) or a tool to convert between long and short forms (yes, difficult in the presence of line breaks and formatted comments, etc.), we have the required language. One nice extra would be type inference: experts need only state a few types, and beginners can add all of them. The conversion then removes/adds everything that can be inferred automatically.

Reason for a compiler mode: consistency and sanity. Having both adult and child modes in the same source file is very ambiguous, and leads to inconsistent, mixed-and-matched code. Conventions and style are up in the air. Plus, the compiler is much harder to maintain when different notations/syntaxes can be used interchangeably. In fact, it reminds me of Ruby's ambiguous loop and if-logic choices, or the C preprocessor. It is wise to pick one consistent way wherever possible (what's that Python saying? [there should be one - and preferably only one - obvious way to do it]). One consistent style per source file is (in some people's opinion) better.

However, the advantage of mixing and matching child syntax with adult syntax in the same hodgepodge would be immediate backwards compatibility. But that leads to parsing complexities and, IMO, horrible ambiguity. I've seen people use the C macro preprocessor to make their C code look like Pascal - and it is scary when they use C notation throughout half of the file and Pascal notation for the other half, or worse: two different scattered, mixed-and-matched styles in one file!

With compiler modes, if someone creates a module in newbie child syntax, it can be compiled (used) alongside an adult-styled module. They are compatible at the module level. Modules remain consistent throughout - which is superior, IMO, to inconsistent, unenforced modules! The compiler modes are not just a theory, by the way - it is actually how qomp currently works. As an insult to FPC, maybe "MODE FPC" will indeed be called "MODE NEWBIE".

I think notation inconsistency (combining a lot of ambiguous syntax into one module) is evil. Consistency within one source file is always preferred!

[I also share the opinion that consistency within a language is important - SymmetryOfLanguage is far more important to me than aiming for terseness. I suppose that is why I feel that, if you ARE going to have inconsistent syntax forms for the language, you at least ought to have a consistent means of making it inconsistent. My own suggestion in this vein would be to avoid 'hard coding' both long and short forms into a language; instead, make one form that is 'good-enough' (the 'standard' form) and then make the language extensible (as per ExtensibleProgrammingLanguage) such that people can add and tweak long and short forms of the language along with any desirable macros or DSLs (those being the main reasons for extension). If it is non-monotonically extensible, you can actually remove and combine and mix-and-match different language-forms (at least so long as you don't make things too ambiguous for the parser to handle) and even remove the 'standard' form and the ability to manipulate the syntax (e.g. creating a restricted language-form for students that teachers will be able to read and comprehend). The only potential problem with this approach is that conversions don't possess any obvious 1:1 correspondence between forms, so rather than converting the source for experts and newbies one would depend on an editor that has other nifty features like hover-text and colorings indicating how particular segments of code are being parsed - i.e. to help people who aren't experts in a given language-form or DSL understand how it is being processed and what things mean.]

What about Lisp and Ruby since they offer domain specific extensions (and tweaks) already? They are existing tools at our disposal right now. Would your suggestion be reinventing existing solutions that we have available already?

[Lisp does not offer syntactic extensions, only semantic ones. RealMacros does not imply syntactic extension. RealMacros are also spatially limited in their application, which prevents syntax manipulations on a broader scale. Ruby possesses a MetaObjectProtocol, but that also doesn't qualify as syntactic extension (i.e. the parser is unaffected). OperatorOverloading, polymorphism, creating domain-specific libraries of functions, etc. - none of these things determines notation. And while I wouldn't suggest reinventing solutions we have already... making existing solutions available? yes, that I'd recommend. Most of the research done on syntax extension from the seventies to the nineties hasn't yet been integrated into a modern programming language.]

As a partial Dijkstra and Wirth follower (they are not Gods, indeed), I think allowing too many extensible features into a language can cause readability problems, because programmers are not disciplined and they have egos (even artistic and creative abilities, which can be harmful).

[I think bad programmers can find enough rope to shoot themselves in the foot no matter which language they're using. And if you're too limiting, even good programmers will start cramming data and behavior into strings and external files that are even more opaque, less readable, more difficult to verify, secure, optimize, type-check, and debug, etc. Programmers should always have the power to do what they need without temptation to implement an incomplete, slow, buggy version of one language within another (or, worse, need to rely upon 3rd-party CodeGeneration). If doing so requires making easy the extensions and implementation of other languages, so be it: at least those other languages can take advantage of my language's optimizer, type-checking, unit-testing, debugging, etc. If I do it right, they can even leverage the syntax-highlighting. They won't be "slow" and "buggy" and they can fall back upon the root language to obtain "completeness" should a sub-language require it. It is not a bad decision, IMO, considering this automatically supports DSLs and domain-specific extensions which offer various other clarity advantages (reduced syntactic and semantic noise). My policy: Give well-integrated power to the programmers within the language or they'll repeatedly seek and re-invent poorly-integrated power from outside of it.]

Okay, so consider Rake, since people brag about Rake showing the advantages and benefits of a DSL (I don't consider it a DSL, though; it is more like a DSE): wouldn't it be just as useful to have a build system written in Ruby itself, and not in the "Rake Ruby" language? Consider FpMake or PowBuild?, which is written in FPC without being a new language or a new extension to the language!

How can people create an elegant build tool like FpMake, using their existing language, without any extensions? Why does Rake require domain-specific extensions when FpMake does not? Are we 100 percent sure that domain-specific languages are required in the many cases where they supposedly shine? FpMake is proof that one doesn't need to extend the language to make a build system... so I have my doubts.
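To make the 'no extensions needed' point concrete, here is a minimal sketch in Python rather than FPC (the file names and commands are invented; this is not FpMake's actual API): a build script using nothing but ordinary functions of the host language.

  # A plain build script: ordinary functions, no DSL, no macros.
  import shutil
  import subprocess

  def compile_sources():
      # Invoke the compiler; check=True stops the build on failure.
      subprocess.run(['fpc', 'main.pas'], check=True)

  def install():
      compile_sources()   # the plain call graph expresses dependencies
      shutil.copy('main', '/usr/local/bin/')

  if __name__ == '__main__':
      install()

The tradeoff, as noted further down, is that up-to-date checks and dependency ordering must then be written by hand or provided by a library.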

Even the procedural paradigm is a "domain-specific" ability of some languages - it allows us to quickly create make systems, since make systems are very procedural: build this, build that, delete this, move that, copy that. Supposedly the atrocious GnuMake is "logical", but from my experience GnuMake is horrid and ugly (even though it is a domain-specific language!).

[Technically, libraries and modules of functions and macros and classes and such are already 'extensions'. Adding to the set of meaningful words in a language is always an extension to that language; it's just we're so familiar with function and sub-procedures in modern languages that we don't typically think of them as 'extensions' - we have more specific names for them. From the perspective of programming in MachineCode or BefungeLanguage or certain regular expressions, which don't provide even that much, they're clearly extensions. But they aren't 'syntactic' extensions.]

[Regarding your 'procedural paradigm is "domain specific" because it helps you create make systems' argument, by that same logic the Sun is a "calculator-specific" energy source because it helps run my solar-powered calculator. Please avoid such obvious fallacies in your ranting; it does much to discredit everything else you say.]

Languages that offer the procedural paradigm are domain specific. I'll prove it. Consider Java: one cannot write a program such as the batch task below in a very terse, simple, easy-to-view syntax/notation. The non-procedural nature of many languages severely limits the language's domain. In a procedural language one can make a quick batch program such as this:

  use fileutil;

  pro moveFiles; b Clone('/tmp/*.tmp', '/foo/bar/'); Delete('/tmp/*.tmp'); e;

  b moveFiles(); e.
We wish to copy some files - the procedural domain makes this extremely easy. That is great for build/make systems, since deleting, moving, and performing quick operations on files is often required (no new class instances; short and simple).
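For comparison, roughly the same batch task sketched in Python (the paths are the same illustrative ones; this makes no claim about qomp's exact semantics):

  # Copy every /tmp/*.tmp into /foo/bar/, then delete the originals.
  import glob
  import os
  import shutil

  def move_files():
      for path in glob.glob('/tmp/*.tmp'):
          shutil.copy(path, '/foo/bar/')
          os.remove(path)

  move_files()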

Is Bash a shell-specific (domain-specific) language, or general purpose? The line can get murky. It can be both. Languages with procedural abilities assist in the domain of batch tasks and prototypes. Computers were first designed to do batch tasks - a domain-specific area (they weren't necessarily designed for GUIs).

[Sigh. You cannot logically demonstrate that procedural programming is 'domain specific' by showing it helps a particular programming domain. If that were the case then every KeyLanguageFeature would be 'domain specific' because such features help many different programming domains which always, necessarily, includes a variety of 'particular' programming domains. 'Domain specific' literally means specialized to a particular domain - i.e. designed for a particular domain, making assumptions or leveraging foreknowledge associated with a particular domain, optimized to a set of needs unique to a particular domain, etc. - usually all of these things at once. Your entire effort above is completely irrelevant if your goal is to prove that procedural is domain-specific. Similar to how you can prove the sun isn't a 'calculator-specific energy-source' by pointing out that it also feeds plants and warms the planet, you can prove that procedural isn't 'build/make'-specific by pointing out two or three distinct other domains it helps in, such as (for procedural) activity scripting, DSP and CGI.]

[In any case, I don't know enough about RakeMake or FPMake MakeTools to make any fully qualified statements about one or the other. What I understand about build systems in general is that dealing with update times (up-to-date checks), prerequisites and dependencies, cleanup, modes (e.g. debug vs. release), indicating where to search for files and where to place them, etc. are all important domain considerations that show up repeatedly. And so the expression of these issues needs to be optimized, even made implicit where defaults will serve - doing so makes the intent of the program clearer, which means you'll spend less time creating and repairing errors in the build itself and more time being productive. It is quite possible that a language could be non-intrusive enough to support all this using plain'ol functions and such without much overhead in getting the operations all glued together... but most general-purpose languages won't serve; they'll add undesirable syntactic noise for the task at hand. Based on a Rake tutorial (http://martinfowler.com/articles/rake.html), it seems that Rake takes advantage of Ruby symbols (':id'), blocks, and an object-constructor for a class called 'task' to automatically load task descriptions into some sort of central repository that can then formulate a partial ordering of activities (based on dependencies) for any given build request. However, it also seems that Rake could use some extra work in the handling of already-built prerequisites and up-to-date checks: it doesn't make the up-to-date checks nearly as easy or implicit as GnuMake. The advantage RakeMake offers over GnuMake is a great deal of expressive and semantic power within those blocks of code describing each task. At a glance, I can't tell whether FPMake provides dependency management and up-to-date checks, but I can tell that it seems to create a lot of syntactic overhead for the programmer.]
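As a rough sketch of that 'central repository plus partial ordering' model in Python (the names are invented; this is not Rake's actual implementation, and it omits cycle detection and up-to-date checks):

  # Minimal task registry with dependency ordering, in the spirit of Rake.
  TASKS = {}

  def task(name, deps=()):
      def register(fn):
          TASKS[name] = (deps, fn)
          return fn
      return register

  def build(name, done=None):
      done = set() if done is None else done
      if name in done:
          return                       # each task runs at most once
      deps, fn = TASKS[name]
      for dep in deps:                 # run prerequisites first
          build(dep, done)
      fn()
      done.add(name)

  @task('compile')
  def compile_step(): print('compiling')

  @task('test', deps=('compile',))
  def test_step(): print('testing')

  build('test')   # -> compiling, then testing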

On another note: it seems that a lot of the advantage of extending a language is about terseness. Make files, or make systems, are terse (one should not have to verbosely MemAlloc? a *char just to make something, which is why people don't use Cee for making projects). Mixing SQL "strings" into a language is verbose and messy; a built-in SQL would be terser. Extensions/integration often create terseness! A lot of people may not immediately see this relation - but indeed the entire point of a lot of "domain-specific" things is that they create terseness: quick, short, precise notations to use. When I say domain-specific things, I don't mean only new macro extensions - because, as I say, FpMake is domain specific, and it doesn't use "macros" or any fancy extensions at all. Even better: one can use FpMake or PowBuild? without learning a new language - because it isn't one!

[The whole reason for DSLs is to remove the messy semantic and syntactic overhead and optimize the language to the needs of a particular domain. Doing so is not only about terseness; it's just as much about clarity... but terseness is one big aspect of it. But keep in mind that this isn't (generally) the sort of terseness that comes from trimming symbols down to a minimum width; it's the sort of terseness that comes from the ability to make assumptions and use default policies based on foreknowledge of the domain, to cut away general-purpose 'fat' and boilerplate code, etc. I.e. it isn't the same sort of terseness that is described as desirable for TerseLanguageWeenies.]

[And with FpMake and RakeMake both you need to learn a new library/API/set of words which isn't quite the same as learning a new language, but is very similar (more like learning a new jargon). The main advantage of these 'internal' languages is that (a) you have the full power of the language at your fingertips when attempting to handle complex tasks, (b) you get to take advantage of the debugger, potentially a compiler and optimizer, testing system, syntax highlighting and IDEs, error handling and reporting, etc. that is already provided for the base language (i.e. no need to reinvent these things). In relative terms, GnuMake lacks most of these features (though some editors provide syntax highlighting for makefiles).]

Re: "The main advantage of these 'internal' languages"...

FpMake is not an internal language, though... it's just using the existing language.

[Sure it is. It adds new words, new interfaces, new protocols, and all the aspects of a new language and a new culture... all associated with the build-system. It's just as much an 'internal language' as any library API or jargon or field vernacular.]

Another domain-specific trick is to use the stack [Specific to which domain?] to reduce line noise (AntiCreation). My point is that not all domain-specific abilities stem from creating new languages - in fact, a language often has features within it that make domain-specific tasks easier, such as the stack instead of the heap, or the ability to use procedures instead of class instantiations. All these features help the language, even if only for the terseness that a stack, a procedure, or an imperative nature offers. A make system is indeed much easier to create in FPC or Ruby, because Ruby and FPC are much more terse than Java. It's not the only reason - but it plays a big role. The procedural paradigms offered in FPC and Ruby (Ruby uses global def's as its procedures) also play a key role.

[Features and languages are called 'general-purpose' when they help in a variety of 'specific' domains.]

Make files help with batch tasks, making, copying, installing - domains other than just making. Perhaps regexes are an example of a really domain-specific language (regexes are more of a notation than anything... are plain wildcards like *.* a language? WhatsaLanguage). Lo and behold, wildcards and regexes are terse - once again demonstrating the power (and danger) of terseness.
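As a small illustration of how much a terse wildcard compresses, Python's standard library can translate one into the equivalent regex (shown here purely for demonstration; exact output varies by version):

  # A five-character wildcard expands into a noticeably longer regex.
  import fnmatch
  print(fnmatch.translate('*.tmp'))   # e.g. (?s:.*\.tmp)\Z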


Interestingly, an example of a terse markup (arguably a partial language) that is extremely useful is this wiki markup on C2. Imagine if this wiki markup were more verbose - would we, as programmers, be as motivated to contribute? Sure, beginners may prefer a more verbose markup, but the terse wiki markup is indeed useful. In fact, even beginners may not prefer something as verbose as XML or BBCode for wiki markup.


See also: HackerLanguage, ExpressionApiComplaints


MayZeroEight

CategoryWeenie

