Programming Languages Should Not Model English

I would have named this page "ProgrammingLanguagesShouldNotModelEnglishTooClosely?," as obviously, we can't have people going around and claiming that everyone should use BrainFsck or Unlambda, but the title was long enough as it was. Please don't falsely claim that I advocate completely un-English programming languages.

Anyway...

ProgrammingLanguagesShouldNotModelEnglish. Why?

Perl. That's why. Too human for it's own good. Programming languages should not have support for pronouns. Also, like English, Perl has many strange rules that it violates itself, and slightly different spellings that mean hugely different things. Perl 6 is supposed to be better at this...

Ruby. That's why. Japanese-support, and none the worse for it (except for a few funky keywords). Although, you still can write this abomination of grammar:

 if true
   exit!
 end unless true
...which will do nothing.


Specifically, I am pointing out that English is not the perfect language model. I mean, I'm OK with this:

    if (condition){ whatever; }
    whatever if condition;
BUT they can sometimes make parsers more complicated, as well as comprehension.
    while(<>) { print; }
is a fine example, and so is
     while (<>) {
       chomp;
       if (s/\\$//) {
          $_ .= <>;
          redo unless eof();
       }
      # now process $_
     }
taken from perlsyn itself, no less! Even if it was foreach(<>) instead of while(<>), this is the perfect(ly awful) blend of English mapped onto doing something in a programming language. Stupid pronouns.

Edit: There were other ways in which it was bad:

The reason why this has anything to do with anything? In English, this would read like this:
    While there are lines left in the file:
       chomp it,
       If there is a slash at the end of it, replace it with '',
           Append the next line to the current line, and
           Go back to the beginning of the loop unless we are out of lines.
       Now process it.


The most 'English' language currently in use is InformLanguage - in particular Inform 7 (http://www.inform7.com). It is a truly remarkable language, using a domain object model, rules, logic, heuristic matching for MultiMethods-style verbs. It even gains a bit of modularity via ability to manipulate rulebooks. Admittedly, it isn't at all suitable if you need concurrency (i.e. for multiple players), or live programming, or a number of related properties, but it remains suitable to its purpose. I think it provides a fine counterpoint.

Admittedly, the point on foreign language users is well taken. I would prefer the English be a layer atop something else, thus admitting to other layers. Inform 7 is a layer atop Inform 6. Things like RealMacros or ExtensibleAttributeGrammar?s can go a long way towards supporting these layers without ever leaving the language, which is rather nice (since it keeps everything in the same IntegratedDevelopmentEnvironment).


</rant> Eh. Actually, I really just wanted to say that English has a bunch of really stupid ideas.


Counterpoint:

Actually, the argument presented here appears to be centered around the lexicon of various scripting languages rather than their approximation of English. The example shown of a snippet of Perl code then translated into straight prose algorythm shows the distinction quite clearly; the algorythm can be read and understood by anyone, whereas the Perl code is a series of tokens that read as gibberish to anyone but a Perl scripter.

The use of a common lexicon for programming is a very powerful aid in the spread and cohesion of understanding. English is a powerful language, particularly in the American dialect. It is common to find authors and poets who chose to compose their works in English even if it is a second language to them. This is because they can express ideas more fully in English than in their native tongue.

Therefore, perhaps a new proposal is appropriate: All programming languages should even more closely mimic American English as their lexicon. If everyone speaks the same language then barriers to communication fall by the wayside, eh?

English is rife with ambiguity, redundancy, and needless verbosity. A far better "common lexicon" would be a universal symbolic calculus, which would have the benefit of requiring that all its users possess a minimal common level of abstract understanding, plus it would be free of ambiguity and entirely independent of human spoken or written languages. Native French speakers, for example, would not be pointlessly disadvantaged by any arbitrary adherence to "American English".

Hmm. This seems pretty IvoryTower to me. Forcing people to develop an understanding of some abstract symbolic mathematical lexicon before they can even code up a simple algorythm would have a serious impact on the productivity of the programming world en toto.

Consider that all modern programming languages use English as their base for the keywords and you get a fairly clear picture right away. Although English -- like any spoken or written language -- has shades of meaning and room for interpretation other than what the original author had in mind, programming languages have very little of that. Words are used to express instructions to the computing device to perform certain tasks. As long as those words are chosen with care and then used properly in a program the computer has little choice about how to interpret those instructions.

Of course, it is still possible for the programmer to misunderstand how to express an instruction, but this problem persists whatever language or other technology is involved. The use of a common written language as the base of a programming language is an aid in reducing these errors, not a hindrance to it. N'est ce' pas?

Ah, but you're suggesting we force people to develop an understanding of English before they can even code up a simple algorithm, yet modern mathematical notation is already universal throughout the world. Which, therefore, is the more unreasonable prerequisite?

Use of English owes more to the historical might of English-speaking countries in early computing development than anything else. There is no inherent value to using English, only arbitrary precedent. But for a few accidents of history, all modern programming languages might use Russian as their base for keywords. Or Chinese. Or Japanese or German or French.

We have no spoken or written language that is "common" except for one. We have languages that are popular -- like Mandarin Chinese, Spanish and English -- but there is only one common to all the world's people. Notably, it is a language that is singularly appropriate for precisely representing algorithms and computational concepts. It is mathematical notation.

Can I see, "I want a tuna sandwich without pickles" in math-ese?

Can I see that in any programming language? I do not propose eliminating user-defined identifiers. I counter, however, the proposal stated above that "[a]ll programming languages should even more closely mimic American English as their lexicon." (In doing so, I've deliberately avoided any references to the origins of COBOL. Until now...)

The variables and functions of math usually still need to be stated in a natural language. Math without documentation can be quite cryptic.

As noted, I do not propose eliminating user-defined identifiers. I oppose the notion that more English-based (or, in particular, "American English"-based) keywords, verbose built-in language phrases and/or constructs, etc., are a good thing.

English is the defacto standard for programming languages. However that came to be, we're stuck with it. Take note that English is also the official language of international air traffic control; if you don't speak English you don't fly across international borders. English is also used as the universal language of business. English is the language in which treaties are created, contracts are signed, and the UN does the bulk of its business.

I have no doubt that if mankind makes its way out to the stars that some form of American English will be the universal language spoken by the descendants of Man. As has been pointed out to no end, a common spoken language is the way to ensure that all parties have a means of communicating with everybody else at the party. Let's party.

Therefore, since we already have a "standard" language for spoken communication, why not use this language as the base from which programming languages are derived? When a language uses a keyword such as "save," the programmer doesn't need to ponder over the subtleties and nuances of his interpretation of some obscure symbolic reference -- the meaning is very clear. Trying to put this into a mathematical environment does not add clarity, merely another level of abstraction that the programmer need interpret his desires through before creating an instruction to the processing engine.

Obviously, because unlike programming, none of these things have a mathematical basis... Note that the use of English in air traffic control is actually a source of miscommunication and confusion, and the UN handles the bulk of its international meetings with simultaneous translators.

Given current progression, if humans make their way out to the stars, some form of Chinese or Urdu will probably be the universal language. Historically, Greek, Roman and Persian have been "universal" languages, so there's no reason to expect American English to hold any precedence. This is especially true as the "American" or USA "empire" seems to have been particularly brief in terms of historical scale, and is unlikely to retain much, if any, significance a hundred years from now.

Indeed, we have a semi-standard language for spoken communication, but programming is not spoken communication. I'm sure there will always be a need for the occasional keyword along the lines of "save", but certainly not for control structures like 'if', 'for', 'while', etc. These should surely be represented symbolically and would soon be familiar to everyone, much as +, -, *, etc., are already universally recognised.

In particular, there's no justification for any keyword being specifically in some American flavour of English, as suggested above. Generic English is surely appropriate, if English is necessary. What would "American" English be in this context, anyway? "Giddy-up" instead of "RUN"? "Yee-haw" instead of "Ok"?

Ahh, now we see the true nature of your objection: penis envy. A non-American, non-native-English speaker, are we? Jealous of the might and majesty of the American Empire, eh? Well, that's understandable. After all, the government of the USA has held for 230 years, which is longer than the government of any other country of note on the planet. As the world's only remaining "super power" the USA is sure to draw such ire. Heh, heh, heh.

By the way, do you have any references to back up your claim that "the use of English in air traffic control is actually a source of miscommunication and confusion"? Please proffer links.

Now, to address the issue of standardized keywords and symbols: surely plus and minus have universal meaning, but then you use the asterisk? The asterisk?!? This is one of the most overheated symbols in all of programdom! Laden with hidden meaning and overlapping uses, the poor asterisk finds itself the object of scorn and ridicule. Nonetheless, a proper keyword can put even this mess to rights. As long as that keyword is in Americanese, of course. [ahem]

Watchoo talkin' bout, Willis? The government of the USA typically holds for four or eight year terms. The governmental structure has held for 230 years, but that's not particularly long as governance structures go. As for my, er, "penis envy", I'm a native-born US citizen. My first language is English. I also hold citizenship in a second English-speaking country and am a permanent resident in a third. I have, therefore, something called a "perspective", which leads me to the indubitable fact that the "American" brand of English is particularly impoverished and will undoubtedly be short-lived and fairly insignificant by any reasonable measure of historical duration and/or impact. The US is a colicky baby, as nations go, and it will either outgrow its current phase or (much, much more likely) become a negligible province of China, maintained solely to host that particularly objectionable brand of tourism that waxes all misty-eyed over uncomfortably pinchy cowboy boots made from endangered-species leather, over-sized homoerotic hats, and disturbingly anthropomorphic pre-pubescent cartoon rodents and trouserless fowl.

http://www.scribd.com/doc/19647051/Miscommunications-in-Air-Traffic-Control

Are you attempting to be amusing? If so: Fail. [You should have said, "If so: Splat." :-] There, now that's amusing.

The government of the USA is the same one that was put into place 230 years ago. The people who operate the government have changed, as have the rules and restrictions under which the government operates, but the Constitution of the USA allows for these changes. In fact, the Constitution of the USA was drafted with evolutionary changes in mind; that's why there are Amendments to the Constitution. For a US citizen you show a surprising lack of understanding of basic civics. This is necessary to graduate from high school. Did you go to school here?

The USA as the "melting pot" of the world is still the main driving force behind the greatness of our (well, my, anyway) country. The fact that people come from all over the world to the USA to participate in the most productive society of all history is part and parcel of this. The fact that everyone in the USA speaks English to actively participate is a binding force, not a drag.

One reference to a twelve year old Master's thesis doth not an argument make, forsooth. And I am amusing -- even if you don't think so.

Ahem. However, we are getting a bit far afield. The original thesis was programming languages should not model English. The antithesis is programming languages should more closely model English. What is the synthesis?

I've gone to school, in one form or another, in three English-speaking countries, but you're correct that I did not graduate from a US high school. Your local definitions of government are irrelevant. By any reasonable definition, the USA has had the same (essentially) governmental structure for approximately 230 years. On a historical scale, that is hardly significant. It's barely even notable. Note that the pre-Empire Roman Republic lasted approximately 480 years under the same governance structure, and the Chinese dynastic system held for approximately 2100 years.

The notion of the USA as the "melting pot" of the world, along with a self-attributed "greatness", is a uniquely American arrogance. It deprecates all the other world "melting pots" -- such as the UK, Australia, Brazil, Russia, Canada, and South Asia in general -- whilst unduly promoting some specious notion of "greatness" that has no objective basis. Every country thinks it is "great".

No, but one reference to a twelve year old Master's thesis doth evidence make. Talk to pilots and air traffic controllers who work international airports if you don't believe me. Other than various hypothetical electronic solutions, the only reasonable approach is to make sure all pilots and air traffic controllers are native English speakers from the same region, because even regional accents amongst native speakers are a problem.

COBOL was an attempt at more closely modeling English. That speaks for itself.

[You shouldn't make many conclusions from a small sample. InformLanguage (mentioned above) and LiterateProgramming also have something to say on the subject.]

I assume this section was not meant to be taken seriously. I mean, American English? Surely that can't be meant except in jest. Certainly natural language processing is a significant topic, but a forgettable gadget for creating interactive fiction and a peculiar documentation-oriented deviation by a formerly great computer scientist are hardly notable contributions to the field.

[Notable to whom? Anyhow, I think I'd like to see our natural languages begin to model our programming languages. I suspect it will occur over time. Already, phone numbers and URIs are a big part of inter-personal communications.]


Okay, it's time for signed contributions, since we seem to have people who insist on injecting their own observations into the middle of other people's statements. To wit:

1) A common spoken and written language improves communication, reduces communication errors, and promotes greater and faster progress. Having a universal language could be nothing but good for all participants.

2) English has, so far, been the basis for all modern programming languages. The American dialect has been used extensively, probably because the bulk of computer science has been developed here. Therefore, we already have an established lexicon of programming language keywords and constructs based on Americanese.

3) Languages which use phunky expressions and weirdo symbols like Perl may be very compact, but they are a pain to read and interpret when doing maintenance or feature enhancement. Languages like Basic may be verbose, but they read easily and are easy to maintain and extend their applications.

4) Programming languages need to use keywords and constructs that can be easily understood and are difficult to misunderstand. The simplest way to ensure that such constructs are used in programming languages is to model the programming language on a common spoken language lexicon.

With these points in mind the obvious answer is to model programming languages on the American dialect of English, the most powerful and expressive language in the world. If there are contrary viewpoints I'd like to hear them, but simple naysaying is not welcome. C'mon, people. We're supposed to be smart, worldly-wise professionals here. Let's have some real alternatives, eh?

-- MartySchrader

What, exactly, would a language based on "the American dialect of English" look like and how would it differ from any other "dialect" of English? The only "dialects" of English that are uniquely American are either cultural -- e.g., African-American vernacular -- or largely tongue-in-cheek derivations of regional spoken accents, like generic "southern", "redneck", Maine, etc. There are no educated or formal American written dialects distinct from British, Canadian, South African or even Singaporean English. There are differences in spelling convention, but these are obviously of negligible concern. -- I. Talics

[As a native English speaker for years, I do not believe it is an especially good language for improving communication and reducing communication errors. We should really use a language that is incapable of double-entendre, equivocation, and depends far less on analogy, metaphor, onomatopoeia, HandWaving, and body language. Miscommunication is the normal state with English, and it isn't especially powerful or expressive (give it a try: TeachMeToSmoke). Besides, as someone who has lived in several parts of America, I am curious as to what "American English" really is. Do you refer to spelling (-ise vs. -ize)? to idiom? to colloquialism and dialect? To which American culture? Perhaps we should do a survey of Twitter spelling, and use that for programming?]

[I suggest that BasicLanguage is not easy to read because of its use of English, but because of its weakness for abstraction - and because of ArgumentByLabToy (my experience with big BasicLanguage programs is that they tend to become ExtremelyInterstrangled and thus difficult to read). I doubt Haskell would be much easier to read if you used 'function' instead of phunky arrow symbols. If your goal is for languages that lead to more 'readable' code, I'll agree that is an admirable goal. However, your jump to an "obvious answer" ("to model programming languages on the American dialect of English") is not a logical one, or is at least missing the relevant premises and inferences. I think the right answer (if there is one) will be moderately sophisticated. Code needs something other than 'familiar lexicon' to be readable.]

[Readable languages may very well require a change in the abstractions we use, simplifications that support local reasoning: agent-oriented rather than object-oriented programming might help read code without needing as much knowledge of the framework (ActorVsAgent), and ObjectCapabilityModel would help us reason about flow of both information and effects and thus 'read' code just by seeing its connectivity graph. Stateful and data-driven programming designs should help us read and understand code without needing to know when it is executed (or understand execution as a more continuous thing). LocationTransparency would let us read and understand code without needing to understand 'where' it is developed and executed. A flat NameSpace for code, similar to a Wiki, would probably serve developers better as they float and share code among projects. GraphicalProgramming would reduce dependence on arbitrary names and remembering the environment, and would likely be effective in supporting multiple 'views' of the same code and supporting ProgressiveDisclosure. Supporting multiple 'views' of code could help us grok larger projects, beyond a little block of code. LiveProgramming and interactive programming and ZeroButtonTesting would help us program and read code more like interactive discussion, with both sides providing cues of misunderstanding and gently pushing one another towards a complete, mutual understanding between computer and developer. Perhaps look up videos on SchematicTables? by John Edwards and CodeBubbles? for prototypes of ideas here.]

[But the focus on English, much less "American English"? It is my opinion that you are letting patriotism rub your logical phallcilites. We're supposed to be smart, worldly-wise professionals here. Where is your contribution?]

-- [Mr.E Brackets]

That last statement is in jest, I presume.

The idea of expressing computer program code as imagery is at least a couple generations off, I should think. We can't even get people to use UML these days. How are we supposed to come to an agreement on how to express command and control concepts using glyphs? Pictograms on "international" road signs are nearly impossible to decipher without knowing what the sign is supposed to be saying in the first place. Perhaps imagery as programming "language" isn't such a good idea.

Additionally, the idea that Americanese isn't a good language because of its power of expression is self-defeating. One would hope that language designers can take advantage of the expressive power of the Yank vocabulary without relying on colloquialisms and vernacular. A language using keywords like groovy, bogus, and waytoocool wouldn't be very well accepted in the computer science community, I should think.

But I want to know what you think.

-- MartySchrader

As I wrote above, there are no educated or formal American written dialects distinct from British, Canadian, South African or even Singaporean English. There are differences in spelling convention, but these are obviously of negligible concern. Thus, there is no such thing as "Yank vocabulary" unless you rely on colloquialisms and vernacular. "Groovy", "bogus", and "waytoocool" are not unique to American English, by the way. -- I. Talics

Perhaps, but these words were all injected into the vernacular from the USA and moved out from there, did they not? Just more evidence that the US influences cultural idiom far beyond her own shores. You can argue about trivia all you wish, but this point is not in question.

It's irrelevant, and no doubt insignificant compared to the vernacular the USA has brought in from immigrant and international cultures. The important thing is that there is no such thing as "Yank vocabulary".


 (x < y) ? x : y
 \x -> s(x)
 \f: 0 -> 1
   | 1 -> 1
   | x -> (f (x - 1)) + (f (x - 2))

[Many of us write most of our programs in glyphs, even before we start to work with spreadsheets and graphs and VisualBasic forms. Further, many of us use programming languages that use keywords and totally do not Americanize them. (lambda(x) (+ x 1)), for example.]

[And I disagree with almost everything MartySchrader says:

-- [Mr.E. Brackets]

Ahh, I see. Falling back on rudeness because your arguments are so weak. Very well, then -- since you have brought up this masturbation theme, GoFuckYourself? (if I understand the phrase correctly). There. Now the discussion has come to its rightful and proper conclusion.

Ummm, dude: You're taking this a bit seriously, aren't you? I mean, arguing in favour of English keywords has some merit, maybe... But American English??? You were joking, weren't you? -- I. Talics

[He's not joking (if he was, we'd see a punchline). I suspect 'egocentrism':

[His behavior is well-meaning, altruistic, negligent, foolish, and ultimately powerless, harmless. I think I'll momentarily regret stomping him down; it's like kicking an excited, well-meaning puppy that's tracking shit through your living room. I've grown too used to combat with our Wiki's resident toothless boar.] -- [Mr.E. Brackets]


Can someone explain the code used in the glyph example by describing it in words, please? Oh, and please use English, because the glyphs don't seem to convey what the original author intended.

Find a minimum (C syntax)

 (x < y) ? x : y
Fibonacci function
 \f: 0 -> 1
   | 1 -> 1
   | x -> (f (x - 1)) + (f (x - 2))
Modulo access to manipulate default parameters
 \x -> s(x)  [Same as 's']
I'm not sure what the syntax is for the last two, but the meaning is quite clear to me. The backslash '\' is used for the lambda symbol in a lot of languages (since it's just one stroke away).

Hmm. The first example seems fairly straightforward, but the others are coded in some strange symbolism that I find impenetrable. Could you parallel them with pseudocode or use some other mechanism to describe what's happening?

The symbolism for the latter examples isn't any more 'strange' than the first, and indeed the \x->s(x) symbolism is used widely among many languages (such as Haskell). The named-function symbol '\f:' is a bit odd, but can be inferred from its usage. If literacy is your problem, then perhaps you should be widening your horizons with real languages. But I have this feeling you're involved game of "I'm playing stupid, help me!" so I'm not feeling very charitable at the moment.


See: TheRightWayToDoWordyBlocks, DollarUnderscoreIsEvil

CategoryRant

MayEleven


EditText of this page (last edited May 30, 2011) or FindPage with title or text search