Refactor English

The English grammar is a good basis for a language: it's simple and consistent. But the language evolved a lot in the last few millennia, and it has its problems. Since we can't change the language entirely, we only insert some new useful words, to make the language better. Here is the list of new words. Feel free to add some if you wish. Note: The explanation of the new words must be written in standard English. [Whatever "standard" English is :-)] Examples may (and should) use the new words.

mu: See the JargonFile entry at <http://catb.org/~esr/jargon/html/M/mu.html>, and MuAnswer. Yes, do, even if you're already familiar with the concept. Some nice historical notes which aren't what you're expecting, unless you're me.

e/em: See SpivakPronouns


The English grammar is a good basis for a language: it's simple and consistent.

A Proposal for the Reform of the English Writing System

In Year 1, all redundant and superfluous letters would be removed, such as the a in bread and the b in debt. If a letter isn't pronounced and isn't used to mark a modified pronunciation, then it is redundant, potentially confusing, and should be cut out. This would include the e in have but not the e in behave.

Double consonants are inconsistently used to mark a checked (or short) vowel. Since the markers are unreliable, they would be done away with. With the elimination of superfluous double consonants, little becomes litl.

Also in Year 1, the /e/ vowel sound would be spelt with an e rather than some other vowel leter or combination of leters. This means that e would be substituted for a in the word any.

If the long vowels were always consistently marked this might make sense. It is better to extend the short vowel marking convention to always show the stressed short vowel. Single syllable words would not need to be marked if the long vowel was always marked.

nif, niffty, nife. bet, better, bete [beet].

In Year 2 that useless letter "c" would be dropped to be replased either by "k" or "s", and likewise "x" would no longer be part of the alphabet. The only kase in which "c" would be retained would be the "ch" formation, which will be dealt with later.

Year 2 might reform "w" spelling, so that "which" and "one" [by the rules of Year 1, "one" will be pronounced like "own" is pronounced now] would take the same konsonant, wile Year 3 might abolish "igh" and replace them with 'y' and might fiks the "g/j" anomali wonse and for all.

c is redundant and unreliable but it is also the most common way to express /k/. Removing it makes a number of words appear odd. For example: "I was lost in the sitty without a sent."
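For anyone who wants to play with the Year 2 rule, here is a rough Python sketch. The function name drop_c_and_x and the soft/hard test (c becomes s before e, i, or y, and k everywhere else) are assumptions made for illustration; the proposal itself only says that c is replaced by k or s, that the "ch" formation is kept, and that x leaves the alphabet (the text writes "fiks" for "fix"). It does not apply any of the Year 1 changes.

```python
def drop_c_and_x(word):
    """Respell a lowercase word without 'c' (except in 'ch') and without 'x'.

    Assumed rule, not taken from the proposal: 'c' is soft (s) before
    e, i, or y and hard (k) otherwise; 'x' is written 'ks'.
    """
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == "c" and nxt == "h":
            out.append("ch")  # the "ch" formation is retained
            i += 2
            continue
        if ch == "c":
            out.append("s" if nxt in ("e", "i", "y") else "k")
        elif ch == "x":
            out.append("ks")
        else:
            out.append(ch)
        i += 1
    return "".join(out)


for w in ["replaced", "case", "consonant", "cent", "fix", "which"]:
    print(w, "->", drop_c_and_x(w))
```

The spellings it produces for these words (replased, kase, konsonant, sent, fiks) match the ones used in the text above; real English has plenty of words where this simple test would guess wrong.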

....

Jenrali, then, th impruvmnt wu'd ka'ntinu yir by yir. Bai yir 15 or so' it wu.d fainli bi posibl tu meik ius ov thi rida'ndnt letrz "c", "y" and "x" -- bai now ja'st a memori in the maindz ov th o'ld timrz -- tu ripleis "ch", "igh & i-e", and "th" rispektivli. An S look-alyk "5" cu.d bi iusd for "sh".

Fynli, xen, aftr som 20 yirz 'v orthografikl riform, wi wu.d hav a lojikl, ko'hira'nt speling in ius thruaut x Ingli5- spiiking w'rld.

-- MarkTwain

And if you're talking about the vocabulary then that's even worse. English is a bastard mishmash of French and German. About as logical and consistent as C++.

Importing words is often a good thing. The main problems are that English didn't standardize the orthography when it did so, instead choosing to do things like reintroduce Ks and Qs into words it already had, and that it imported equivalent sets of roots from both Latin and Greek, resulting in equivalent pairs like onomy-ology or multi-poly. Incidentally, I've never found C++ that bad, but this is dealt with on a thousand other pages here.

English does have a few things going for it. It has no gender so you don't need to waste time memorizing which gender a word has or conjugation rules. And just like SmallTalk and Chinese, there is no syntactical distinction made between nouns, verbs and adjectives. This makes both the language and the thoughts expressed in it more fluid. There's a third important advantage to English but I don't recall it at the moment. But simplicity is not one of the attributes of English. Even where English is known as relatively simple, like putting a verb in the past tense, it isn't. Consider readed, writed, maked, sleeped, sended, weared, begined, fighted, and bleeded.
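The regularized forms listed above all follow one mechanical rule, which can be sketched in a few lines of Python. The function name regular_past is made up for illustration, and the rule is deliberately naive: add "d" after a final e, otherwise add "ed", with no consonant doubling and no y-to-ied change. That is an assumption chosen because it reproduces the forms in the list (for example "begined"), not a description of standard English.

```python
def regular_past(verb):
    """Form a past tense by the regular rule only, ignoring all irregular forms."""
    if verb.endswith("e"):
        return verb + "d"
    return verb + "ed"


verbs = ["read", "write", "make", "sleep", "send", "wear", "begin", "fight", "bleed"]
print(", ".join(regular_past(v) for v in verbs))
```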

If you want to RefactorEnglish, then make it acceptable for people to write:

"After I sleeped, I maked my bed. Then I catched up on my reading and readed the news."

This addresses regularity, but there is more to simplicity than just that. Latin is a language with relatively few exceptions but I expect very few people here would call it simple. English lost most of its inflections, though a few remain (he reads), but its grammar otherwise remains fairly complicated. For instance

 I walk
 I am walking
 I do walk
are all identical in most other languages (je marche, ich spaziere, ambulo...). For all the misery people have learning the difference between the imperfect, perfect, pluperfect, and so forth, English has all these forms of the past tense and a few more. And so forth.

Agreed. When I was told that English had 17 different tenses, I thought they were mistaken. I couldn't identify more than a half dozen. It took me a long time to realize they were right (see English Verbs: Every Irregular Conjugation by Valerie Weisberg, ISBN 0961091258 ).

Another story. When the Chinese authorities simplified the characters for a whole bunch of words, the population resisted it. It seems the arcane and overcomplex forms acquired a high-class cachet in store signs. Now I realize that this is the real reason why English can't be reformed. The rich snobs will look down on anyone using the new pronunciations and orthography. Gotta exclude those stupid foreigners! (And we know they're stupid because they can't speak proper English!)

German has undergone several deliberate and widely followed spelling reforms [but see http://german.about.com/library/weekly/aa081000a.htm ]. English can't do the same mainly because it has several diverse and separate groups of speakers. Recall that the Americans tried to regularize the orthography on several points (though not all the changes made sense), only to confuse matters further by introducing a separate set of spellings not followed in other countries.

Social tolerance of variant spelling came to an end around 1650 as 18th century notions of correctness began to emerge. By 1780, poor spelling became stigmatized. (http://www.implicity.com/reading/app3twoarchaic.htm) So I don't think it had anything to do with concerns with incompatibility between nations.

Webster's spelling reform took place in the early 19th century, and was generally adopted by the Americans but not by the British, hence the differences in writing between the two. I can't imagine why this sort of thing wouldn't happen again.


Another problem with English is its ugliness. About the only uglier languages are German and some of the Slavic ones. That's because it has too many consonants and too few vowels. It doesn't even have 'ei' and 'oi' diphthongs which are common in French and German, respectively. And to add to its lack of vowels, English has the 'th' consonant which it shares (only?) with Castilian Spanish.

This isn't right. First, English has a lot of vowels, perhaps not in writing but definitely in speech. Compare father, air, hat, and babe. This is more than Latin or even German, despite its use of the rounded fronts. In fact, I'd say that having too many too close together is more of a problem. Incidentally, doesn't babe use ei and oil use oi?

There are two th consonants, the voiceless form from thing and the voiced from this. The former is very common; the latter less, which is odd because voiceless and voiced forms usually pair up, but definitely present in modern Greek among others. The main extra consonant is the velarized l in all (versus that in liquid), and maybe the postalveolars in ship and Asia. We have k and g but interestingly no corresponding fricatives, which are the sounds that give German a bad name but only as far as I can tell because foreign speakers usually say them with more phlegm than is conceivably necessary.

Personally I find German very pretty. It doesn't have the same kind of feel to it that the Romance languages do, but it would be an odd thing if that were the only sound system that could be counted as elegant. The composite noun system is much maligned but adds a lot of flexibility without the need to import foreign words. Latin gets the same but through a system of inflections which I assume we are eschewing here. Slavic tongues I am unfortunately ignorant of, so can't say whether or not the above is accurate for any of them.

You're right about oil, but the 'a' in babe is not a diphthong at all. It's a separate vowel (French e). 'ei' starts with the e in 'the' or the u in 'duh'. In French, the diphthong is most frequently written euil (two vowels only) with the l silent. The i in euil modifies the eu (a vowel that doesn't exist in English, similar to French u which also doesn't exist in English) to e. Some examples include oeuil (o silent so pronounced ei) = eye, deuil = mourning, seuil = threshold, and ceuille (kei) = pick (flowers).

The a in babe is supposed to be pronounced as a diphthong. It isn't quite the one you mention above, but then I don't know of any other language with that same sound in it, if I am indeed understanding your description correctly (it's been a long time since I spoke French). I think I see what you mean. I pronounce it as the diphthong ei in "Babe Ruth" but as a single vowel e in "babe in arms". And if I speak the word 'baby', it's with a single vowel but if I sing it, it's a diphthong. :)

I'd find your explanation of 'th' more interesting if I could actually hear the consonant. I can't, even under optimal conditions. I always hear either v, f or t. I know how it's supposed to be pronounced but like the Spanish 'rr' which is supposed to sound "completely different" from the French and English versions, that does me a fat lot of good. On that count, Spanish is worse than English because while I can at least pronounce 'th' (it's hard and I never bother), I've never ever managed to pronounce 'rr'. [To pronounce 'th', press the tongue against the upper front teeth. It seems awkward at first, but soon becomes second nature.]

Ok, this is a difficulty coming mainly from French's lack of the sounds. The th is as described above, though I don't think it becomes easy quickly. I myself, an English speaker my whole life, don't say r quite the same way as other people. The back R is somewhat easier - it is the voiced equivalent of the back ch found in German. I assure you these fit quite nicely into the sound systems of their parent languages. In fact, th would probably be more appropriate for French even than English, because it uses the dental t-d rather than the alveolar t-d.

German's writing system is impenetrable. I know they've had a lot of reforms to regularize it (the Germans are anal-retentive still) but that doesn't help. All the bad things in English, like letters having non-local effects on pronunciation (eg, the e in babe modifying the a, which isn't even in the same syllable!), come straight from German. And more! Apparently, ch has three different pronunciations depending on whether it's at the beginning of a word, at the end of a word or preceded by s. Or something like that; I gave up when I found that out.

Quite the contrary, German's writing system is mostly straightforward. The non-local effect from the terminal e does not exist, and I can't think of any others. The varying pronunciation of ch is a simpler matter of assimilation towards other sounds, like the different pronunciations of n in English night and ink, and would really not be that bad except it means English speakers have two sounds to learn, rather than one. You may be right about non-local effects. In German, the doubling of a consonant modifies the preceding vowel in the same way as adding the terminal e in English. You may also add an e right after the vowel. The entire short versus long concept of vowels originates in German and is absent in French.

No, this dates as far back as the earliest Indo-European languages. Latin definitely has long and short vowels, which generally go unmarked. German has a few signs to distinguish them, like the double consonant, and has the umlaut to mark the three additional vowels. French doesn't have the long-short distinction because the vowels have too many grades, generally marked by letter combinations like in oiseaux, where none of the vowels have the same value that they would on their own. English also has various grades but tends not to mark them at all.

I wish I knew the basics of Russian, if only to compare. In Polish, strings of three or four consonants (no vowels to separate them) are common. Strings of four consonants aren't even imaginable in a Romance language. And nothing and nobody is ever going to convince me that the ugliness of 4-consonant strings isn't embedded in the human genome. Unless German is very careful with compositing, it could also end up with similar strings of 4 consonants in words ...

I don't speak Polish, but I did have a Serbian friend surnamed Crngorac. That sort of string could never come up in German, because it relies on the use of r as a vowel. Actually, r, l, m, n, and ng can all be used as syllable nuclei and this is really only difficult because it is not common in Indo-European tongues. Other people will have no problem with it, and I'm not sure that I find it at all ugly. I don't see how 'l' and 'ng' can be used as syllable nuclei since they can't be sustained. By ng, I mean the sound at the end of sing, not that in the middle of ingot. It really isn't hard to sustain this or l if you try.

French is my mother tongue and all the phonemes I can distinguish come from it. So obviously I've got a bias for it. But French also has nasals in, on and an, and I don't particularly miss them; I do miss the vowels and the ei diphthong. This makes me think that it's not just a bias towards French.

Unfortunately I'd say a lot of it is, notably with the consonants. Aside from the above, English has a lot of oddities which it shares with French, and you haven't commented on these. Notable here are the velarized l, the gratuitous use of intermediate vowels, and the failure to distinguish between aspirates and unaspirates like the ps in pin and spin, which I can't even hear the difference between but which are quite distinct to people brought up under different phonetic systems.

I never noticed the aspirate/unaspirate distinction before, but are you sure you can't hear the difference? Get a friend to say "that's pinning" and "that spinning" randomly and try to guess which one they said.


Can you give some examples of gratuitous use of intermediate vowels?

A quick search gives http://faculty.washington.edu/dillon/PhonResources/newstart.html, which has a nice table of English vowels. American English, actually, which should be ok because I happen to know you're in Canada, and our vowels are closer to theirs.


When I initiated this page, I had several goals set in my mind:

I am not all for a total reform of the English writing system. It is mostly fine, and nice.

Anyway, the results of this page so far are very nice. Continue bringing up new ideas! -- AmirLivne

It still cracks me up that people seek a sexless word for "he" and "she" without finding "it" -- DanilSuits

For me at least, "it" feels like not-alive, so I prefer to use another word. But I'm not a native English speaker, so I don't know if it counts. -- AmirLivne

A tree or cockroach is an 'it' but a dog or cat isn't, at least if you speak with the owner. 'It' signifies no personality, no attachment, et cetera. -- AnonymousDonor

I've been lobbying for the introduction of 'shet' (subject) and 'shit' (object) as the genderless-but-animate pronouns. Both being, of course, contractions of she/he/it. -- JonathanTang


Serious question: Why do this?

Serious answer: for fun.

Not-So-Serious answer: For the same reason you refactor your programs: to make it easy, clean, and simple, so that people can understand it easily and express themselves quickly. A good language can become the next Esperanto, but actually work. Also, there are some useful things in changing or adding stuff to English. For example, see mu at the top of this page.


Ai haed en aidiea tu foos fonetik spelling on inglishspiiking piipl. Ai soot zei mait anderstaend mai raiting eniivei aend suuneer or leiter it vuid teik over. Tudei zis daz not siim tu bii sou greit aidie.


Vowels Overly Specific

Many of the vowels are not needed. Instead of "persistent" we could have "p*rsist*nt" where the asterisks, or some other marker, simply indicates a vowel position. As far as I can tell, "pursistunt", "persistant", and "pirsistent" are all pronounced the same. Usually it is "eh" or "uh" kinds of sounds that are dropped when pronouncing, often near "r", and usually on the non-primary syllable (what is the formal name for that?).

Want to buy some c*ke?

It said "many" not "most". Some vowels are simply not pronounced. Example: "Sep*ration". You could phonetically read it "seperation", "sepuration", "sepiration", etc., and nobody would know the difference (at least I don't). Thus, if we abstract actual pronunciation, then we can toss out some information normally found in the existing written version. This non-pronounced information is the source of about 70% of my spelling errors. Note that for "sepiration" you would not pronounce it with an "ire" sound if you know that the "i" is not in the primary syllable. The primary syllable is the 3rd syllable. It bends your tongue unnaturally if you try to pronounce "ire" if it falls on the 3rd. Try it. The primary syllable would usually not contain the vowel place-holder anyhow (perhaps never, but I haven't found an exception yet). This is why your "c*ke" example fails. Also, as far as I know, all pronounced "ire" sounds fall on the primary syllable.

We already have this symbol -- it's called the schwa in the English dictionary's pronunciation charts.
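The placeholder idea in this section can be tried out with a small Python sketch. Everything here is an assumption for illustration: the function name mask_unstressed is made up, and the positions of the reduced vowels are supplied by hand, since working out where the stress falls is the hard part and is not attempted.

```python
def mask_unstressed(word, positions, marker="*"):
    """Replace the letters at the given 0-based positions with a vowel placeholder."""
    chars = list(word)
    for i in positions:
        chars[i] = marker
    return "".join(chars)


# The spellings claimed above to sound alike, with the reduced vowels
# assumed to sit at positions 1 and 7.
spellings = ["persistent", "pursistunt", "persistant", "pirsistent"]
patterns = {mask_unstressed(s, [1, 7]) for s in spellings}
print(patterns)
```

If the claim above holds, all four spellings collapse to the single pattern p*rsist*nt, which is all the sketch demonstrates. The same call with position 3 turns "seperation", "sepuration", and "sepiration" into sep*ration.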


Becoming the next Esperanto isn't exactly something a living language aspires to. This page is full of some of the most asinine opining (let's refactor that... asinopining? asopining?) I've ever seen. (<-- like that one.) "English is soooo ugly," "English is this bastard language," "Americans can't spell `colour' properly" ... erm .. okay. You know what the neat thing about English is? You don't have to ask permission to change it. (<-- so, why the hell are you complaining again? Wouldn't such changes as you advocate here belong in the category of "asinine opining?") You want 'mu' to be a word, just start using it. "How you feeling today, Dave?" "Mu." // "What do you think of this design?" "Mu." I'd suggest "eh" with a neutral inflection has already picked up the same meaning though, and you're still not going to sound much more profound than that grunt by sticking a real consonant in front.

As for spivak pronouns, consider Chinese. Even less inflected than English, though tonality is certainly a pain and a half, and where inflections remain, they're "weird"...

Your not-so-well-concealed dismissive suggestion that abandoning inflections implies a move towards tonality is illogical.


English has already been refactored extensively, to become the world-worthy language it is today. Compare against AngloSaxonLanguage, which originally had as many genders and inflected cases as Latin.

"Altered" and "fixed" are two different things. English's verb forms and spelling still stink heavily, despite what worse may have come (came?) before.


English is continuously refactored in the one place that matters - the marketplace. Most of the world speaks or is learning to speak English because it's economically and socially useful to do so. And every time a new tribe starts to use it, it contributes new words and forms. Also, learning enough English to get along is absurdly easy compared to any other language. Learning to use it well - now that's another thing altogether - it's rich and complex enough to take a lifetime to truly master.

Beginning programmers always use if-then-else; eventually they learn when and how to use ?: Both get the job done but they aren't in the same place on the elegance scale. -- MarcThibault


Actually, the largest difference between English and other languages is that it incorporates words from other languages without respelling/formatting them. How do you say soccer in Spanish? Futbol (which is phonetically pronounced very similarly to the English word football). So, in Spanish, pronunciation is roughly preserved while spelling is butchered. In English, we preserve the spelling even if a word has no apparent phonetic meaning, resulting in endless arguments over pronunciation.

As an aside - Central American Spanish is actually one of the nicest languages to learn because it's almost entirely phonetic. There's pretty much one way to pronounce each letter (although that way may vary from region to region).

-- MartinZarate

You mean *written* language. They don't have spelling bees in Mexico. We could fix English spelling by making it phonetic. I just wonder if it can be done using the existing alphabet. Some sounds would require two or more letters, and that risks ambiguous combinations when put together. But we gotta try in order to test. --top

Lojban's alphabet may serve as the basis for such phonetic building blocks. It even uses the 'y' letter as a schwa. [[Lojbanz alfabet mei srv az ?e beisys for sutc fonetik byldng bloks. Yt ivn iuzyz ?e 'y' letr az a cua.]] As you can see, only "th" offers any kind of difficulty, but a different character can be assigned for that. In point of fact, ancient English used þ (thorn) for the "th" sound, thus explaining the use of "ye" for modern-day "the". So, we can use 'y' for that purpose, and replace Lojban's schwa character with the upside-down 'e', a symbol already used in every English dictionary today to represent the schwa sound.

A "tc" for "ch"? What's the reasoning behind that? Would "c" even be needed? Another thing, what about keeping the "th", but introducing some kind of punctuation, perhaps a dash, as a separator for ambiguous letter sections? For a quick-and-dumb example, "tothut" may be a hut where toddlers are kept. But this can be mistaken for "toth-ut". Thus, it would be written as "tot-hut". The rule could be that any "th" is assumed to be the single "th" sound unless there's a dash between the two letters. Introducing new characters could create lots of problems. Plus, the dash rule can be used for newly-discovered ambiguities.
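To see how the dash rule would behave, here is a rough Python sketch. It reads the rule as: the letters "th" together count as one sound unless a dash sits between them. The function name split_sounds is made up for illustration, and "th" is the only two-letter sound it knows about, which is an assumption made to keep the example short.

```python
def split_sounds(word):
    """Split a word into sound units; 'th' is one unit unless a dash separates t and h."""
    units = []
    i = 0
    while i < len(word):
        if word[i] == "-":
            i += 1  # the dash only separates letters; it is not a sound
        elif word[i:i + 2] == "th":
            units.append("th")
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


print(split_sounds("tothut"))   # "th" read as one sound
print(split_sounds("tot-hut"))  # the dash keeps t and h apart
```

Because the reader only needs to look one letter ahead, the dash is needed solely where a t happens to be followed by an h that belongs to the next part of the word.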

Review the Lojban alphabet and its pronunciation. http://www.lojban.org It is fully disclosed on that website. To answer your questions point by point, however, 'c' has the "sh" sound in English. So, to pronounce "ch" as we do now, you put a t in front of it: tc <--> "tsh" <--> "ch". "th" in Lojban, assuming it had an 'h' at all (which it doesn't, BTW, but let's pretend), would literally be pronounced "t-huh". Lojban's rules of pronunciation and spelling are so unambiguous that hyphenation for disambiguation is obsolete.

Remember, Lojban is 100% phonetic. As you read or write it, you speak it, and vice versa. Strictly speaking, even punctuation is pronounced (e.g., .i for sentence separation, gy for quotation marks, etc.). But that's Lojban; I'm just saying, let's learn from Lojban's alphabet and its phonemology if you want to refactor English with a more logical system. Lojban's literal translation is, after all, logical language.


Let's try this (I'll call it "tinglish" for now). Here are the problem or confusing letters/combos:

For (old) "boy", one would use "bohy". If you actually wanted "ba-hee", then it would be "bo-hy". "Boy" in the "new" alphabet would be the equivalent of old "buy" (as in shopping).

What dialect of English do you speak? I've never heard boy rhyme with buy, not in American, British, nor Australian dialects.

My responses: Goals to try to balance:

But, if we avoid diphthongs, then we end up with a language with about 20 distinct vowels! We don't have enough characters for that, unless you advocate going to an ideographic character set, like Japanese or Chinese. The sounds are distinct, so they should be rendered as different letters adjacent. There is no reason why wow needs a unique vowel when uau works just as well.

Regarding where a diphthong ends, who cares? What is important is that adjacent vowels when spoken have adjacent vowels as written.

The goal is to eliminate all exceptions.

Not if it makes other things crappy.


Here are some "tinglish" examples with the original spelling, proposed spelling, and pronunciation tips.

"Separation" is kind of tricky. Perhaps it should be "seprreyshun" to emphasize that we stay on the "r" a little longer and that it's a complete syllable. However, there's no real vowel there in most pronunciations I can conjure. It's a ghost vowel. Perhaps it needs the "filler vowels" that used "*" a few pages above. However, obviously an asterisk is a poor choice (in most fonts) for a filler vowel. Thus, a double "r" may suffice.

Hmm, I pronounce separation, literally, as sepereicyn (using Lojbanic spelling).

But what is "er"? When I pronounce it, it's the same "r" one hears in "purr" (cat). It's just r's all the way down. There is no vowel that I can detect; it's just a fairly long r-sound. The only difference between "pray" and "purrey" (made up word, similar to "parade") is that one dwells on the "r" longer. No vowel.

I suppose that this is an example of one of the down-sides of phonetic spelling: if people pronounce it differently, then it can be spelled differently. Regional, family, and personal differences will exist.

How do Italians and Spanish-speaking folk deal with this? Sure, they have their dialects, idioms, regional and family differences just like we English-speaking folk do. Yet, they still agree on a standard pronunciation for their respective alphabets.


Questions:


(Comedian picking on English)


Last edited September 28, 2012