Its Time To Dump Cee Syntax

CeeSyntax is very low-level. Today, in this era of CPUs orders of magnitude faster than the ones in use when C was invented, we could make programming accessible to a lot more people by replacing C syntax with a much higher-level syntax or by applying the syntax lessons learned over time.

Well, Pascal is easy to read (on paper too) and it's blazingly fast (see freepascal compiler speeds versus GNU C speeds). We don't necessarily need to increase the CPU and increase the compilation time in order to dump CeeSyntax.

See also AlternateHardAndSoftLayers, BetterSyntacticSugar, ObfuscatedCee, ItsNotTimeToDumpCeeSyntax

Big refactoring done by BenKovitz. I have saved the previous copy if you want to retrieve some of the old text.


Summary of Suggestions:

Several languages with CeeSyntax have already done much of this. CSharp, Java, JavaScript, all have CeeSyntax, so we need to separate the C complaints out.

In fact, only "put types and modifiers on right side of variables" seems to be an actual issue with Cee syntax - the others are just gripes with CeeLanguage. And even that goes away when you use a DynamicTyping language like JavaScript, which most would say has CeeSyntax.


Examples of unnecessarily low-level C syntax

K&R C does not have a "for each" construct. To iterate all elements in a typical, real-world composite data structure, you have to understand how to construct the loop and dereference the correct number of pointers to pointers.
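As a rough illustration of the point above, here is a minimal sketch of walking a NULL-terminated array of strings (the layout argv uses). The helper name total_length is made up for this example, and the loop uses C99 declarations rather than strict K&R style.

```c
#include <stddef.h>
#include <string.h>

/* Sum the lengths of all strings in a NULL-terminated array of strings.
   `total_length` is a hypothetical helper, not a standard function. */
size_t total_length(const char *const *words)
{
    size_t total = 0;
    /* `p` walks the array of pointers; `*p` is one string. */
    for (const char *const *p = words; *p != NULL; p++) {
        total += strlen(*p);
    }
    return total;
}
```

The reader has to know that p is a pointer to a pointer, that the array ends at a NULL entry, and that one dereference yields a string; a "for each" construct would hide all three.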

Most C programs use null-terminated strings, or strings whose length must be explicitly specified.
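A small sketch of the two conventions side by side. The type counted_string and both functions are hypothetical names invented for this example; only strlen and memcmp come from the standard library.

```c
#include <stddef.h>
#include <string.h>

/* A hypothetical counted-string type: the length travels with the bytes. */
struct counted_string {
    const char *data;
    size_t length;
};

/* Build a counted string from a null-terminated one.
   strlen has to scan for the '\0' terminator to find the length. */
struct counted_string from_c_string(const char *s)
{
    struct counted_string cs = { s, strlen(s) };
    return cs;
}

/* Compare two counted strings without looking for a terminator. */
int counted_equal(struct counted_string a, struct counted_string b)
{
    return a.length == b.length && memcmp(a.data, b.data, a.length) == 0;
}
```

Either way the programmer, not the language, keeps track of where the string ends.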

Does this page object to just some specifics of the syntax of C, or to the syntax of all the languages that use a C-like syntax, including C++ and Java? Java doesn't have this problem with strings, but it has a "C syntax" in a broadly understood sense. Also, I don't think anyone is a fan of C's lack of strings, except for super-low-level applications.

EeLanguage has a syntax similar to C, but it also has a "for each" construct, built-in strings, and relatively little unnecessary surface complexity. Users of E don't seem to complain about the syntax. So it seems that the main objections on this page are to other aspects of C's design than its syntax.

With C++, it's reasonably easy to create a macro that uses standard STL iterators. Macros aren't terribly clean, but this is a pretty trivial expansion. Of course the C++ iterator idiom is so ingrained that most programmers always look for for(Foo::iterator i=c.begin();i!=c.end();i++) and wouldn't visually recognize a foreach loop. The worse offense is the lack of easy map/fold syntax.

C has been described as a "super assembler," meaning, it is as low level as you can go without getting into the specific instruction set of the underlying hardware device. This has a great deal of value for those of us who work with raw machine control and highly specialized and optimized computing tasks. Many of the components of C syntax relate directly to what the processor is doing, which is part of the reason so many good optimizing compilers are available for every microprocessor and microcontroller you can shake a stick at. Why would we want to give this kind of execution control up?

{Maybe our chips suck and need a more modernized instruction set, such as native string handling.}

[Modern CPUs, and even some pretty old ones (I'm looking at you, PDP-11) support native string handling primitives. See, for example, http://www.plantation-productions.com/Webster/www.artofasm.com/Linux/HTML/StringInstructions.html]


Before reading the following (up until hrule), better read ZeroAndOneBasedIndexes. EditHint: can probably be merged.

Indexing from 0. This reflects the underlying addressing scheme of the CPU. Intuitively, we number a sequence of elements with ordinal numbers starting at 1.

In C syntax the index is the offset from the beginning of the array. It has nothing to do with indexing in real life.
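To make "the index is the offset" concrete, a short sketch; element_at and offset_of_element are hypothetical helper names chosen for illustration.

```c
#include <stddef.h>

/* Return the element at `index` using explicit pointer arithmetic.
   In C, a[i] is defined as *(a + i): "start of array plus i elements". */
int element_at(const int *array, size_t index)
{
    return *(array + index);
}

/* Distance, in elements, of `element` from the start of `array`. */
ptrdiff_t offset_of_element(const int *array, const int *element)
{
    return element - array;
}
```

The first element sits at offset 0 from the start, which is why its index is 0.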

If you don't like indexing from 0 you could always... (Warning: Evil)

 int main(void)
 {
  int storage[10];
  int *my_array = storage - 1;  /* undefined behavior: points before the array */
  ...
 } --BruceIde (No I don't advocate this...)
Evil isn't the right word for it. Illegal is closer. UndefinedBehavior is accurate. C doesn't permit you to construct a pointer that points outside of an object (such as a variable, struct, or array), except for a pointer to one place past the end of an array, and even then it doesn't permit you to dereference that pointer. See the comp.lang.c FAQ for more information. (The above code will probably work on many systems, but if the array is located at the start of the memory arena then a segfault would result as soon as the address storage - 1 is loaded into an address register on certain computer architectures.)
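For contrast, a sketch of what the rule does allow: forming (but not dereferencing) a one-past-the-end pointer, and getting 1-based access through an accessor instead of a shifted pointer. Both function names are invented for this example.

```c
#include <stddef.h>

/* Sum an array using a one-past-the-end pointer as the loop bound.
   Forming `end` is allowed; dereferencing it would not be. */
int sum(const int *first, size_t count)
{
    const int *end = first + count;   /* one past the last element */
    int total = 0;
    for (const int *p = first; p != end; p++) {
        total += *p;
    }
    return total;
}

/* A 1-based accessor that never forms a pointer before the array. */
int get_one_based(const int *array, size_t position)
{
    return array[position - 1];       /* caller must pass position >= 1 */
}
```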

Am I the only one who gets annoyed by sequences that start with 1? Maybe it's because I work with mathematical sequences so much. -- JoshuaGrosse

No, you are not the only one. It is ridiculous to claim that "we" intuitively start sequences from 1. Many of us will intuitively start from 0. This point is similar to the column major vs row major array storage issue. It certainly isn't something you can claim is empirically better one way or the other.

Most of us use both 0 and 1 based offsets intuitively. Measurements start at 0, while counts start at 1. A baby is in his first year before he is one year old.

More examples, please?

On a trip, when you've traveled 0 miles, you are in the first mile of the trip. Most people don't think, "What mile of the trip am I on?" They think, "How many miles have I traveled?"

Most programmers have 0 millions of dollars; we are all working on our first million.

More generally, bank accounts start from 0 dollars, not from 1. If you're flat broke, you don't go to the bank to take out the 1 dollar that must, at a minimum, be there. (OK, you might, but they won't give it to you, and will probably look at you funny.)

Bank account value is not a sequence.

The first floor of a building is (usually) offset 0 floors from the ground. (AmericanCulturalAssumption! In the UK, the floor at offset 0 is the ground floor; the first floor is the one at offset 1.)

In France, the ground floor is the 0th; in Latvia it is the 1st (I guess the same goes for all of the ex-USSR, because everything there was built to state standards). People are strange ...

It seems that it's more natural to start with zero when dealing with real numbers, but when dealing with integers it's more natural to start with one.

Aaaaargh. Zero-based indexing is the solution, not the problem. Just read EWD 831 (http://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF). -- AnAspirant

I've realised that the problem isn't 0 based or 1 based, it's half-open intervals. We naturally say 1 to 10, Monday to Friday, and mean that inclusively. Thanks, Dijkstra, for making things difficult because he thought saying 0 to 0 should mean no items instead of one (how often do we seriously worry about empty sets and saying 0 to -1?).
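The two conventions being argued over can be written down in a few lines. This is only a sketch; the function names are made up, and both assume the lower bound does not exceed the upper bound.

```c
#include <stddef.h>

/* Number of items in the half-open range [low, high). */
size_t half_open_count(size_t low, size_t high)
{
    return high - low;          /* [0, 0) is empty, [0, 10) has 10 items */
}

/* Number of items in the inclusive range [first, last]. */
size_t inclusive_count(size_t first, size_t last)
{
    return last - first + 1;    /* "1 to 10" has 10 items; [0, 0] has one */
}
```

With inclusive bounds an empty range needs a last value one below first, which is the "0 to -1" case; with half-open bounds it is simply low equal to high.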

In VHDL, array types may start at any number and number in any order e.g. range (1 to 10) or range (0 to 9) or range (9 downto 0) or range (10 downto 1) are all arrays of length 10. It turns out that this freedom is problematic if actually used as you have to write lots of code to deal with interfacing the various cases - the solution, by convention, uses only one.

That suggestion does not meet your stated requirements. Suppose 'word16' is defined as a 'bigendian integer' ranging from -(2^15) to 2^15-1 as suggested. Suppose this code is compiled on a 36-bit word machine (without 2's complement signs, by the way), whose C-compiler (naturally) handles this type as an 18-bit signed half-word. What does this do to your file formats? Yes, I'm picking nits. Yes, I'll stop now.

Eh? There's no such thing as 'without 2's complement signs'. Anyhow, it would be more proper for 16-bit words to be 16 bits regardless of the host machine's natural word-size. This is true even if it has the cost of making things less efficient. C language doesn't make it easy, but there isn't any good reason why a person shouldn't be able to read or write a file in 23-bit chunks if that is how they wish to do so. The inadequate support for absolute control over word-size, bit-ordering within representation, and even word-alignment, as part of the C standard is a gripe I share with SamuelFalvo?. But it isn't really a C-specific issue... it just comes up in C more often because C is so often used for the low-level IO stuff.
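One common workaround, sketched under the assumption that the file format stores the value as two 8-bit octets, most significant first. The names put_be16 and get_be16 are hypothetical; uint_least16_t is the C99 type that exists even on hosts with no exact 16-bit integer. Signed values and odd sizes such as 23-bit chunks are not covered here.

```c
#include <stdint.h>

/* Write the low 16 bits of `value` as two octets, most significant first,
   whatever the host's word size or byte order is. */
void put_be16(unsigned char out[2], uint_least16_t value)
{
    out[0] = (unsigned char)((value >> 8) & 0xFFu);
    out[1] = (unsigned char)(value & 0xFFu);
}

/* Read two big-endian octets back into a value of at least 16 bits. */
uint_least16_t get_be16(const unsigned char in[2])
{
    return (uint_least16_t)(((in[0] & 0xFFu) << 8) | (in[1] & 0xFFu));
}
```

The shifts and masks work on values rather than on memory layout, so the bytes written do not depend on how the compiler represents the type.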


Teaching newbies

Introductory programming classes could use extremely high-level systems such as ASP- or PHP-based Web scripting, to gently introduce programming-track students and non-programmers alike to the basic possibilities of programming.

Students interested in becoming software experts would of course take additional classes that would cover material such as assembly language, C, and operating system internals. However, for a first class to use C or a C-like language such as Java means introducing surface complexities (e.g. "public static void main(String[] args)") that just needlessly scare off people who could benefit from JustaLittleScripting? in their life, if only they weren't taught from the beginning that programming was mainly about finding where you forgot to put the correct semicolon or closing parenthesis.

If folks don't care enough to get past syntax (and understand semantics), maybe they should do something else?

Well, PHP may be "extremely high-level", but for a newbie it sure isn't "easy to read". I sure didn't find it easy to learn. I started wondering how an integer could magically just be a string, and whether an apple could be an orange. And why was I using all these dollar signs and curly braces? Gee, that sounds normal to a C programmer, but non-C programmers start wondering, "why all these symbols on every second key?"

I think a better language to learn with would be one which is more English based and less sloppy, with a neater block syntax - i.e. not one that is C/Perl based. Maybe Smalltalk, Pascal or, God forbid, even VB? Maybe it's easy to create a quick and dirty web page in PHP if you already know PHP or you already know C. Personally, I think most PHP programmers come from a C or Perl background, and already know C or Perl. Most ASP programmers probably come from a VB background. So when they jump into PHP or ASP it seems easy, familiar, and hackish. But for newbies, I think web scripting is too HTML/document and symbol based, and they should learn something more English and application based. Their first project could be a program, or a web page. A web page is just a word document, though, remember: "all this programming just to make a crummy looking HTML page which I could have done in Word or Excel?" A working program at least gives them some satisfaction ;-)


It's my understanding that the C language was created by and for computer programming researchers whose main purpose at the time was developing the UNIX operating system. Further, I understand that although the minicomputer they used was one of the least expensive machines available at the time, with a 1971 price tag in the neighborhood of $10,000 it cost more than a programmer's annual salary. I seem to recall reading an article by one of the principals - could have been Ritchie, I'm not sure - about how their interaction with the machine at the dawn of the C/UNIX era was via teletype. C was certainly a remarkable breakthrough in their ability to conduct their research and development efforts; just look at how quickly they were able to get UNIX started.

Today, the least expensive computer systems available are embedded chips which cost a few dollars in quantity, are generally more powerful than the system available to the C designers, are built into a huge range of products from microwaves to automobiles, and which don't need to receive text or commands from the user or provide text back to the user at all. Computers which are designed to process text are now literally thousands of times more powerful than the minicomputers of the early 1970s; handheld systems dozens of times more powerful than the C-era minis can be purchased on a whim for less than a programmer's day rate; and even the most hot-rodded personal computer today can easily be bought for less than a single programmer makes in a month.

Computers as expensive as a year of programming are only needed for the largest of commercial and scientific enterprises, such as generating animation for feature films, handling thousands of customer orders a day, simulating weather or global climate, analyzing the stock exchange, or running high-energy physics experiments (either experimental or numerical). It is impossible to saturate the capacity of most of today's personal computers without doing either audio-visual editing, interactive simulations with realtime 3D graphics, or the largest of software engineering projects such as compiling the kernel; and these software tasks hardly take enough time on today's average PCs to justify a break for coffee or tea.

Using a language like C, which involves explicitly allocating memory and juggling pointers, is a waste of effort when the end result of the programming effort does not justify the effort involved in having a person keep track of those implementation details rather than letting the computer do it. It bothers me that many open-source projects use the C language rather than a more appropriate higher level language, because C is comfortable and familiar to the developers based on their education and work background rather than because a conscious choice was made to do development at C's low level.

-- ChrisBaugh

http://cm.bell-labs.com/cm/cs/who/dmr/chist.html is an excellent history of the CeeLanguage by one of its inventors, DennisRitchie.

Unix already existed by the time C was conceived, but was (like most other OSes at the time) written in assembly language. The whole purpose of implementing C was to port Unix from the PDP-7 to the PDP-11. This is why C has post- and pre-increment operators; *x++ and *--x map 1:1 to addressing modes used in the PDP-11 instruction set (this is also why the 680x0 series microprocessors were such a good match for C, and is one of the reasons why anyone who was anyone was using the 68K line for Unix workstations). --SamuelFalvo?
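Whatever their origin, the two idioms mentioned above are easiest to see in a tiny stack. This is only a sketch: the struct and function names are invented here, and there is no overflow or underflow checking.

```c
/* A tiny integer stack; `sp` always points at the next free slot. */
enum { STACK_CAPACITY = 16 };

struct int_stack {
    int slots[STACK_CAPACITY];
    int *sp;
};

void stack_init(struct int_stack *s) { s->sp = s->slots; }

/* Store, then advance: the post-increment idiom. */
void stack_push(struct int_stack *s, int value) { *s->sp++ = value; }

/* Step back, then load: the pre-decrement idiom. */
int stack_pop(struct int_stack *s) { return *--s->sp; }
```

On a machine with auto-increment and auto-decrement addressing, each of these one-liners can compile to a single move instruction.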


Discussion

Much rudeness has been removed from this page. Please don't add it back. Thanks.


This is an old argument. Good quote: "Why didn't Smalltalk become popular? It had square braces." Really, the sticking point is the "I can learn a new programming language in a day" clan. I used to be one of them, but now I think otherwise. If all the languages are ALGOL descendants, sure, I can learn them fast, but that's just learning how they differ from C/C++. "Sure I can remember everyone's name, as long as everyone is called Fred." -- SunirShah


C syntax is perhaps obtuse, but at least elegant. It maps well to the underlying hardware.

How can the syntax of a language map well to the underlying hardware? The semantics of C might map well to hardware (this is debatable, and arguably more true of a PDP-11 than, say, a Pentium 4), but the syntax has little or nothing to do with that.

So when you are concerned with the underlying hardware, C is a good choice. At the opposite end of the scale, languages like Lisp map well to the underlying realities of computation. [DeleteMe: this bugs me, it was originally one statement, but got refactored into two people, in a sense]

I was always under the impression that the first versions of LispLanguage had primitive functions that mapped almost directly to the underlying hardware. That's where we got stupid function names like CAR and CDR. LispLanguage would not be my choice of an example of a high-level language. -- IainLowe

Yes, but now they've been aliased to FIRST and REST, to pacify Iain. Please remember that Lisp is forty years old, and has evolved in that period. It is in fact a testament to its elegance and flexibility that it has evolved into the best high-level language out there from such ... well... primordial beginnings.

Indeed, the terms car and cdr are derived from the contents of the address register and the contents of the decrement register, respectively, which referred to features of the hardware architecture the primordial LISP interpreter was implemented on.

{But that's just bad naming, not necessarily bad language features. They could have called them "fel" and "roe" for "first element" and "rest of elements", for example.}

Source code with "fel" all over it may make management nervous :-)


If you don't like a language's syntax or even its semantics, that's simple to fix. See ObjectiveCee for a major example, the java preprocessor at http://virtualschool.edu/jwaa for a minor one. -- BradCox


I used a database called CornerStone back in the early 1980s that kept the names that you used for objects such as tables and fields separate from their internal IDs, so that if you changed a name all screens, reports, and views knew about it. A small thing, but it made refactoring databases a lot easier. It also allowed you to define calculated fields and lookup fields in the tables' definitions; there was no distinction between tables and views. -- AndyMorris

What happened to Cornerstone? - sounds like they were a little ahead of their time. See CornerStone.


About affordable computer systems more powerful than the pdp-11 used to create C and Unix:

Holy crap! How much do you make? Where do you get those deals? Hook me up, man. :-)

I'm not sure which model of pdp-11 was used by the C and UNIX developers. Picking at random a typical model available at the time, we find that with 4kwords of 16 bit memory, a pdp-11-20 had a 1971 price of $10,800. Source is http://www.village.org/pdp11/faq.pages/pricing.pdp11-20-aug-1971.html.

Handheld: http://www.handspring.com/ - Visor Platinum, 33MHz Dragonball processor, 8 MB RAM, $249. One of these was recently given to me by some relatives who split the cost of it. (I didn't ask for it, they came up with the idea and the money and asked if I'd like to receive it. I'm enjoying it very much and would recommend it as a swell PDA.)

Hot-rod PC: http://arstechnica.com/guide/system/hotrod.html - The ArsTechnica recommended "hot rod" box, $1527.78. Athlon/1.4 GHz, 512 MB RAM, 30 GB hard drive, etc. I agree that's not the most hot-rodded computer available today, but it can compile the kernel in less time than most people like to have in a coffee break, and I think does qualify as orders-of-magnitude more powerful than the PDP-11-20. My personal jalopy computer is an antique compared to this thing.

Jobs that pay $250 a day or $1600/month are often listed on http://www.dice.com and http://www.monster.com.

If you're an expert at making programs that are super-efficient at runtime, using C without any help from BetterSyntacticSugar, then you certainly should be able to make enough to buy a computer that would pour the sugar. But you wouldn't need it! -- ChrisBaugh

Actually, as the apparently rather underpaid interjector here, I am an expert (well, I'm proficient anyway) at making programs that are superefficient at runtime in C and C++, for which I use magical syntax (but not syntactic sugar) and magical optimizing Perl generator scripts to create my C/C++ code for me. Only weak minds are trapped by their languages. The strong ones can invent their own. -- SunirShah

True. But some languages make that process natural, others, almost impossible.


Are people getting confused, or am I? Surely the argument is to dump C syntax, not C itself. Of course you can't dump C -- Unix is written in it! But that doesn't mean new languages have to use the same (or, to be more accurate, similar looking) syntax forever! I think the reasonable success of PythonLanguage shows that a new syntax is not necessarily the death of a language.


You can dump C syntax in C language:

 #define begin {
 #define end   }
 #define then
etc.

Don't forget as, with, of, and, uses, implementation, program, library, function, procedure. Or you could just use a Pascal compiler in the first place.
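For the curious, here is roughly what code written against those three macros looks like. The function largest is a made-up example; after preprocessing it is ordinary C.

```c
/* Pascal-flavoured keywords layered on C with the preprocessor. */
#define begin {
#define end   }
#define then

/* Hypothetical example function written with the macros above. */
int largest(int a, int b)
begin
    if (a > b) then
        return a;
    return b;
end
```

The parentheses around the condition and the semicolons are still C's, so this changes the spelling of blocks and little else.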


It is very difficult to write a CodeWalker for C programs. That's mostly due to the following two features:

The sad fact is that these aspects of C syntax could be replaced by sane constructs with no lack of expressive power, and with only minimal changes to the syntax.

-- StephanHouben

This is preposterous. Syntaxes aren't context-free, grammars are. Grammars describe syntax. If you want to program in a context-free syntax, you're going to have to write everything in BNF. There are tons of CFGs for C, not the least of which is the one published in the ANSI C standard.

While confusing the syntax with the grammar is an all-too-common mistake, the point above stands. One can certainly write a CFG for C, but there are numerous forms which look alike to the parser, and cannot be disambiguated strictly with a CFG. For example, consider the following.

 makePie (apple, cherry, orange);

If "apple", "cherry", and "orange" all refer to variables, then this is a function call. If they all refer to type names, then it's a function prototype. If there is a mix, or if any of apple, cherry, and orange aren't previously defined symbols, then this is an error. Or a macro invocation. I mention this because some C compilers (like Borland?) tightly integrate the C preprocessor into the lexer and/or the parser. -- SamuelFalvo?

Such symbol lookup is well above and beyond the capability of a CFG. Of course, for implementing a compiler that matters little - the semantic analyzer will have to run anyway, so deferring disambiguation of forms until then isn't a big problem.
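Another well-known instance of the same problem is the token sequence "a * b", sketched below. Both function names are invented for this example; the only difference between the two bodies is whether the name a refers to a variable or to a type.

```c
/* Here `a` names a variable, so `a * b` is a multiplication. */
int as_expression(void)
{
    int a = 6, b = 7;
    return a * b;
}

/* Here `a` names a type, so `a * b` declares `b` as a pointer to int. */
int as_declaration(void)
{
    typedef int a;
    int target = 42;
    a * b = &target;
    return *b;
}
```

A parser cannot choose between the two readings without consulting the symbol table for a.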


Is this page lamenting excessive capability or lack of guidance on how to use it? I will admit C syntax will allow you to do silly things, but it also rarely blocks you from doing the things you need to do. It is also true that the preprocessor was used to patch deficiencies in the language and compiler technologies, but the former shortcomings are slowly evolving out of the language. In general, I prefer the less restrictive nature of C, but in return I must agree not to push the envelope. -- WayneMack


On the subject of choosing C for open-source projects, I'd imagine it would be for the same reason that the NetPbm? suite of programs was created, because (in the case of libraries) it is possible for any other language to use C libraries, but if you were to pick, for argument's sake, let's say, PerlLanguage, how would you go about calling a library written in that from, let's say, LispLanguage? If C were used instead to create the library, there exist for most languages automated systems for creating shims to allow access to C libraries.

On the other hand, for applications, I'd imagine it is because it's easier to attract the ArmyOfProgrammers that make projects work if you use a commonly known language.

-- MartinRudat, who likes his C syntax just fine... in the form of PerlLanguage, which doesn't suffer from all of C's semantics...

"all of C's semantics": Funny, I didn't know C had semantics.


As a long-time C / Java coder, I see two main disadvantages to C syntax. It's verbose (compared to Python) and it's difficult to express certain concepts (compared to Lisp). Are there any other disadvantages?

Just curious, but why do you state that C is verbose? The usual complaint I hear is that it is too terse.

  for(int index = 0;index<aList.length;index++){
    //do work to aList[index]
  }
is more verbose, and more error prone than something like

  aList.forEach(function(each){ 
    //do work to each
  });
or, if not using forEach, still a better, safer alternative:

  for i = low(x) to high(x) do whatever();
Rant: Another problem with the above C syntax you gave, in addition to it being verbose, is that it doesn't make sense unless you calculate and convert it into English first. Too many symbols, too much time converting it all in your head (rather than getting on with programming). Sure, a few things can be gotten away with as symbols (some math equations and symbols), but to have a screen of symbols is not enjoyable. Try maintaining a long complicated Regex. They are quick and dirty at first, but then the symbols start getting to you.

Another problem with C syntax is the switch/case/break statement, which is verbose compared to the case statements with default cases seen in even Visual Basic, Freepascal, and many other tools/languages. Here is an example of an elegant case statement that doesn't read as verbose and unstructured the way C's "goto statement" cases do:

  case something of
    35: dothat;
    1,2,3: dothis;
    50..100: intherange;
  end;
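For comparison, a sketch of roughly the same dispatch in standard C. The function classify is hypothetical and returns a label instead of calling dothat, dothis, or intherange, so that the result is easy to inspect.

```c
/* Hypothetical classifier mirroring the Pascal-style case statement. */
const char *classify(int something)
{
    const char *action = "none";
    switch (something) {
    case 35:
        action = "dothat";
        break;              /* without break, control falls into case 1 */
    case 1:                 /* a value list needs one label per value; */
    case 2:                 /* empty cases fall through to the next one */
    case 3:
        action = "dothis";
        break;
    default:
        /* Standard C has no range labels, so 50..100 needs an if. */
        if (something >= 50 && something <= 100)
            action = "intherange";
        break;
    }
    return action;
}
```

Some compilers accept range labels as an extension, but that is outside the standard language.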

See IsBreakStatementArchaic for more on switch/case syntax.


I think at the time it was created, C provided some notable improvements, but there were some oversights as well (hindsight is always 20/20). Some things that appear to be disadvantages are:

Those complaints apply to C only. Other C-syntax languages, like Java, have resolved them.


A large rant on an issue which is completely and utterly arbitrary moved to ModifiersBeforeOrAfterNouns

Of course syntax is arbitrary since the computer doesn't care. It is about what *people* want. Some of us are more comfortable with certain conventions over others. I wonder if there are any groupings of preferences, or if everybody is a random mix of preferences.


We should all combine the best features of C, Python, Smalltalk, Lisp, Haskell, Forth, and so many others into one post-modern, systems programming language. We can call it PL/C. ;D -- SamuelFalvo?

That would be a good brainstorming point in a search for a LanguageOfTheFuture... but I've a feeling the language would go to hell if we 'all' tried to chip in. All the best languages have just one designer. Also, there'd be a great deal of conflict about what the 'best' features are.

[It's pretty much impossible to combine all the features and something has to be chosen... because EverythingSucks but WhatSucksLess?]

But we can maybe make a syntax "kind of like C" rather than a brand new kind of syntax. The Python and Ruby folks will not agree with the white-space issue, for example.


If one applies many of the suggestions above, it grows fairly close to Pascal (PascalLanguage) with curly-braces instead of BEGIN-END for blocking. Perhaps this is telling us something.

Pascal sucks, because there are no standards for the real-world modern implementations that are in use. I wish freepascal and borland would decide upon a NewPascal2010 standard or something similar. As for the BEGIN and END problem, opinions on how to fix it are all over the internet.

The basics are pretty standardized. And lack of some standardization by itself does not make it a useless guide.


Curly braces are alright, I'd say. Indentation would be better, though. Indentation parsed into parentheses would be best, forming a lisp-like language. { } could translate into argumentless lambda, thereby allowing user-defined constructs to form their own language. Heck, ScalaLanguage has something like that last one:

     // Scala
     something onError {
          LOG("Error occurred.")
     }
     // ... translates to:
     something.onError(LOG("Error occurred."));
     // ... translates to:
     something.onError(() => LOG("Error occurred."))
So does Perl6, with its blocks and pointy blocks. Ruby has a similar syntax; unfortunately, it doesn't use curly braces, so the syntax isn't consistent.
     aCollection.map {|f| f.factorial}
is fairly concise, though.


CategorySyntax, CategoryCee, CategoryAlgol, CategoryPascal, AlternativesToCeeSyntax, CategoryLanguageDesign, PhpProsAndCons (lessons from another language)


Last edited November 24, 2014.