Semi Colon

Most lines of most computer programs have exactly one statement. (Although some languages, such as Lisp, Scheme, and SQL, make identifying the beginning and end of statements fairly arbitrary. (although Lisp makes identifying beginning and end of expressions completely precise))

But often, it is nice to have to have multi-line commands, or multi-command lines. Different programming languages use different syntaxes to implement this feature. Some programmers prefer one way; some prefer another; some don't care; and some can switch back and forth, but find it temporarily awkward to switch styles.

Some approaches (some of these may be combined)

By default, each line has exactly one statement. Require the use of a line continuation character to have a multi-line statement.

Known uses:

VisualBasic: use an underscore (_) at the end of the line. Separate the underscore from the rest of the line using white space.
CeePreprocessorStatements, ToolCommandLanguage, BourneShell, GnuMake: these use a backslash (\) at the end of the line.
FortranLanguage : a '&' in one of the first few columns at the beginning of a line marks it as continuing from the previous line. (if remembered correctly)
RubyLanguage and EeLanguage improve on this idea -- see approach #7 for details.
XBase languages (dBase, FoxPro, Clipper, etc.) use a semicolon (;) as a continuation statement. Hmmm. The semicolon as a statement non-separator?
LogoLanguage often can detect whether a statement crosses multiple lines, but a terminal tilde (~) is used when it can't.

Resulting context:"It causes no end of trouble" in CeePreProcessor macros. Very long (multi-line) macros in the C preprocessor are a real pain. (Especially if there is whitespace between the backslash and the newline). Try to define a try-catch block in a macro -- with multiple catches. ;-)

That's not very hard.

 #define TRY    {      jmp_buf ex_context;      int ex_code;      switch (ex_code = setjmp(ex_context)) {        case 0:

 #define CATCH(N) break; case (N):
 #define CATCH_ALL break; default:
 #define FINALLY } }

By default, each line has exactly one statement. Require the use of a statement ending character to separate multiple statements in a line.

Known uses:

BasicLanguage, VisualBasic: colon (:)
BourneShell, ToolCommandLanguage, most UNIX/Linux shells: semicolon (;)

By default, a statement can have any number of lines. Require the use of a statement ending character, and/or scoping characters, to terminate a statement. Known uses:

LispLanguage, SchemeLanguage: All atoms are separated by whitespace; lists are enclosed in parentheses. Newlines are ignored completely; except as whitespace. See OpenParenthesesContinueLines. Multiple "statements" with side effects executed one after another as in imperative programming can be wrapped in a (begin ...) construct, which returns the result of the last expression. Many other constructs (e.g. lambda, define, cond, etc.) already implicitly emulate "begin" blocks.
CeeLanguage, CeePlusPlus, JavaLanguage, PhpLanguage:All statements are terminated with semicolons; compound statements terminated with the closing }. Null statements (; by itself) are OK.
RubyLanguage improves on this idea -- see approach #7 for details.
EmbeddedSql: "EXEC SQL <SQL_statement>;"

Resulting context:Missing/superfluous statement ending characters and missing scoping characters are common sources of syntax errors or hard-to-find semantic bugs. (All C/C++ programmers have run into the following at one point):

 if (condition);/* note erroneous semicolon */
do_something();  /* Always executed */

There's also a subtle distinction between ';' as a statement terminator (C/C++/Java) and as a statement separator (PASCAL). Modern implementations of Pascal-family languages, such as OberonLanguage, still follow the statement separator interpretation, but permits null statements; thus, the semicolon in Oberon can be treated as a statement terminator if you want to.

Smalltalk uses the period (".") as statement separator. The Smalltalk compiler(s) were relaxed during the eighties to allow, without complaint, unnecessary separators. A Smalltalk statement can have any number of lines, and any number of Smalltalk statements can be separated using the period.
ForthLanguage: no real concept of a statement outside of words themselves. (Exception: Some parsing words, such as S" for a string literal, require the terminating " to be on the same line.) Like LispLanguage, words are separated by whitespace. Word definitions start with ':' and end with ';' It is good style to format large word definitions as a paragraph and short word definitions as a single line. Forth has compiled control flow words such as IF, ELSE, THEN, WHILE, REPEAT, etc. One possible style is to put these words on separate lines and indent the phrases they contain; however, doing so also violates the "one to two lines per definition" rule of thumb too. It is also good style to group a related sequence of words into phrases, one per line.
- In Forth, the semi-colon terminates a word definition (cleans up locals, compiles a return, and switches state from compiling back to interpreting).
OcamlLanguage: Although one mostly programs in functional style, OCaml also uses the semicolon ';' as the separator in a group of imperative statements. The whole group of expressions returns the result of the last expression, i.e. the expression after the last semicolon. That is why the semicolon is not a terminator -- it cannot be placed at the end of the last statement if you want it to return anything. All expressions to the left of a semicolon should have type "unit" (i.e. not return anything), because it will be discarded (the compiler will warn you otherwise). Note that there is a common pitfall about precedence, because semicolons have lower precedence than if-then-else expressions, so if you want to execute multiple statements as the consequence of an if-then-else, you need to wrap it in a begin...end block, or parentheses (think, like braces in C).

By default, a scoping block is defined by language keywords. No special characters are used to abbreviate these keywords. (warning; how did we switch from statement delineation to scoping?)

Known uses:

BasicLanguage, VisualBasic: Sub ... End Sub, et cetera
OberonLanguage, ModulaTwo: IF ... THEN ... ELSIF ... ELSE ... END, PROCEDURE ... BEGIN ... END, etc

By default, a scoping block is defined by relative indentation.

PythonLanguage

Resulting context:It is hard to copy-and-paste such code from web browsers.

Debatable; I do it all the time, and have never had a problem.
How about snippits in email where various senders, forwarders, and receivers mangle whitespace in myriad ways?

By default, a scoping block is defined by explicit symbology.

Known uses:

CeeLanguage, CeePlusPlus: { ... ; ... ; }
OcamlLanguage: ( ... ; ... )

Some ways the IDE or compiler can intelligently figure out what the programmer meant, and reduce the need for these special characters:

Because print-outs are sometimes narrower than the code window, a single-line statement (in the code window) is sometimes printed out as a multiple-line statement (on the printout).

The IDE can print out a special symbol to indicate the wrapped line.

Known Uses:

VB.net: uses special arrows.
EmacsEditor

The statement terminator is optional when the end of a line looks like the end of a statement

(basically, if the last char on the line is an operator, or some quote or parens are open, it doesn't look like the end of a statement to the parser). This combines approaches #1 and #3.

Known uses:

RubyLanguage, EeLanguage, BourneShell(?)
ToolCommandLanguage
JavaScript

The Sql expressions and Sql clauses in a SQL query can be thought of as statements in a block. Because StructuredQueryLanguage requires its keywords to be in a certain order, clauses do not need clause ending characters.

Known uses:

StructuredQueryLanguage.

Resulting context:SqlLineCounts are fairly arbitrary.

A Sql query can be thought of as a long statement.

Known uses:

StructuredQueryLanguage.

Resulting context:It encourages big "run-on" sentences, and other SqlFlaws. The language does not encourage any particular indentation of the code within a query. Some SQL IDEs (such as SqlServer) do not preserve the programmer's formatting. Unformatted code is very hard to read.

Statements begin with certain keywords, and cases where those keywords occur in-statement do not cause ambiguities.

Known uses:

Some SQL stored procedure languages. (TransactSql, of Sybase and SqlServer, if I recall correctly.)
COBOL: You can put any number of statements before a period terminator. In COBOL-85 it's common to have lots of IF/ELSE/END-IF together, with one period at the end.

Statements are assumed to be one line. Continuation lines (2nd...Nth lines) are marked with a continuation character.

Known uses:

FORTRAN: Any character in the 7th column concatenates this line with the previous one. Fortran, really, is the classical example of how not to define a language's tokens or grammar.
COBOL has a column in which * indicates continuation lines and - indicates comments, IIRC (I read a COBOL book once to learn some history).
VimScript?: Continuation lines should be started with a backslash, possibly preceded with whitespace.

Discussion:

I don't like semicolons in computer languages. They are anti-WhatYouSeeIsWhatYouGet and a common source of syntax errors in my opinion. I realize that this is a personal preference, but I wonder what the mental steps are in some people that make them like semicolons. Are they just to make parsers simpler (if they do), or do they have some human interaction benefits? Should languages focus on being easier for compilers over human issues?

Semicolons are a human ease trade-off. By having them, you get newline back for your own purposes: You're free to have multi-line commands, or multi-command lines. Both of these are worth the semicolon to me.

And I'm puzzled by how to apply the notion of WYSIWYG to programming. Can you explain what you mean?

Line breaks are immediately visible to the eye. Semicolons are not. More on WYSIWYG below.

From above: Missing/superfluous statement ending characters and missing scoping characters are common sources of syntax errors or hard-to-find semantic bugs. (All C/C++ programmers have run into the following at one point)

I for one have been doing C/C++ for 30 years now and have never seen this issue arise -- MartySchrader

I think the people who are against semicolons never had the experience of using a PunchedCardLanguage? like FortranLanguage or CobolLanguage. I can clearly remember when I started using Pascal (TurboPascal 2.0 in 1984). Two years earlier I had been writing Fortran, and to have the ability to format my code as I saw fit, to have the whitespace be irrelevant to the semantics of the program, was nothing short of a revelation.

Instead, the defining experience for these people must have been the frustration of getting "semicolon expected" syntax errors from a compiler and thinking, "Duh, why can't it understand what I meant." But this, surely, is a brief stage in learning a language.

Semicolons are there for the human, to give the human control over where the statements begin and end, and how the code is formatted. They aren't for the compiler or to make the parser simpler; a language like PythonLanguage can be well-defined and parsed without any semicolons at all (and when I first saw Python my immediate reaction was that it was an enormous step backwards; it took some time and experience with it before I grudgingly came to accept it, and now I quite like it).

-- DavidConrad

I'm just wondering where the perception that a line break signals the end of a statement arises. How often (outside programming languages) is that the case? How often have you read a book in which one and only one statement appeared on each line? The text editor I'm using to draft this note is currently inserting a line break every 80 characters as I type, and the note is getting on to being six lines long - with five line breaks. All that will go out of the window once this page is rendered: YMMV.

The perception comes from mathematics, where expressions are generally placed one per line. Historically, mathematical notation is the underpinning of programming languages. It makes sense for programming language design to taken inspiration from and build upon mathematical languages, rather than from written or spoken languages. Both mathematics and programming languages are trying to express precise meanings with a very compact notation.

In some poetry (but not all) there is one statement per line, but in general the proposition that computer languages should resemble natural language is one that should be questioned.

Now paragraph breaks are another matter; but in many programming languages (particularly those where whitespace is considered insignificant) there is already a convention of using extra blank lines to separate "conceptually atomic" pieces of code. When you're reading, paragraph breaks may be more noticeable, but how often are you consciously aware of the line breaks (which in English at least have no semantic significance)?

I do not state a personal preference here. Mine may be obvious, but irrelevant.

Is it the idea of having an explicit separator character, or the particular punctuation character which is disliked? Does any compiler allow a different statement separator to be specified?

Smalltalk uses periods as statement terminators. This can be considered more "English-like" than using semicolons, as periods are used to end sentences in written English.

Pascal uses semicolons as statement separators and a period at the end of the program. This is a different way of seeing an "English-like" syntax: a program is a single sentence.

For the record, I find SmallTalk's use of periods just as annoying as semicolons... except that it's often less obvious when one is missing (in many fonts, the period get extremely close to the last letter, besides the point that '.' is smaller than ';'). Periods are good for the english language, where most uses of them are in elements which flow in paragraphs instead of statements.

Many programmers don't want whitespace characters to be significant to the compiler.

Some people consider extra whitespace characters to be "noise".

Some people consider syntactically significant characters to be by definition "signal", not "noise".

(These are just personal preferences -- if you don't like it, than you don't like it.)

Three ways of thinking of WYSIWYG for programming:

How close is the programming language to how the person writes down spoken language? Of course, different spoken languages have very different writing conventions, which make whether to use semi-colons or not seem trivial. And once you've picked your "standard", why would you want your program text to look like natural language? Isn't that the CobolFallacy? No, because COBOL's syntax is too limiting. Meanwhile, other languages like ForthLanguage, and even CeeLanguage, provide support for very English-like "executable prose," without suffering at all from the CobolFallacy.
How close is the programming language to the standard way of writing down mathematical instructions or proofs? Again, why would this be a figure of merit? Because it directly impacts a program's ability to be maintained years down the road, long after you've left the project (or company), by someone who may well be your junior in the programming language. If it is a figure of merit, then I guess HaskellLanguage, or one of its relatives, is the way to go... Actually, Smalltalk is substantially superior to Haskell when it comes to fulfilling this metric. Haskell code, due to its function application syntax, tends to look more like Lisp code. However, you can regain some prosish feel to the program by adopting the convention of using underscores in your function names to serve as placeholders for parameters (e.g., moveContents_intoBox_; compare with moveContents:intoBox: in Smalltalk).
I generally want my code to resemble pseudo-code as much as possible (within reason), and semicolons don't belong in pseudo-code IMO. You already have the visual line break. Semicolons are redundant, a violation of OnceAndOnlyOnce. We have two different things fighting to be the "statement divider".
- This is a terrible abuse of the concept of OnceAndOnlyOnce. That's about duplicated code, this is about explicit ends of statements. Of course you don't like semicolons in pseudo-code. You don't like them anywhere.
  - Whatever you call it, it's redundancy. We have 2 dividers fighting over territory.

Line breaks are easy to see and usually used anyhow in practice to make code easier to read, and thus make a "natural" divider. Semicolons are not easy to see (unless maybe if you have a fancy editor, but that is EditorDependency?).

This is maybe the misunderstanding here. A linebreak and SemiColon are equal characters (at least on Unix). In most languages they don't fight for territory. There is either the one or the other. There is no special meaning in a linebreak. One just tends to think there is some special meaning. BTW: the delimeter which I would consider most "natural" would be '.'.

What the author was talking about was the fact that the compiler treated the semicolon as a statement break, while the human programmer treated the newline as a statement break. Unless you just so happen to pair them up against each other all the time, WhatYouSeeIsNotWhatYouGet?.

I am surprised that nobody has mentioned how this is handled in HaskellLanguage. Without any fancy adornments, Haskell is whitespace sensitive:

main = do putStrLn $ "Hello world!" name <- readLine if name = "Bill Gates" then do putStrLn $ "Ewww!" else do putStrLn $ "OK, you're cool."

But, you are free to use braces and semicolons if you wish:

 main = do {
   putStrLn $ "Hello world!"; name <- readLine;
   putStrLn $ if name = "Bill Gates" then "Ewww!"; else "OK, you're cool.";
 }

You have your choice.

There's a subtlety here you don't mention: The entire body of a Haskell source file is a brace-enclosed semicolon-delimited set of definitions under the where clause of a module declaration, but most people write their toplevel declarations (at least) with automatic layout.

When the parser got to your end brace, it saw that it was on the first column and inserted a semicolon before it, assuming it to be the start of the next declaration. This happens to work because you're in a do block, but it's something to keep in mind. If you want whitespace-agnosticism, write the module declaration and braces yourself:

 module Main where {
  main = do {
   putStr "blah"
  }
 }

Some programmers want the "look" of the program (controlled via whitespace) to be recognized by the compiler without any extra characters, whereas others want explicit tokens so that they can make the program "look" however they want. Both sides believe that their way leads to clearer code and fewer errors.

I know which works best for me. You are not going to convince me that your preference works better on me because you are not me. I have used both approaches extensively, so it is not a matter of "getting used to". Similarly, I agree that your preferences probably make things work smoother for you. This is basically a psychology issue more so than a technical one.

I think that's a very good observation, and one that applies to a lot more issues of syntax. See, for example, the debate on ObjectOrientedProgramming about foo.bar() vs. bar(foo), or LotsofIrritatingSuperfluousParentheses vs. LotsofIrritatingSuperfluousParethesesSemicolonsCommasBracesAndBrackets?.

If your lines are often so long that wrapping is frequently needed, then perhaps ResponsibilityDrivenDesignConflictsWithYagni (bloated syntax) plays a role?

What if my lines are only rarely so long that wrapping is infrequently needed?

Semicolons aren't only there to allow long lines to wrap, they're also there to allow short, closely related lines to run together.

 x = 17; y = 23;

Some languages have a character that allows that, but does not otherwise use semi-colons. Some BASIC dialects used to use a colon for example to do that. But one could also use a function for such in some languages:

  assign(x, 17, y, 23, foo, "bar")

But, I found that I did not need it that often if it was available.

After using C and Perl for years, I have problems in any languages that doesn't allow a semicolon at the end of a line. It was particularly annoying in a language that does allow semicolons to separate statements within a line, as well as blank lines. However, the transition to Python was easy (except that I would forget to use colons at the end of a line).

There are several syntactic features which make a StatementSeparator? (whether it's a SemiColon, a newline, or whatever) necessary:

Concatenation as an operator. Smalltalk uses concatenation for selection of a method within an object (similar to . in C/C++/Java), C/C++/Java and many FunctionalProgrammingLanguages use concatenation for function call. Obviously, if concatenation is used for this purpose, it would be ambiguous to use it for statement separation. (Lisp, OTOH, uses EssExpressions for anything useful, so it can separate expressions with whitespace).
More than one type the following: postfix, infix, or prefix operators (assuming that operators themselves are distinct tokens that cannot be used as expressions in their own right). If you have the expression "a - b" and - can be both an infix operator (subtraction) and a prefix operator (negation), then if concatenation is used for statement separation, this expression could be parsed as either a single expression (a subtraction of b from a) or two expressions (a, followed by the negation of b).
- Not possible if concatenation implies separate statements. a - b would have to be treated as the quantity A, then a subtraction (e.g., x-a, for some value of x already computed elsewhere). Then, you evaluate b. However, a -b is perhaps different. In this case, you're evaluating a, then evaluating -b. Since concatenation implies statement separation, -b must be treated as a single token. It's attempt is to evaluate an expression named -b, not the quantity -b itself. Unless, of course, you've explicitly named it as such (e.g., : -b b negate ; in ForthLanguage).

Can't all the punctuation just get along?:

See: SyntacticallySignificantWhitespaceConsideredHarmful, LanguagePissingMatch, PythonWhiteSpaceDiscussion, SqlLineCount, HolyWar

CategoryRant, CategorySyntax