Ess Expressions

EssExpressions is the common term for symbolic expressions in the Lisp programming community. A symbolic expression (sexp) is simply one piece of data, like a number, string, symbol, or list of sexps.


Ron Rivest (of RSA fame) submitted an IETF draft for S-Expressions back in 1997 that defines a canonical serialized representation for data: http://people.csail.mit.edu/rivest/Sexp.txt (Quoted from the XmlSucks page.)

John McCarthy?'s paper on S-expressions as a Common Business Communication Language: http://www-formal.stanford.edu/jmc/cbcl.html

For non-Lispers who just want to know about S-expressions as data structures, it's best to read Rivest and McCarthy? first. Discussion of cons cells, car, cdr, proper and improper lists, et cetera, is Lisp-specific knowledge.


You can make cute little formalizations of sexps if you wish, but they're not that important.

Here's one little taxonomy, which you can see in CommonLisp. Keep in mind that in this taxonomy, sexps are divided into "atoms" and "cons cells":

Exotic stuff:


Q&A

Q: Why are they called "symbolic" expressions? After all, they're not just symbols.

A: It doesn't make that much sense. Presumably, it's in opposition to "numeric" expressions. It conveys a discrete nature of data, instead of a more continuous view. (Of course, languages whose code is represented as symbolic expressions may operate on "continuous" data, like what passes for real numbers on computers.)

Maybe it's that things like 42 or "this is a string" are really some kind of symbol, which we interpret in a special way. (Since sexps are just data, a sexp framework shouldn't really distinguish between sexps anyway; we do.) No doubt "symbol manipulation" has a specific meaning in logic or philosophy, where certain truths could be arrived at through mindless symbol shuffling.

The original plan of McCarthy? was to extend the FortranLanguage so that it could also easily perform symbolic computations (such as symbolic differentiation). Hence the name.

Q: So where do lists come in?

A: Well, it's the "collection" type that Lisp historically has stuffed other sexps in. You can imagine other collection types, like say objects from ObjectOrientedProgramming. And they may be "better"; let your imagination run free.


Notation

Atoms have a simple notation. Numbers look like 42, strings look like "blah", and symbols look like: symbol. (Symbols just look like anything, though if you want a symbol to contain whitespace characters, there's usually syntax like |this is a symbol|. But most people would write this-is-a-symbol, with no whitespace.)

The notation for lists and cons cells is a little more confusing. You don't have to memorize this, but here's what a single cons cell looks like, with the car "blah" and the cdr 42:

 ("blah" . 42)
Here's a list:
 ("blah" 42) ;identical to ("blah" . (42 . nil)) in cons notation
So cons cells and lists are notated with a very similar syntax, which differs only in a single ".". Now, you normally don't have to remember cons cell syntax; list syntax is far more common. But sometimes some jerk writes something like:
 (a b c . d) ; almost a ProperList, but ends with the final cdr being d, not nil
Cycles are created this way:
 #1=("blah" #1#)
That represents a ProperList, the second element of which is itself. It has length of 2, but there's something infinite about it. Not all sexp systems support these cycles, but CommonLisp does. Encountering the use of cycles is very rare, but some people apparently want to make graphs this way.


An atom is anything that we designate as primitive type. It may well be a byte-array (good for sending raw uninterpreted data), it may as well be date, times, time intervals or any other useful type that one designates as primitive for the purpose of convenience and balancing EconomyOfExpression versus EconomyOfImplementation?. Please note that at an extreme one can still represent any kind of data without primitive types of all, but that would be very inconvenient. Similarly, one can represent dates as structured data rather than primitive data.

Alternatively, and more economically, RonaldRivest proposed to do away with CONS and support LIST as a primitive constructor. Together with the operation to query the length of the list and retrive the n-th element. This would be a sacrilege against Lisp. Since I implemented EssExpressions in Java, and followed the LISP cons convention for 2 different reasons : 1, was deference to the tradition of LISP, and second, the ability to be somewhat compatible both syntactically and semantically with Lisp data. I can tell you I was mistaken, but I learnt to live with the mistake , while RonaldRivest was damn right. His proposal has better EconomyOfExpression and EconomyOfExecution.


S-expressions, or SymbolicExpressions, or "sexps", defined recursively as an atom, the empty list (represented as "()", or NIL in some LISP variants) or a dotted-pair of S-expressions (S1 . S2) (see ConsCells). A list (A B C) is simply an abbreviation for the nested dotted-pairs (A . (B . (C . ()))).

For example, a height-balanced binary tree with four leaf nodes (atoms) could be expressed as ((A . B) . (C . D)), though one is unlikely to see such simple structures in practice since there is no room left in the interior nodes for extra information.

Can anything other than atoms be included in SymbolicExpressions? For example, what if I wanted a structure like ((red red-description) (green green-description)), but where the descriptions were chunks of text which might include whitespace and parentheses?

Sure; "chunks of text" are atoms.

It can look like this: ((red "a color that's like your beautiful lips") (green "a color that's like your beautiful snot")). The Lisps that uses sexps usually use " as a string delimiter.

Anything that's not a cons is an atom.

"Cons?" What's that?

ConsCells are the the underlying nodes of a LISP list: a pair of pointers, one to the first element of a list (traditionally called 'CAR' for historical reasons), and the other to the next element (traditionally called 'CDR'). These elements may be either atoms (words and numeric literals) or other cons cells. The name 'cons cell' comes from the function 'cons', which is short for 'list constructor'. A simple 'proper' list consists of a series of cons cells, the car of each pointing to an element and the cdr pointing to the next cons cell of the list, with the last cons cell's cdr is set to NIL. A list in which the last cdr points to an atom rather than NIL is called an 'improper' list. In LispLanguage and many of its derivatives, ConsCells are represented in DottedPairNotation.

But note that (red red-description) is not the same as (red . red-description) - the former is equivalent to (red . (red-description . ()))


The syntax of many LispLanguage-descended ProgrammingLanguages is based on SymbolicExpressions. Since the syntax is very regular, a paren-aware editor can automate many tasks that arise when you are EditingLispCode, like indentation.

However, no editor can ameliorate the pain of trying to read deeply nested expressions on the printed page.

See ReadingLispCode.


This description seems to assume too much existing knowledge of LISP.

The first two paragraphs, i.e. the description, have no mention of lisp and no need of it, so I don't understand your comment.

What do the dots mean then?

Each "dotted pair" represents one element with two slots for things. A dotted pair containing the things "S1" and "S2" is written "(S1 . S2)". You can put these things inside each other. Now it's true that I knew that before coming to this page, but I also don't understand how it isn't obvious.

Another way of thinking about the (A . B) notation for cons cells is to consider it as a serialized representation of a binary tree, since cons cells are used to construct binary trees. A and B are the leaves of the tree here, ( represents the left branch, ) represents the right branch, and . represents the place where those branches join. So, in a typical two-dimensional representation of a binary tree, (A . B) would look like this:

   .
  / \ 
 A   B
In this way, you can see that a cons cell is just the fundamental unit of construction of a binary tree. All lisp code not containing cycles is just a bunch of binary trees. As explained elsewhere in this page, (A B C) is a notational shorthand for (A . (B . (C . ()))), which now can be understood two-dimensionally as:

   .
  / \ 
 A   .
    / \ 
   B   .
      / \ 
     C   empty
Dots are overloaded to mean lots of different things in different languages and notations.

Okay, now what are these "things" you talk about? And what do you do with them? I guess the whole S-expression business leaves me in the murk; I Just Don't Get It®.

A number is an S-expression:

    42
A string is an S-expression:
    "foo"
A Lisp symbol is an S-expression:
    bar
These S-expressions are all "atoms" (in the original Greek sense - you can't further subdivide them.)

Not really: you can extract individual characters from a string, etc.; but by the Lisp definition, anything that's not a cons is an atom.

:You can't subdivide them further without changing the value of the atom. Just as real life atoms can't be subdivided without changing the properties... what remains is no longer this atom.

Given any two S-expressions S1 and S2, you can construct a new "pair" object that holds both of S1 and S2, with the following syntax:

    (S1 . S2)
So for example:
    (42 . "foo")
    ("" . bar)
Pairs are also S-expressions, so you can nest them:
    (42 . ("foo" . (bar . "")))
Lisp languages frequently use lists as data structures, which have the following generic structure:
    (S1 . (S2 . (S3 . (S4 . ()))))
Note the "()" at the end; this the NIL S-expression, which means "nothing"; it indicates the end of the list. For convenience, Lisp languages display (and read) the above data structure as:
    (S1 S2 S3 S4)
The final NIL is implied by this representation. Note that the first nested structure is usually displayed and read as:
    (42 "foo" bar . "")
Pairs are sufficient for you to represent any list or tree data structure, although the programmer/application has to keep track of the representation. Using pair-manipulation functions, you can represent directed graphs (with cycles or without). For an example of a specific tree representation, see OlegKiselyov's SXML specification, which provides a different printed representation of XML data:

http://pobox.com/~oleg/ftp/Scheme/xml.html#SXML-spec


LispFamily languages with more conventional syntaces do exist - LogoLanguage and DylanLanguage being two well known examples - but none of them have supplanted the original SymbolicExpressions LISPs. The question a lot of programmers have is, "why have SymbolicExpressions lasted so long?". A few possible answers:

Simplicity - SymbolicExpressions are simply lists, with the function name being just another list element. If no other syntax is needed, why add it?

Self Representation - The external representation of SymbolicExpressions as lists is also the internal representation of the programs, and programs can be manipulated as list (IIRC, tcl and SNOBOL have the same property, vis a vis strings). Languages that exhibit this property are said to be homoiconic.

Consistency - In most languages, the syntax for the common language structures is different from the function-calling syntax; with SymbolicExpressions, the two are the same.

Extensibility - Because of the properties listed above, it is easy to extend LispLanguage and its relatives in a transparent manner; new constructs are for the most part on equal standing with core language constructs.

Inertia - a lot of code has been written in SymbolicExpressions, and even with a very flexible language like LispLanguage, dealing with two different syntaxes can be confusing.

Elitism - Because it is so different from conventional languages, it is harder for programmers coming from a CeeLanguage or WirthLanguages background to learn, which makes the LispWeenies feel superior. (Whether it is true or not, this is a common accusation.)

Aesthetics - S-expressions are remarkably easy to indent semi-automatically or automatically, and they make it fluent to read very complex expressions that span multiple lines.

Does anyone want to add to this list?

The next question then becomes, how do you design an EmExpressions syntax that has the advantages of SymbolicExpressions, but none of the disadvantages? Whatever you came up with would also have to be different from the Ada-style expressions that "pop" languages use, which is the biggest disadvantage of SymbolicExpressions .

Perhaps part of what we need is a general discussion about HomoiconicLanguages, and specifically HomoiconicityForConventionalSyntax?.


Discussion of moving the text above to SymbolicExpressions:

I'd like to change the links to EssExpressions to point to SymbolicExpressions.

Why? EssExpressions is the term most commonly used in the programming community. Almost nobody says "symbolic expressions", except to answer the question "what does S-expression stand for?"

In addition, the term SymbolicExpressions is almost generic - it can mean any expression rendered with symbols. EssExpressions is precise; it refers to the particular structure (if not syntax) used in LispLanguage and in other places.

[I agree that S-expression is the standard and that SymbolicExpressions can mean many things. Just as a btw, on the other hand it's also very common to see "SEXPRS", and with some renegades, "SEXes". And isn't it odd that, although SEXPR contrasts with FEXPR (and SUBR and FSUBR), on the other hand S-expression seems to include all of those. :-) ]

[I imagine the point was that "EssExpressions" looks very weird even to Lispers, and must be completely unintelligible to non-Lispers. But yeah, SymbolicExpression? isn't really the best alternative.]

[How about LispSexprs?? CamelCase can be fairly limiting sometimes.]


Yes, I'm just trying to eliminate an UgLy name. I don't really care for "LispSexprs" any more than I did EssExpressions, though.


Add one vote for EssExpressions. -- DanielBrockman


People who like the capabilities of s-expressions (e.g., that they are homoiconic and general), but wish they were easier to read (e.g., more familiar-looking), should take a look at the "readable" project (http://readable.sourceforge.net), which has created 3 tiers of extensions to s-expressions. The first is curly-infix-expressions; these add which infix notation by interpreting {...} as a list that is written in a different order. The second tier is neoteric-expressions, which adds support for f(x) as a synonym for (f x). The third tier, sweet-expressions, adds semantically-relevant indentation. Since these are simply extensions of existing s-expressions, they are backwards-compatible with traditionally-formatted s-expressions. Scheme and Common Lisp implementations are available. Please take a look and/or join if you're interested.


Contrast: EmExpressions See: DottedPairNotation

CategoryLisp CategoryLanguageFeature (?)


EditText of this page (last edited September 26, 2013) or FindPage with title or text search