Xml Isa Poor Copy Of Ess Expressions

Moved from XmlSucks since that page was getting awfully large.

"Think of XML as Lisp for COBOL programmers." (Tony-A on slashdot)

"S-expressions are a representation, XML is a career path." -- WardCunningham (oopsla '03)


OK, to summarize the discussion below, as I understood it:

Pro: More logic is possible at the parser level.
Con: Even low-level concepts (like a number) are controversial: what, exactly, is a number?
Pro: Less typing, higher readability (for some use(r)s).
Con: Less error detection/correction, lower readability (for some use(r)s).
Pro: Compare Lisp to Ant or XSLT.
Con: What would XHTML look like in SExpr (some big text with "))))))" at the end)?

To me it looks like (after filtering as much flame as possible) XML's semantics are good for uses like XHTML, that is, annotated text (where text >> markup), and less so for code or database-like data (where markup >> text), as seen from a "human having to actually type this stuff" point of view. The "everything is a string" mentality of XML seems to be the only common denominator everyone can agree on.


The ExtensibleMarkupLanguage is a poor copy of EssExpressions.

XML made the worst part of sexprs worse, by introducing lousy syntax. Then it failed to really capture the strengths of sexprs. Less for more cost, what a deal.

See http://xmlsucks.org/ or http://xmlsucks.org/but_you_have_to_use_it_anyway/


I don't understand what the controversy is here. XML is a (1) more verbose and (2) somewhat feature-redundant (consider attributes versus elements) representation of the same kind of thing S-expressions represent (namely trees). Due to #1, many, especially those less technically inclined, find it a more readable representation. Due to both #1 and #2, computers, and parser-writers, find it more difficult to parse, although not insurmountably so. HorsesForCourses.


Explain what is "lousy" about the syntax. And anyway, most of the syntax you create yourself when defining the keys. XML is not about string expressions; it is about storing data and then recovering it later on. What have S-expressions and XML got in common in this regard? Is there a reason to use one or another for storage and transport?

See also CommonBusinessCommunicationLanguage.


XML is not about string expressions; it is about storing data and then recovering it later on.

I beg to differ. First, a lot of XML is - and, indeed, is meant to be - edited by hand. Second, a lot of XML is not just a poor man's database. Just look at XHTML and XSLT (ExtensibleStylesheetLanguageTransformation?). One is for marking up text documents, and the other is essentially a programming language! Clearly, XML, like S-expressions, is not just about "storing data and recovering it later on." In fact, XML/XSLT is basically isomorphic to S-expressions/Lisp, although the latter is vastly more powerful than the former (of course, that's not to say that it's always superior, and I don't believe it is). -- DanielBrockman

"Meant to be edited by hand," indeed? According to whom? Have you data to back that statement up? And by the way, XML isn't a "poor man's database" in any case. It is a data transport, not a long-term storage or processing medium. The W3C says that XML's primary use is for "sharing structured information," not as a means of storing data for later database lookup. It's a means of storing data for transfer, then looking it up when you move it to another application.


XML describes hierarchically structured text or data, i.e., trees. To handle tree structures you need ContextFreeGrammar?s. All our search procedures are record or string oriented and based on RegularExpressions, which are RegularGrammar?s. Regexps do NOT work on trees.

Note: The exact same can be said about S-expressions.

Regexps can be augmented to work on structures, e.g. (foo (bar (:star (:type integer)))) could mean match the trees (foo (bar)) (foo (bar 1)) (foo (bar 1 2)). The query specifies a mostly-constant tree whose constant bits must be matched verbatim, plus some regexp-like notation that can be used at any level. (:star x) means match any number of xs, and (:type integer) means match an element whose type is integer.
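For concreteness, here is a minimal sketch of such a matcher in Common Lisp, assuming only the two operators described above; the names match-pattern and match-list are invented for this example and not taken from any library:

 (defun match-pattern (pattern tree)
   (cond ((and (consp pattern) (eq (first pattern) :type))
          (typep tree (second pattern)))            ; (:type integer) etc.
         ((consp pattern)
          (and (consp tree) (match-list pattern tree)))
         (t (equal pattern tree))))                 ; constant bits match verbatim

 (defun match-list (patterns trees)
   (cond ((null patterns) (null trees))
         ((and (consp (first patterns)) (eq (caar patterns) :star))
          (or (match-list (rest patterns) trees)    ; (:star x) matches zero xs...
              (and (consp trees)
                   (match-pattern (second (first patterns)) (first trees))
                   (match-list patterns (rest trees)))))   ; ...or one more
         (t (and (consp trees)
                 (match-pattern (first patterns) (first trees))
                 (match-list (rest patterns) (rest trees))))))

 ;; (match-pattern '(foo (bar (:star (:type integer)))) '(foo (bar 1 2)))  => T
 ;; (match-pattern '(foo (bar (:star (:type integer)))) '(foo (bar "x")))  => NIL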

This is a fundamental issue and not a problem of botched implementations or overly complex standards. So until someone comes up with a simple way to code CF grammars for searching and manipulating XML trees, XML will be horrible. Still, you cannot shoehorn a tree-based structure into a record-oriented database. So XML is still unavoidable.

Nonsense; using some kind of notation to write trees is unavoidable. XML is a bad choice for doing that and entirely avoidable.

XML is bad because regexp doesn't work on it? Isn't that regexp's fault? (besides, regexp will work in a lot of situations, if the DTD is designed with flat text searches in mind, but this seems like complaining that screws suck because they don't work with one's hammer; regexp isn't intended for markup.) Maybe use it on rendered output?


[NOTE: useless flame-bait removed]

Thesis: S-expressions are technically superior to XML

Lots of people have posted on this page. What about the non-midlevel XML programmer who is frustrated by the technological shortcomings of XML, coupled with marketeering? What about the programmer who is frustrated because some of these design mistakes are decades old and only being repeated in XML? XML could never live up to some of its more ridiculous hype (but then again, what can?), but it could have been much better than it is. If enough of the industry gets behind it (and it looks that way now), the political benefits of a common format (hell, nearly any common format) could outweigh the technical issues. That isn't a sure thing, but there is hope.

Antithesis: S-expressions are simply different from XML, not superior


"Sexpr's are technically superior to XML"

Can we have some point by point comparison to back up this assertion, please? Oh, by the way - what the hell is an sexpr, anyway?

S-Expressions, for the terms of this discussion, are merely lists of strings, such as (html (title "My Web Page") (body "Hello world!")). They take up less space and are easier to parse - you can measure it. However, technical superiority comes from the fact that XML attributes are redundant and can only contain text, not elements. Apparently it was done to make the SGML people happy, but it has been expressed by some of the creators of XML that it is a problem.
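To make "easier to parse" concrete: in a Lisp, the entire parsing step is the standard reader, so the example above is consumed with nothing but built-ins (Common Lisp shown; any S-expression reader would do):

 (with-input-from-string (in "(html (title \"My Web Page\") (body \"Hello world!\"))")
   (read in))
 ;; => (HTML (TITLE "My Web Page") (BODY "Hello world!"))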

Okay. So, we have compactness, ease of parsing, and the absence of redundant, text-only attributes as technical advantages of S-expressions over XML. However, XML isn't about compactness, and why are S-expressions "easier" to parse than XML elements? Particularly since XML elements are parsed in data context, so that you know what an element represents at the time it is parsed. Also, XML uses DTDs and schemas to ensure data integrity. S-expressions don't have a built-in mechanism for doing this, eh?

Of course - even more so than XML. XML Schema is not built into XML, but was added later on. A comparison of XML plus all the stuff layered on top of it to just s-expressions is like comparing apples and oranges. DTDs and schemas don't do anything; it's the programs that interpret DTDs and schemas in order to validate XML. Taking that into account, you should rather compare XML plus all the heterogeneous stuff around it to Lisp. In Lisp, it's really trivial to write validators that check anything you like. As another example, here is a Lisp function:

 (defun f (x) (* x x))
When I type in (f 5), Lisp returns 25 as a result. When I type in (f 5 6), Lisp says that I have passed too many parameters. The Lisp compiler has just validated the correctness of an s-expression against the restrictions imposed by another s-expression. I can program this on my own:
 (defun validate (sexpr)
   (when (eq (first sexpr) 'f)
     (when (= (length sexpr) 2)
       (return-from validate t)))
   (error "not ok"))
Now I can call (validate '(f 5)) which returns t, and (validate '(f 5 6)) which raises an error.

To repeat this again: There is no difference between processing data and processing programs!

As far as XML attributes being technically inferior because they can only contain text and not elements - say, what?!? Attributes are just that. They shouldn't be elements. They should describe the value of exactly one facet of their element, and that in context. Attributes may be considered to be redundant, but so what? There are a million ways to design XML data stores and elements. Attributes are simply one tool in the toolbox.


You want examples? Okay, as follows:

XML:

 <foo>
   <integer>3</integer>
   <string>abc</string>
 </foo>
S-expression:
 (3 "abc")
That's lame. One could just write: <expr>3 "abc"</expr>

One could, but one would lose structure and possibly lose data. So the counterargument is suspected of being lame.

Too bad (3 "abc") doesn't have any structure either

It's a list containing a number in the first position and a string in the second position. That's structure for you.

Then <expr>3 "abc"</expr> does have a structure too.

Yes, it has a different structure. <expr>3 "abc"</expr> is a single element, expr, whose contents are the string "3 \"abc\"". What s-expr people are saying is that s-exprs have a better structure for saying more complicated things. <expr>3 "abc"</expr> is NOT the same structure as (3 "abc"). It's actually the same structure as (expr ("3 \"abc\"")): a list, named expr, with only a single element. And once again it's still longer.

How about <foo v="3" s="abc"/>, that is terser. Also, if all you are transmitting, every single time, is an integer and a string in that order, then XML might be too much. In the S-exp example, suppose 3 is really a string and abc is an integer expressed in hex, or a constant. The parsing program has to understand that in any case.

Not only is the S-expression smaller, easier to write and easier to parse, it did not require me to invent the name FOO (that's "foo" - XML is case-sensitive, unfortunately) just to create an enclosure for two items. Moreover, the tokens 3 and "abc" actually ARE an integer and string respectively. In the XML, the <integer> and <string> elements are just pieces of text. The software which reads this XML has to understand the convention, and then do additional lexical analysis to turn the text "3" into the number 3, or the text "abc" into the string object "abc". Any of these steps could introduce bugs: what if the originating program generates some lexical convention that the receiving program doesn't understand? A program written in C might understand 0xFF to be an integer; a Lisp program would understand #xFF.

Wrong-o. The text representation of the element's value is in a format that any application can read as text, then process into the native value. However, in your example there is nothing to tell me that 3 is the integer property of foo, or that abc is its string property.

The lexical syntax of the printed notation tells me that 3 is an integer object, because it's a sequence of decimal digits. Have you never used a programming language with a grammar that defined 3 as a token which represents an integer constant? Similarly, the double quotes indicate a string literal.

Why does the character '3' have to be an integer property of the foo element in the S-expression? Why does "abc" have to be the string property? As far as I know, '3' might be a name and "abc" might be an enumerated value from a list. The XML DTD/schema will tell the parser what types these properties are and what their legal values are.

No, "abc" has to be a string, because it has no choice; the lexical syntax says that it's a string: it's a token that begins and ends with a double quote character. If there are further constraints on the values of that string, that's up to the software to validate. We can easily write an S-expression which encodes type and value constraint information about another S-expression. Those constraints will be applied over data which is already richly annotated with type information; we can validate that the third element of some list is an integer, and if it passes that test, we can then validate that it satisfies some property that integers can have. If you want 3 to be a name, then encode it as a string using "3" or maybe as a symbol using |3|. Or else, design the data structure such that strings, symbols *and* integers can serve as names. Then, the notations #xFF and 255 represent the same name, because they are equal integers. Try that with your DTD.

[Actually, that's not strictly speaking true for sexprs. It may be true in a programming language or some other environment that has evaluation rules for symbols (such as Lisp). Could you be more specific? I see nothing here dependent on evaluation.]

Anyway, verbosity is not the point as we'll see.

The XML proponent is thinking in terms of Unicode text and pure syntax/structure. Obviously XML element content has no type other than being a sequence of Unicode - a program might cast the Unicode to a type post-parse, but that's not XML, that's a program. The Sexpr proponent is thinking in terms of a machine that evaluates a symbol (perhaps to some type or other, but it could just as easily be a function call). Obviously sexpr symbols evaluate to something, but that's not sexprs, that's an evaluating environment. A sexpr file on its own is just as static as the XML one. An application program taking in the output of an XML parse is just as dynamic as a sexpr evaluator.

So to some degree you're talking past each other from where I'm standing. By the way, this matter is strictly orthogonal to verbosity, and is something of a permathread in XML circles. Markup and programmer types have been going at each other on this one since SGML. -- BillDehora

"More likely the sexpr would be:" (tuple (int 3) (str abc)) To push it further (using python): class x(object):

def ameth(x,y):
print y,x
y=7
x={1:"a","b":2}
In sexpr: (class x (bases (object)) (func x y (codestr print y,x)) (y int 7) (dict x (((int 1) (str a)) ((str b) (int b)))))


A sexpr file on its own is just as static as the XML one.

This is simply not true. S-expressions have embedded and built-in type information. I'll say that again, and please take the time to let this sink in, as it is the cause of much dispute around here: S-expressions have embedded and built-in type information. In an S-expression, a number is fundamentally and basically different from a string, and, mind you, we're talking about the lowest of syntax levels here - not about what some external validation mechanism says.

True, it's all text in the end, but you're saying that just because it's all text in the end, S-expressions have as little type information as XML. This is tantamount to saying that because it's all ones and zeros in the end, a Perl source file is as utterly devoid of meaning as a blob of random data. I mean, some people may insist, but that's just not the way it is: S-expressions have strings and numbers (and often other things); XML has only strings. And by God let's not even get into Perl.

-- DanielBrockman


The idea is that the XML representation of the data has a name for a field and can have DTD/schema mechanisms to ensure that only the correct values are placed in those fields. Here's another example:

  <FooBar>
    <OneString>
      Bloofta
    </OneString>
    <AnotherString>
      Ekmotz
    </AnotherString>
    <YetAnotherString>
      Oonyoffs
    </YetAnotherString>
  </FooBar>
How do you represent this collection in S-x so that "Bloofta" is associated with OneString, "Ekmotz" with AnotherString, and "Oonyoffs" with YetAnotherString? Is there a common method to do this?
Alternative interpretation to discussion below; the markup above describes a simple alist,

 ((oneString "Bloofta")
  (anotherString "Ekmotz")
  (yetAnotherString "Oonyoffs"))

Easy to read, easy to parse, easy to understand.

In Common Lisp one (c/w)ould use a structure:

  #S(FOOBAR ONE-STRING "Bloofta" ANOTHER-STRING "Ekmotz" YET-ANOTHER-STRING "Oonyoffs")
No, there isn't. Firstly, the software can apply some programmed input validation. Given the property list (:onestring bloofta :anotherstring ekmotz :yetanotherstring oonyoffs) in a variable called PLIST we could write validation code in Lisp like:
 (unless (member (getf plist :onestring) '(bloofta krufta mleep))
   (error ":onestring property missing or has an incorrect value"))
It's not hard to come up with some language to specify constraints on a structure to automate checks like this; a much richer constraint language too, and one that is itself expressed as an S-expression. For example, the expression (integer integer string) could be taken to match a list like (3 4 "foo") but reject (foo bar) or (1 2 3). Extending this mini-language, we could support expressions that specify value restrictions, not just type restrictions: ((integer :range (0 14) :satisfies evenp) (symbol :values (bloofta mleep))) and so on. The expression (integer :range (0 14) :satisfies evenp) would mean an integer in the range 0 to 14 that is even. This sort of thing is so easy to slap together in Lisp that nobody has bothered to standardize a notation for doing it that I'm aware of; there is no DTD-like language for validating S-expressions. But there could be if there was demand.
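To show how little machinery that mini-language needs, here is a rough sketch of an interpreter for it in Common Lisp. The names valid-p and valid-list-p, and the exact set of keywords, are made up here to match the examples above; this is not an existing library:

 (defun valid-p (spec value)
   (destructuring-bind (type &key range satisfies values)
       (if (listp spec) spec (list spec))
     (and (typep value type)                               ; basic type check
          (or (null range)                                 ; :range (low high)
              (and (realp value)
                   (<= (first range) value (second range))))
          (or (null satisfies) (funcall satisfies value))  ; :satisfies predicate
          (or (null values) (member value values)))))      ; :values enumeration

 (defun valid-list-p (specs list)
   (and (= (length specs) (length list))
        (every #'valid-p specs list)))

 ;; (valid-list-p '(integer integer string) '(3 4 "foo"))     => T
 ;; (valid-p '(integer :range (0 14) :satisfies evenp) 12)    => T
 ;; (valid-p '(symbol :values (bloofta mleep)) 'krufta)       => NIL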

Now, how about this:

  (4/3 #(1 2 3) #2A((1 0 0) (0 1 0) (0 0 1)) #1=(a b c #1#))
Show me the equivalent XML! 4/3 is a rational number. #(1 2 3) is a vector of 3 integers. The #2A... object is a two-dimensional array, representing a 3x3 identity matrix. The #1... is a cyclic structure; a list of four elements, the last of which is that list itself.

No problem:

  <brain-dead>
    <rational numerator="4" denominator="3" />
    <vector>
      <integer>1</integer>
      <integer>2</integer>
      <integer>3</integer>
    </vector>
    <array dimensions="2">
      <array>
        <integer>1</integer>
        <integer>0</integer>
        <integer>0</integer>
      </array>
      <array>
        <integer>0</integer>
        <integer>1</integer>
        <integer>0</integer>
      </array>
      <array>
        <integer>0</integer>
        <integer>0</integer>
        <integer>1</integer>
      </array>
    </array>
    <list id="some-list">
      <string>a</string>
      <string>b</string>
      <string>c</string>
      <link ref="some-list" />
    </list>
  </brain-dead>
Long-winded? Certainly. Harder to read? I think it's easier. Harder to write? Doubtful, considering most current editors support handy features such as "copy" and "paste". XML-aware editors make it easier still, by introducing such commands as "close element", "go to parent", "go to next sibling", etc.

In XML there would be elements with names that make some sense to somebody reading them. These elements may have attributes that further enhance the understanding and clarity of data transmission inherent to an XML representation. Additionally, there would be a DTD/schema to eliminate errors before they ever got into the processing guts of your system.

This discussion thread is a joke. An XML fragment is being compared to a Common Lisp program and, unsurprisingly, XML is shown to have fewer features. In its basic form an s-expression is atoms and conses. Simple, as an exchange format should be. Common Lisp the language? Not quite so simple.

A data exchange format supporting rational numbers, matrices and 17 kinds of reader syntax is a complicated format with little chance of success.

-- AndersMunch

[It's harder for me to read. So much space is used to describe the format (integer, string, array) that the content is hidden. There are two "integers" for every integer!]

Sure. Then let's fall back on the same argument that the Lispers use when confronted with the problem of being Lost In a Sea of Parentheses -- use the right editor. A proper XML editor will show the data as a tree, with elements expanded or collapsed at will, the attributes in a list, and the values of everything immediately available. C'mon, folks. Let's stop with the StrawMan arguments, eh?


Show me the XML representation of a triangle as an object with three properties representing the segment lengths. Then show me the DTD which asserts that these lengths are all numeric data, that no length is equal to or greater than the sum of the other two, and that no length is zero or negative. Can I write a program that can entirely trust the XML layers to not give it anything other than a valid triangle?

The schema can ensure that the legs are not zero or negative. However, the deal about the three sides being valid is kind of a RedHerring because, if you are really worried about the data being accurate, you only specify two sides [and the angle, heh] and let the application close the triangle. I guess you'd have to set reasonable limits on the lengths of the sides that the validator could check.

[A DTD defines legal document structure and does not validate data contained in the document. It is up to the program to validate the data, be it user input or an XML file.]

The DTD/schema can set bounds for the values of entities. This is as close to checking for accuracy as you are going to get. Remember, XML is about storing and transporting data, not about validating the data in its application context. You are correct there.

In other words, whenever the constraints are more complex than independently checking the elements of a tuple against some bounds, the users should find a transformation which reduces their object to such a tuple? Just because the DTD software is too piss-weak to handle the original representation that they want? Well, what if I don't want to? I may have good reasons for wanting to keep the triangles as they are.

You can keep any data in any format you choose. It's up to you to pick a representation of the data that can be validated via DTD/schema. After that your application needs to know what to do with this valid data. This implies some sort of massaging of the data before it is passed in and out of XML, but so what? You would be doing the same thing using any other transport layer, too.

To really enable arbitrary complex validation would require a Turing-complete schema language. You get this for free with S-expressions and Lisp since in the end everything is just one big harmonious lump of lov - err.. Lisp. There is no direct equivalent of Lisp in the XML world, but if a very powerful schema language were to pop up, I imagine it would look a lot like XSLT.
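As a sketch of what that looks like on the Lisp side (assuming the triangle is written as (triangle 3 4 5); validate-triangle is a name invented for this example):

 (defun validate-triangle (sexpr)
   (destructuring-bind (tag a b c) sexpr
     (unless (and (eq tag 'triangle)
                  (every (lambda (x) (and (realp x) (plusp x))) (list a b c))
                  (< a (+ b c)) (< b (+ a c)) (< c (+ a b)))
       (error "Not a valid triangle: ~S" sexpr))
     sexpr))

 ;; (validate-triangle '(triangle 3 4 5))   => (TRIANGLE 3 4 5)
 ;; (validate-triangle '(triangle 1 2 5))   => error: Not a valid triangle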

Hell, I don't even know if it'd be worth creating a new language, since you could basically use XSLT as a schema language, having a successful transformation indicating validation success and a transformation terminated by a fatal error message indicating validation failure. The output tree would be discarded in this case. Come to think of it, this is not unlike how I (and probably a lot of other people) currently use XSLT; I make sure that the input is sane before transforming it; but indeed it would be nicer to keep the two steps separate. Unfortunately, XSLT has no virtual functions, so you can't just pull the validation up into a common base stylesheet. Anyway, that's a different story.

To answer the original question, here's what that XSLT could look like assuming basic validation was done on a lower level (DTD/schema):

 <xsl:template match="triangle">
   <xsl:if test="side[1] + side[2] &lt;= side[3] or
                 side[1] + side[3] &lt;= side[2] or
                 side[2] + side[3] &lt;= side[1]">
     <xsl:message terminate="yes">
       <xsl:text>Those sides do not make a triangle: </xsl:text>
       <xsl:text>one side is at least as long as the other two together.</xsl:text>
     </xsl:message>
   </xsl:if>
 </xsl:template>
-- DanielBrockman


Here is an interesting quote on the XML vs. S-expr issue by TimBray? (one of the XML originators).

As for S-Expressions, I can see the arguments, and can't honestly tell you why the same technologists who ignored decades of S-Expression lore instantly took up XML. It's crystal-clear that you could have used S-Expression syntax for XML and it all would have worked about as well.

Maybe it's because S-Expressions were too closely identified with the tattered dreams of the AI community (see AiWinter)? Or maybe just because XML's compulsory end-tags make it a little easier to read?

Whether or not you (as a SmugLispWeenie?) agree that it would have been "about as well" or better/worse, it is an interesting quote considering the source. See http://www.tbray.org/ongoing/When/200x/2003/03/24/XMLisOK for more of Bray's essay where the quote comes from.


As for S-Expressions, I can see the arguments, and can't honestly tell you why the same technologists who ignored decades of S-Expression lore instantly took up XML.

I think it has to do with the view of the application stack. Lispers are saying use Lisp for everything. XML says there is a clear division between the program and data so that any program can be used. I can use XML from my Java, C++, etc., and it will work. No assumption has been made about what language I should use. I like that. The Lispers are saying build all your semantics into the data using Lisp. Then of course you would program in Lisp, because half your program is in Lisp already, so it wouldn't make sense to use anything else. I don't like this lock-in. Yes I am stupid for not recognizing that Lisp is god, but I do a lot of dumb things.

See also PrincipleOfLeastPower


Lispers are saying use Lisp for everything.

This is not what we say when we say that sexps would be better than XML for data exchange. You could use sexps from any language. They are easy to parse. The main point I see is that sexps have semantics while XML 1.0 doesn't. There have been claims that XML actually has semantics, but to make those stick, you will have to use additional specifications (infoset, XSD?) and not just XML 1.0.

If the semantics are provided by embedded code then you must have a way to get those semantics into your hosting environment. If the environment is Lisp then it is easy. But if it's C++, how do you transfer the Lisp code into C++ in such a way as to preserve the semantics? It's not worth the effort unless you use Lisp in the first place.

The semantics are abstract concepts like "number", "string", and "symbol" (just a restricted string). In fact, that's probably about as much semantics as you're going to want to have in a general data exchange format, with "list" and "tree" already in place. I guess it'd be practical if the numbers had a definite precision, such as defining them to be IEEE floats. That is, if arbitrary precision is too much to ask (it probably is). But other than that I see no problems.


See http://www.prescod.net/xml/sexprs.html for the counter-argument "XML is not S-Expressions".


Just wondering... what does the 'S' in S-Expression stand for?

Symbolic. See EssExpressions


XML carries human-readable context information. I find this very valuable. The context adds nothing to the computer's understanding, and seems like waste, but it's good for us poor humans. A real life example I saw recently:

 <price-request>
   <weight>3.9</weight>
   <shipping-method>ground</shipping-method>
 </price-request>
I look at this and I instantly understand what's going on. I may not know everything (for example, is the weight in pounds or kilograms?), but I understand that the response to this XML request (and I know it's a request; it says so) will probably be some sort of price. I even understand that the application is involved with shipping in some way.

Now, take a look at this S-Exp:

 (price-request
   (weight 3.9)
   (shipping-method "ground"))
except the S-Exp looks better, is easier to read, has no duplication, and is easier to parse.
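And pulling a field back out of the parsed request needs no extra library either; a minimal Common Lisp illustration (the variable is just the list as read in):

 (let ((request '(price-request (weight 3.9) (shipping-method "ground"))))
   (second (assoc 'weight (rest request))))
 ;; => 3.9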

Note: See SXML at http://okmij.org/ftp/Scheme/xml.html for Scheme programs to represent XML in s-expressions.

Why can't you just talk about S-expressions purely, without any kind of LISP things? That makes me feel XML may be better: it is not tied to any language.


This is slightly tangential...

Several SchemeLanguage implementations allow one to read in XML and have it transformed into EssExpressions thusly:

  <html><head><title>SGML Sucks!</title></head></html>
becomes
  (html (head (title "SGML Sucks!")))
so you can treat XML data like any other code/data. Nice. It has occurred to me that these two representations of the same data favour different users. The XML representation favours text authors - you just type straight text, but when you want to insert markup (read: programming language constructs) you have to go through the hoopla of inserting those angle brackets, closing them, etc. The S-expression representation, on the other hand, favours the programmer - code is simple but text requires quotes. A nice symmetry that solves both problems.
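Going the other way is just as mechanical. A minimal sketch in Common Lisp (sexp->xml is a made-up name, and this ignores attributes, escaping, and everything else a real serializer must handle):

 (defun sexp->xml (node)
   (if (consp node)
       (let ((tag (string-downcase (first node))))
         (format nil "<~A>~{~A~}</~A>"
                 tag (mapcar #'sexp->xml (rest node)) tag))
       (princ-to-string node)))

 ;; (sexp->xml '(html (head (title "SGML Sucks!"))))
 ;; => "<html><head><title>SGML Sucks!</title></head></html>"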


I wonder... suppose I have XML like

 <foo bar="baz" quux="wibble">
   <foochild bar="wibble" baz="quux" />
 </foo>
Suppose I want to render this as an sexpr. Based on examples from the rest of the page, I might write:
 (foo (bar baz) (quux wibble) (foochild (bar wibble) (baz quux)))
But now haven't I lost the semantic distinction between bar and quux being (named) attributes of foo vs. foochild being its contents? Everything becomes a tree, but I can't decorate the nodes, just make more children...

Keep in mind that there's a reason why some people say that the distinction is redundant. When you're writing an object-oriented program, do you have a similar distinction between object attributes and object members in Python, Java, C, C++, etc.? No, even though you use those things for stuff a lot more complex than XML. The distinction between attributes and members isn't fun to represent because it's really artificial. They're practically shoehorned into XML just as a convenient way of writing simple elements that only exist once.

Naw. You can write the sexpr like this:

 (foo :bar "baz" :quux "wibble" (foochild :bar "wibble" :baz "quux"))
{I personally find equal signs more natural.}

[Or, using a more XMLish notation, like the following.]

 (foo (@bar "baz") (@quux "wibble") (foochild (@bar "wibble) (@baz "quux)))

That's ABNF, not XML.


Although I also think that most applications of XML would be better off using s-exprs, the idea that s-exprs have "semantics" and XML does not is completely ridiculous. S-exprs only have semantics when interpreted. The author of that statement is confusing the (non-existent) semantics of s-exprs with the semantics of a language whose syntax is defined in terms of s-exprs. The two are orthogonal.

If you write an interpreter that interprets XML infosets, then you have not endowed XML with semantics, but have defined a language (with semantics) that uses XML as its syntax. Similarly, the LISP family of languages define different languages that all use s-exprs for their syntax, and all have different semantics.

I do not mean to disagree with you - indeed, yours is definitely among the more sane comments made about this subject here - but I want to point out that while S-expressions certainly do not have the kind of built-in artificial intelligence that some people seem to believe, they do at least have enough syntax to differentiate between an integer and a string. XML cannot do this by itself. Compare

 (42 "foo bar")
to
 <integer>42</integer>
 <string>foo bar</string>
The integer/string semantics are built into the syntax of the S-expression, while the XML processor only sees "elements" and "content", requiring external conventions for recognizing, e.g., integers. That is, unless you mean pristine S-expressions of (approximately) the form
 expr ::= atom | '(' ')' | '(' expr ws { expr ws } ')'
 atom ::= <not parens or whitespace> { atom }
 ws ::= <whitespace> { ws }
True, those have absolutely no semantics besides the tree structure, but then again they are not what people use; people use beefed up S-expressions with semantic sugar.

Sure, but again the syntax and types of atoms that have special semantics are defined by the language interpreting the s-exprs, not by the s-expr syntax itself. Scheme has a whole range of numeric types, from integers through rationals to complex numbers, and supports all of those types as literals in its s-expr syntax. However, other LISP implementations do not have a literal syntax for rational or complex numbers.

But you're missing the point: Every one of those syntaxes is able to distinguish between a string, some kind of number, and a symbol. The atom "foo bar" is always a string, and the atom 42 is always a number: Long before the interpreting application gets to see them, their types have already been determined by the low-level reader. That's more than XML can do by itself.
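A one-line check of that claim (Common Lisp, standard reader only):

 (destructuring-bind (n s y) (read-from-string "(42 \"foo bar\" baz)")
   (list (integerp n) (stringp s) (symbolp y)))
 ;; => (T T T) - the types were fixed by the reader, before any "interpretation"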

So, without a strict definition of the language used to interpret it [you mean a strict definition of the language to be interpreted, right?], an s-expr has no semantics.

Of course not! Just as an MP3 player will be confused if given a Word document, Emacs will be confused if given a Scheme source file. What's your point? There's no such thing as a "generic S-expression," as far as I know; you have to pick one dialect. How about if I claim that Wiki-like markup is meaningless without a strict definition of the language? Does that convince you that all Wiki markup has no semantics?

So the arguments on this page reduce to "a data format has less semantics than a programming language". Well, duuuuh!

No, not a programming language. You know how those Lisp folks keep saying that DataAndCodeAreTheSameThing? Well, they are right, and that's why you're confused! See, just because the Scheme S-expression syntax is used by the Scheme programming language, that does not mean that the Scheme S-expression syntax is itself a programming language. Compare this to XML/XSLT. XML is a data format - no doubt about that. But XSLT is a programming language (more or less), and it uses the XML syntax. There's nothing strange about this. Programming languages have been using ASCII for both code and data for decades, yet I don't see you come yelling that ASCII is a programming language.

I'm not confused. An s-expr that is used as data HAS NO SEMANTICS BECAUSE IT IS ONLY DATA. Data is just data, until it is processed by something. That processing is what has semantics. The semantics of a programming language specifies how an interpreter of the language (or runtime support for a compiled language) should process the syntactic elements of the source text. Even s-expr's used as data in a LISP program are processed by the language at "read time": that's how certain atoms get created as integers, strings etc. But those typed atoms have no semantics separately from the language interpreter used to manipulate them.

Similarly, some s-expr forms (letrec, and define for example) are used to define circular data structures in terms of tree-structured s-exprs, but the construction of the circular data structure is defined by the semantics of the language, and implemented by the interpreter that executes those forms. So circular s-exprs defined using names introduced by letrec and define are not fundamentally different from XML attributes denoted as ID and IDREF in the DTD.


I don't want my data to become little applications with all the library and security problems.


Question: is (one 2 "three") considered to be a SEXP when it is sitting in an ASCII file called source.lisp? Or only after it is read in by a Lisp engine?

When it is sitting in the text file.

Actually, after it's read by a Lisp engine, it "transforms" into an in-memory list (of which the S-expr text was the representation).


When you want to represent hexadecimal digits, would you represent them as (FF "Some hex")? Here, what is FF? Is it a hexadecimal value, or is it semantic information?

I don't understand what you mean; could you clarify?

How can this be expressed as s-expressions?

 <foo>
<hex value="FF" />
<description value="Start address" />
 </foo>
Depending on what you're after, one of the following (or a combination) may do:
 (foo (hex :value "FF") (description :value "Start address"))
 (foo (hex "FF") (description "Start address"))
 ("FF" (description "Start address"))
 (#xFF "Start address")
As for your original question, i.e., what is FF?, well, "FF" is a string, and #xFF is an integer expressed in hexadecimal (LISP notation). I'm afraid I don't know what you mean by "semantic information".


This is madness. Talk about splitting hairs.

As a simple data format, XML and basic atom/cons S-Exprs are pretty much identical. IMHO S-exprs in this form can be less wordy: (simpletag data) vs. <simpletag>data</simpletag>. In this basic form, neither has any more semantics than the other.

S-exprs can be more efficient in space for this simple reason, as well as when used to construct collections. The key discriminator between the two is that in an S-expr, whitespace is an element delimiter, whereas in XML it is not. The S-expr (list a b c) must become <list><item>a</item><item>b</item><item>c</item></list>. Also, the S-expr has fewer syntax elements, which makes horrors like CDATA less necessary.

However, if you add in interpreting the atom '3' as a number or text or whatever, that goes into the realm of the Lisp reader or Scheme parser, so you start getting an apple/orange juice cocktail.

Wrong. The fact that it goes in the reader is evidence that it is indeed syntax. There are no oranges here. The code for reading (foo) as a list obviously goes in the reader too. Are you suggesting that reading (foo) is "interpreting", while reading <foo>bar</foo> is something more basic?

While XML may well be the wheel reinvented, that argument must go back to the SGML folks (i.e. why didn't THEY use S-exprs), since that's the hammer from which XML spawned.

Today, I feel that if one wants to use Common Lisp/Scheme in defense of S-exprs, one should grant XML's proponents the vast infrastructure built around XML as well. DTDs and XSLT make XML useful, much like CL and Scheme make S-exprs useful.

And for the "plop S-exprs in place of XML" camp: Does anyone know where I can find a cheap, efficient, declarative DTD-like validator for S-exprs in Java? (GreenspunsTenthRuleOfProgramming, which, of course, is what this argument is really all about)

Right. The main problem is that XML + DTD's + XSLT etc. is pretty lousy when compared to, say, common lisp. Embedding an annoying syntax at the lowest level is just the beginning of the pain.


S-expressions are such a good data format, lispers want to write all their programs in it. Can that be said about the XML contingent?

Maybe that's the difference. XML users aren't xmlers. They are just people using a particular language who want to accomplish something.

Definitely. But you must understand the lisp mentality. They've had a real solution for decades. Lisp's inventor even proposed a business language based on it. And now, only now do people think, "Hmm, we need a good data representation for storing trees! EDI is so nasty." At the very least XML has to show some improvement over s-expressions.

And actually, XML proponents say that XML is the bee's knees, capable of solving problems in a single bound. Hire me because I know XML! But if XML is such a good data format, why wouldn't it be used for something as important as storing code? People store code on hard drives, just like data. They use tools like grep on it, just like data. But unlike data, they'd never want to store it in XML. So we have to open our minds to the possibility that XML is just not that good compared to s-expressions for storing data.

The following is an automaton in Scheme, a lisp dialect. How would it appear in XML?

 (automaton init
   (init : (c -> more))
   (more : (a -> more)
           (d -> more)
           (r -> end))
   (end  : (r -> end)))

XML is only a data format. Symbolic expressions are ASCII art; XML will never be that readable. This example was from Shriram Krishnamurthi's talk at LL1:
 ftp://ftp.ddj.com/technetcast/mp3/tnc-0644-24.mp3 
 http://ll1.ai.mit.edu/shriram-talk.pdf


Here is one difference: when s-expressions had their day, their fans were struggling to model human thought; now, to be an XML expert, one has only to know how to call the parser.

I think the problem that a lot of lispers have with XML is that from the outside it looks like the SGML'ers paid absolutely no attention to history when making XML. The result is a crufty syntax which doesn't gain you anything over s-expr (no, XML isn't more readable than s-expr in general, although it is for some (mostly simple) examples). Furthermore, XML + all the add-ons leaves you with a system that is more complicated, more error prone, and less powerful than a lisp. The entire problem domain that XML tries to address would have been better served by using s-expr and a simple, embeddable lisp (probably much smaller than common lisp) which you could provide for any language you wanted for much less effort than all of the XML/DTD/etc libraries that have now been written. Instead, vast amounts of effort have been, and are being, spent re-inventing wheels poorly.

Though not much of a lisper myself, I agree with the core of what the lisper above wrote. XML is another, arguably worse (although for some, arguably better) representation of trees of text. There is clearly more than one isomorphism between XML and S-expressions. But the next notion, concluding "for much less effort than all the XML/DTD/etc libraries that have now been written", is flawed. The "lisp" part of XML - the parser and validation logic - isn't that hard, and collectively we've wasted relatively little time on it. The time-consuming parts of XML, like anything else, are implementing applications of it - XsltLanguage, XqueryLanguage, SoapProtocol, etc. The fact that XML uses angle brackets instead of parens, slightly different nesting rules, etc., doesn't make those applications substantially easier or harder to write than any isomorphic representation. Don't waste your time on this page, go out and write an XSLT for s-expressions instead.

I don't believe that XML and s-expr are isomorphic. They are similar, but the differences are important. I think I did not make my second point clearly. The point is that a lisp is a general purpose tool for solving a superset of the problems that people are trying to solve with things built on top of XML. This doesn't mean that if you fire up an existent lisp today, you will (necessarily) have a protocol for describing document structure, or style sheets, or whatever. What this means is that due to the strengths of lisp, it is a very good approach for *building* these sorts of abstractions. The whole point of this confused hierarchy built on top of XML is to provide abstractions, but mistakes have been made in its construction that could have been avoided if more attention had been paid to lessons of the past. My contention is that there were (at least) two possible paths to a far superior system than what we have today. One would be to say "look, this whole approach has the flavour of some things lisp people have been thinking about for decades. Let's go see what they got right". This wouldn't (probably) have resulted in the use of a lisp per se, but the whole hierarchy of tools would have been better than it is today. Another possibility would have been to say "Actually, what we want here is a lisp" and then gone off and designed a dialect that was small, efficient, and could be embedded in any language or tool that needed it. Either of these two options would have been superior. Your example about XSLT is telling; code-walking has been an integral part of lisps for decades. There are sensible idioms and approaches for doing this that were well understood 10 years before XML was thought of. Please note: I am not an XML/XSL/XSLT/DTD/whatever expert (for that matter, not a lisp expert either), so I don't know enough of the history to know all the considerations that went into each piece. Bits of this conglomeration can look awfully like mistakes that were made in the past.

I think XPath provides a very powerful, useful and readable way of not just "walking" but selecting trees and bits of trees (indeed, XPath is the answer to the discussion above about regular expressions and trees). There's a lot I don't know about lisp, but if "code walking" is the wheel we're afraid of re-inventing, I think it's been re-invented pretty well.

['code walking' *includes* the ability of selecting trees, bits of trees, (and for that matter performing transforms on them). I don't know enough about XPath to say whether or not they got this right, but even if so it is just one of a large number of wheels being re-invented here.]

<Again, to repeat the obvious. Some people seem to be confusing EssExpressions with LispLanguage. They are not the same!!! Lisp uses sexprs, to be sure; but sexprs can have many uses outside the domain of Lisp. Many of the examples of data-encoded-in-sexprs given above really are data-encoded-in-Lisp. Numerous Lisp extensions to the sexpr format are used - any token with special meaning other than ( and ) - as well as numerous Lisp conventions (such as A-lists), yet the argument goes "look how powerful sexprs are!">

[why are you putting this here? nobody is making that mistake in this section... s-exprs are merely a superior building block (to XML) and lisp is being contrasted with the tools built on top of it.]

<Comparing sexprs with XML is an apples-to-apples comparison; comparing Lisp to XML is not. And there are many many many many many good reasons that one might not want to represent static data with a general-purpose, Turing-complete language; DTDs and schemas and the like were designed to not be TuringComplete. That way, they can be analyzed statically and shown to be harmless. See PrincipleOfLeastPower.>

[even if you don't want a complete lisp (which is one of the options discussed above, and no I don't believe it is a given that it is better to make it underpowered), you could do a lot better than DTD's and schemas etc. with a constrained lisp. Or at least by applying some of the ideas to a non-lisp.]

But comparing Lisp with XSLT or whatever is an apples-to-apples comparison. I don't know what other people prefer, but some XSLT I've seen was pretty awful at first sight. I'd rather have looked at the equivalent Lisp code. Not to mention some code that I've seen that was trying to use XML from Java. Boy, is that kind of code verbose or what?

Should I also mention the metadata declarations in UDDI or XSD? Basically, if you don't open them in Visual Studio or some equivalent tool, to let you click your way through, you are absolutely lost. I can't think of any uglier-looking language design for expressing some simple data structures and simple function calls. Should we compare them to LISP?


One of the "bad smells" about XML is that a lot of tools that use XML as input sometimes require you to encode a list data structure as a string. Think for instance of the depedencies list in an Ant target, it is encoded as e.g. "init, build, test". Also consider the use of style="color: blue" in html/css: XML sometimes requires/forces/encourages you to embed another language inside XML, defeating the purpose of using single representation. Never, ever, ever have I seen, or has as I'm quite sure anyone else seen a list or tree data representation encoded inside a string in an S-Exp structure (such as a Scheme program). -- WouterLievens?

DevilsAdvocate: what about CommonLisp formatting strings?


Just to reiterate something the Lispers are saying:

Please stop making apples-to-oranges arguments unless you can understand what the above points are saying.


How do S-expressions handle Unicode issues? What are the equivalents to the encoding directive, to xml:space and xml:lang, and to bidirectional text? Is there support for something like namespaced element names? Then again, I've noticed this page hasn't been updated in 2 years, so this is probably a dead argument.

There's no one standard for S-expressions. One might just state that one's particular format is in UTF-8, period. Or one might choose a more binary format like Ron Rivest's canonical s-expressions (http://people.csail.mit.edu/rivest/Sexp.txt), in which case one could use display hints (perhaps) to indicate encoding. Namespaced element names are just longer element names...


Rivest's draft provides a list of points that allows direct comparison of S-expressions with XML.

Rivest (1997): Here are the design goals for S-expressions:

W3C (1996): The design goals for XML are:


See: UniversalStatement, BradyBunchGridDiscussion

Contrast: XmlIsaGoodCopyOfEssExpressions

CategoryXml, CategoryRant

