Why Xml Is Cool

What's cool about the ExtensibleMarkupLanguage is parsers. Having an XML parser at your disposal in your programming language makes it very easy for you to define your own XML flavor, or use any existing flavor, and write programs which can read it and work with it. You can also write programs that work with arbitrary XML flavors, though perhaps in a more limited fashion.

I don't know if XML is going to replace the StandardGeneralizedMarkupLanguage or the HyperTextMarkupLanguage or any of that. I don't think it will replace databases or CORBA or networking protocols. I think what it's going to replace is Lex and Yacc -- and many handwritten lexers and parsers, too. Lex and Yacc are more powerful than XML parsers because they can parse anything whatever -- but the documents that Lex and Yacc parse need not give any clue to their own structure. An XML document gives big hints about how to parse it. And XML is simple enough that an XML parser is not a great overhead to link into your program.

XML, therefore, is cool because it saves you the trouble of developing a proprietary data format and the parsers that go with it. Sometimes it also saves you the trouble of having to develop interactive editors for your data formats -- you can use your favorite text editor.

-- EdwardKiser


I disagree with part of the above. I think XML (in the form of the SimpleObjectAccessProtocol) will replace CORBA. Here's my thinking:

  1. CORBA and RemoteMethodInvocation can't succeed because they don't use the only protocol that the internet is built around -- the HyperTextTransferProtocol. (Perhaps I'm being too pedantic, but the Internet is built around IP, not HTTP. The WWW is built around HTTP.) Firewalls give CORBA grief (e.g. you can't use NAT-ing). Reverse proxies give CORBA grief. However, none of this applies to SOAP over HTTP. Network administrators like this much better -- you're not sending mysterious bits that could be executable code -- you're sending plain text. (Well, yes and no. You're sending mysterious bits wrapped in elements and sent over port 80. I'm not sure I'd be reassured by this if I were (still) a network admin. Is it a feature, or a bug?)

  2. XML gives you a "bootstrap" to writing a SOAP implementation on any language you care to. That's one of the reasons that SOAP is available in so many places now. Once you have a parser, you're half-way there.

  3. The same format that you use to send data over the wire can be used to store the data on disk. Any language with files can now have a serialization format. Try that with a CORBA object...

For more, see: XmlRpcVsCorba


A SmugLispWeenie might say that the reason WhyXmlIsCool is because it means the world at large has finally understood the power of s-expressions. Indeed, once they realize how important the automated parser is, they'll replace the insane <foo> syntax with (foo ...) and be then truly winning. HaHaOnlySerious.

However, the nearest analog to a DocumentTypeDefinition is a Lisp or Scheme program that validates the expression -- and EssExpressions are terrible for text, requiring as they do that it be placed inside quoted strings.

It cost me less than 50 minutes to whip up a validator for S-expressions that accepts a DTD for S-expressions. In Scheme, of course (MzScheme). Included in the distribution is a DTD for DTD's. You'll need to figure out the DTD syntax yourself, there's no documentation yet. (Sorry, had only 50 minutes.)

http://pcrm.win.tue.nl/~stephan/validate.tar.gz

-- StephanHouben

Some XML formats these days are actually programming languages, with forms like <if> etc. Following this direction seems likely to lead people to find and like Lisp. I wonder if people are already doing fun things like XMLTs for generating other XMLTs? i.e. compilers / macros.

It's also possible to use any programming language as a markup language; e.g., CeePlusPlusMarkupLanguage.

XML and SGML are more valuable for text, and S-expressions are probably better for data. I can write a book or an essay in XML with Emacs but I wouldn't want to write one as a giant S-expression.


XML is cool because it takes power from the hands of proprietary file format makers and puts it back in the hands of the people who created the data in the first place.

I'm not so sure. The XML file formats I've seen are far from easy to interpret, unless you have very good knowledge about the application that created them (like access to the source code). It takes a really good design of the output format for it to be understood by anything but the original application.

That was the original intent of XML. The geeks saw the threat for what it was and quickly moved to re-establish dominance. The gesture is still cool.

Someone else said: Exactly, it is just an ASCII format with vague hints at what things mean. XmlSucks IMHO.

The comparison to ASCII is apt. Imagine if text encoding were no longer standardized: Say you're writing a web form and you have to figure out if the text was entered using ASCII or EBDCIC. Major hassle. XML works like this, standardizing a syntax for structured content. It's not a big deal, and someday when the hype has faded not that many people will put it on their resume. (Have you ever seen ASCII skills on a resume?) But it's nice to have the standard to build other stuff on top of.


XML is coolest because, if used widely enough, can lead to permanence of information. It is readable by humans on its most basic level just by looking at it in a text editor. Today, we're losing far too much information into the black hole of proprietary formats. This leads into its second great benefit -- high-level rules (a la XSLT) can be written to translate it regardless of the program that originated it.

If your document archiving system is based on some proprietary software, you can count on either losing some information or losing your documents altogether because your software is no longer available.

That's the theory. In practice... at present, most XML data repositories are too young for this to have been effectively demonstrated, while some very old data in proprietary formats is still usable today. I'd also keep in mind that XML is a standardized syntax, which can still be used to encode proprietary semantics.

Of course, but it's already easier to reverse engineer; plus you don't have to write translation code so much as specify high-level rules to get it out of said proprietary format.

The same was true of java bytecodes, preserving identifier names and so on. But then people wanted to use the same infrastructure without being open to reverse-engineering, and started using obfuscators. How long until XML obfuscators become popular? :-)


See also: PowerOfPlainText, AngleBracketedUnicodeText


EditText of this page (last edited November 21, 2005) or FindPage with title or text search