The Real Strength Of Xml

Everyone is jumping on the bandwagon of the XmlProtocol these days. But why? As PowerOfPlainText points out, just plain old text files are really easy to generate and use. As some of the arguments in MythOfMetadata point out, this "self describing data" claim is mostly a crock -- it doesn't magically allow any program to talk to any other. WhyXmlIsCool claims that the real secret is just that you don't have to write your own parser.

But you do have to write your own parser at some level where ExtensibleMarkupLanguage bottoms out and leaves you with a stupid piece of text, a character string. Look up the Alan Perlis epigram about using strings as data structures.

[Yeah, but you have to do that anyway. Remember that XmlProtocol is a data transport, which only means that the data gets from one place to another without getting banged up too badly. It's still up to you to use the data wisely.]

I claim that the real strength of ExtensibleMarkupLanguage is this: It is a standard format that ALL (practically all) structured files will (eventually) use.

Before AsciiCode became universal, people had to spend lots of time worrying about the encoding of files. ASCII wasn't any better than EbcdicEncoding? or any of the other alternatives, it's just that having a standard benefited everyone. Today (except for the recent emergence of UniCode) a "text file" is just a "text file".

ExtensibleMarkupLanguage is like that. Ascii made all text files "just text files" and that was a good thing. XML will do the same for all STRUCTURED files (and throw in unicode for free too!).

("Unicode for free"?!? or ForFree? My experience with Unicode has been an unmitigated disaster. It's hellishly complicated and prone to subtle bugs. The CodeRed worm that infected Microsoft web servers in 2001 was made possible by an extremely subtle bug in UnicodeNormalization?. -- MarkPilgrim)

No it won't; it leaves lots of lexical analysis unfinished. What if my ExtensibleMarkupLanguage file contains <integer>0x1234</integer>. Will your XML parser recognize the 0x prefix? Or will you have to take the XmlProtocol node, extract the text, and hack it through a text-to-integer conversion routine?

(Random interjection: If you have <integer>0x1234</integer> you're not really doing it right, just like the XmlSucks examples of <window-position>10,23,50,70</window-position> or whatever. Why use two parsers when <integer radix="16">1234</integer> only needs one?)

You have to do that anyway. The parser simply produces a text string as the result; this string has to be converted to a native value to be used. It won't really matter how you specify the value - 0xFF is the same as 255 as long as your DTD/schema allows you to use one or both of these notations. If you try to use a notation that is not specifically allowed, the validator kicks the XmlProtocol file as bogus.

What IS the standard ExtensibleMarkupLanguage lexical syntax for writing an integer? A structured data notation should be capable of representing basic types like integers and strings.

Again, it isn't important. XML is simply text. You decide what notation to use when you create the DTD for the field. If you use the valid format then the validator will like your smell. Otherwise it gets kicked before anything even tries to convert from text to native machine representation.

The standard XML way of doing an integer isn't XML; it's XSD.

<mytag xsi:type="xs:int">4660</mytag>

As usual, it's verbose but well-specified; "int" is the 32-bit signed version.


The fun part of ExtensibleMarkupLanguage is, IMHO, the creation of a generic format and tools. For example, at work, we have a nutty text file format with a bunch of numbers in a format that made sense to one person. I tried to hack with it and gave myself a headache. So the person to whom the format made sense created an ExtensibleMarkupLanguage representation for the file and an XSLT style sheet for the file that outputs the old version. If you have two different XML formats, you can probably create at least a decent translation between them with XSLT tools. This means you get a translation system instead of a short script written in somebody's favorite scripting language. The use of ExtensibleMarkupLanguage also makes it easier to write a parser for a foreign format. You know how fields are delimited. The only problem is that binary data is a pain in the rear to deal with.


I don't know how useful this discussion will be, but I'll throw my hat in. If you're not convinced that XmlProtocol is the way to go, you probably don't develop for the Web enough, which is a fair position to be in. I'm a big XmlProtocol booster because it makes life so much easier. It's not very often you can stuff multiple standard syntaxes into a few kilobytes each that represent rich data models and dynamics. If you spend a lot of time on http://w3.org, you start to understand there's an XML standard in the works for anything you want, and while not perfect always, usually they aren't completely insane. Also, if you don't like it, join the working group. Essentially, if you want a rich data model, the various Internet standards centering around the W3C are the things to read. Like I've said many times, XML as a transport layer makes your data backwards, forwards, and laterally compatible. Show me another transport layer that can do the same as well as XML. -- SunirShah


I agree with the premise of this page; XML stealthily promotes standards -- something which otherwise is sorely lacking in the business of data file formats. However nothing guarantees that XML will buy you the interoperability that the hype promises. If your XML format is Evil, it really doesn't matter that it's XML. It doesn't matter that it's XML if no one else can deal with it anyway.

A couple of cases in point:

I did my web page in XML a while back while playing with MozillaBrowser. I set up a spiffy CSS and it rendered great in my build of Mozilla. Didn't render anywhere else, but it rendered great on my system. Pretty useless. Translating server-side XML to HTML for various browsers has much more promise.

My current department gets data files from a variety of sources. Some are ASCII. Some are EBCDIC. Some are mixed along with some binary data in there. All in all we have about 8 different formats we deal with, along with 8 different parsers (All in one monolithic application. Ick.) I've been pushing the upstream people to start specifying XML in their designs. Thus far they seem to think I'm joking, despite the fact that it would force all the upstream people to standardize, which would in turn reduce the complexity of numerous apps, not just in our department. These people may be the only people on the planet who haven't bought into the XML hype.

Oh, yes, and if all the various employment agencies and companies would accept an XML resume corresponding to the Resume DTD on Sourceforge, I wouldn't have to build my resume on the web pages of every company I want to apply at. One of these days once I solve my JDBC woes (I'm not trying too hard right now) I'm going to have to hack up some JSPs that build that XML resume and let you download it. I'd give it away for free in hopes that all these companies would adopt it and I'd never again have to build my resume on another site...

-- BruceIde


A big drawing point to XML for me is the ability that creating this generic meta-data oriented language has had on creating XmlStandards?. While us programmer types can argue the fact that as far as XML is concerned, there is NothingNewUnderTheSun, the tools that a meta-data based language has created (like XmlAuthority?, XmlSpy?) seems to open the process of creating document formats to the real DomainExperts, rather than me... -- ChadThompson

Chad's right, all of the above are well and good, but focus on the benefits to us technologists. Someone once told me (in a pub, so this might not be quite right) that these advantages for the implementor are pleasing side-effects of the original intent of ExtensibleMarkupLanguage: that it gives back to the user ownership of their data (PowerOfPlainText). That'll turn out to be especially valuable if the mooted world of MeteredApplications? comes to pass. -- KeithBraithwaite


TheRealStrengthOfXml only becomes visible when you have used it over time. I can't give you that experience but I can share a story. Imagine systems communicating over http, the interface parameters passed as XML. One system can add additional parameters without breaking the existing interfaces. One system can accept additional parameters whilst still processing earlier versions of the data structure. This is a simple observation but leads to a conclusion that is quite organic:

XML enables independently evolvable systems whilst maintaining interoperability. -- PaulCaswell


XML has real strength to me because it means I can describe all manner of data that used to be kept in binary formats. Imagine a black box that uses a GUI front panel for control. The user(s) can set up colors, place buttons, use sounds for event notification, etc. Using ExtensibleMarkupLanguage I can describe all of the configurable aspects of the GUI and store this description in a file. I can have as many of these config files as there are users who want to create them.

But this isn't the end. The black box also has all kinds of programmable facilities that I can change by using IEEE commands (through a GPIB interface or a serial port) or by pushing buttons on the front panel. Using an XML description of the features I want to control I can set up operational configuration files that will replace two minutes' worth of GUI button presses and menu searching with a single act. Additionally, the config file will be legible. Instead of having an IEEE script with something like

 NMSQ 12 14
 BZRT 3 F2
 ...
I can make an XML file look like
 <NormalSizeQuotient Elipsis="12">
 </NormalSizeQuotient>
 <BaseZeroReactionTolerance Variation="3">
F2
 </BaseZeroReactionTolerance>
 ...
which is a little easier to read. No BinaryData? stuck in a file. Yowsa. And by the way -- the XML config file can contain all or any one part of the parameters I am trying to set up. Thusly, Joe can invoke Sam's config file and then invoke his own config which only changes one button color. This stuff nests really well.


Am I the only one who thinks XmlSchema is the coolest thing? I can write an XML document, have my code call the Xerces parser to read that file and slurp up my schema in the process. Then, my document will only succeed in loading if the file is valid according to the schema! This is the real power -- I don't have to worry if my data is correct and I can assume EVERYTHING is right before I start walking the tree and executing code.

I like the XmlSchema language, it makes sense to me. I can accurately describe most of the constraints that my XML needs to abide by. And now, in the Java world (OoAndXml), you have JAXB, the JavaArchitectureForXmlBinding which is the coolest: it is a compiler that converts a (constrained) XmlSchema to JavaBeans classes that can serialize documents to and from XML in that Schema's dialect! Oh man, what a productivity boost! In my opinion things like JAXB are what make ExtensibleMarkupLanguage great. I wonder if the XmlSucks guys have all the same neat tools in Lisp. (I honestly don't know, which is why I ask.) -- DaveWoldrich?

Yep. In the LispLanguage world, all those neat tools are called Lisp. (LispVsXml)

Yes, but for some reason, LispLanguage scares the bujezus out of people.

It does?!? It simply annoys me (SmugLispWeenies). Fortunately, there is real strength in XmlProtocol, so I simply ignore Lithp.


I have to agree that one of the good things about XML is that it encourages declarative thinking (DeclarativeProgramming). Declarative approaches tend to be language-neutral. For example, XML-based GUI's could potentially be used by multiple languages. Behavior-centric GUI protocols tend to be locked to a particular language. However, XML tends to ignore relational's lessons. So thumbs up on the declarative side, but thumbs-down on the underlying organization. EssExpressions also flunk relational, I would note. -- TopMind

XML is yet another language, and any particular XML schema may require a great deal of processing and evaluation to truly utilize. I've never really understood this 'language-neutral' concept you keep describing. As far as 'relational' goes, I agree with you that the ExtensibleMarkupLanguage wasn't well designed for the transport of multiple data items; it is clearly designed more for a single hierarchical structure. Something like a TableOrientedXML - a transport designed explicitly for relational data and meta-data - would be a worthy investment of time.

FWIW, YamlAintMarkupLanguage allows multiple, independent data items not in a hierarchical structure. -- not TopMind

XML is not necessarily hierarchical since one can make references. It is "navigational". And it can represent relational structures, it just doesn't encourage it (and is somewhat verbose at it). --top

Unless those are both standards of XML, rather than features implemented atop XML in yet another XML-language, you're not really talking about XML. E.g. one can make an XML schema to represent any particular Relational schema, but that doesn't mean that all the XML tools out there will or ought to be able to work with this 'relational data' as though it were meaningfully 'relational'.


See BenefitsOfXml


CategoryXml


EditText of this page (last edited September 24, 2007) or FindPage with title or text search