Xml Permutation Dependence Sucks

XML is the most stupid data model in this regard (worse than the old IMS), because it makes the physical order of unrelated data crucially important. Try changing the order of unrelated elements, for example in web.xml in a web application, and your Tomcat/WebSphere/WebLogic/whatever will refuse to start, even though all the needed and correct information is in there. It's just not in the physical order it expects. It really sucks.

Thank you! But isn't this really a problem with Tomcat/Websphere/Weblogic rather than a problem with XML?

No, it is a problem with XML's defined semantics. By default, physical data ordering is important:

   <person>
    <first_name>F</first_name>
    <last_name>L</last_name>
   </person>
is semantically different from
   <person>
    <last_name>L</last_name>
    <first_name>F</first_name>
   </person>
and, to make matters worse, both are different from
   <person first_name="F" last_name="L" />
Data integration solutions, you think? Be ready to write lots and lots of XSL. Be ready to buy the latest and greatest hardware to swallow all that unnecessary extra tag garbage.
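For anyone who wants to check this, here is a minimal sketch in Python. It assumes the standard library's xml.etree.ElementTree and uses canonical XML as a stand-in for "the same document"; the helper name canonical is made up for the illustration. Canonicalization keeps child elements in document order but writes attributes in sorted order, so it treats the two kinds of reordering differently.

```python
import xml.etree.ElementTree as ET

elements_fl = "<person><first_name>F</first_name><last_name>L</last_name></person>"
elements_lf = "<person><last_name>L</last_name><first_name>F</first_name></person>"
attrs_fl = '<person first_name="F" last_name="L"/>'
attrs_lf = '<person last_name="L" first_name="F"/>'


def canonical(xml_text):
    # Hypothetical helper: serialize the document in canonical form.
    # Child elements stay in document order; attributes are sorted.
    return ET.canonicalize(xml_data=xml_text)


# Swapping child elements changes the canonical form.
same_after_element_swap = canonical(elements_fl) == canonical(elements_lf)
# Swapping attributes does not.
same_after_attribute_swap = canonical(attrs_fl) == canonical(attrs_lf)
# The element form and the attribute form are different documents.
same_across_forms = canonical(elements_fl) == canonical(attrs_fl)

print(same_after_element_swap, same_after_attribute_swap, same_across_forms)
```

Whether an application should care about the difference is a separate question; this only shows what the serialization rules preserve.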

We don't use no DTD tables here. Might those be the causes of dependencies upon specific permutations?

Then maybe you can convince the committee to drop them altogether, and schema also. When you deliver some XML to a third party, make sure to remind them to turn off validation. If you pay a visit to said committee, maybe you can mention that end tags are really redundant and stupidly annoying.


One party says XML is PermutationDependent, another party says it is NonPermutationDependent, and a third party (or maybe the first party again) says it somehow depends on whether you are using a DTD. Could some knowledgeable person please give a clear description of this issue?


By default with no DTD XML is NonPermutationDependent - the lastName and firstName tags can appear in any order - XML does not imbue any semantics into tag ordering. The attribute version <person firstName="F" lastName="L" /> is different to the tag version however - in that attributes are handled differently to tags in the two main parsing APIs.
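A minimal sketch of that point in Python, assuming the standard library's xml.etree.ElementTree and the first_name/last_name spelling from the example further up (read_person is a made-up name). Looking children up by tag name ignores their order, while the attribute form has to be handled through a separate call.

```python
import xml.etree.ElementTree as ET


def read_person(xml_text):
    """Return (first, last) from either the element form or the attribute form."""
    person = ET.fromstring(xml_text)
    # findtext() looks a child up by tag name, so sibling order is irrelevant.
    first = person.findtext("first_name")
    last = person.findtext("last_name")
    # Attributes live in a separate mapping, so they need their own branch.
    if first is None:
        first = person.get("first_name")
    if last is None:
        last = person.get("last_name")
    return first, last


docs = [
    "<person><first_name>F</first_name><last_name>L</last_name></person>",
    "<person><last_name>L</last_name><first_name>F</first_name></person>",
    '<person first_name="F" last_name="L" />',
]
results = [read_person(doc) for doc in docs]
print(results)
```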

If you choose to create a DTD (or a RelaxNg schema - and I presume the W3C schema) you can choose to make *your* language either PermutationDependent or NonPermutationDependent.

First of all, DTDs and schemas are an essential ingredient of XML. You can do without DTDs, but then, you sacrifice a serious part of XML (and associated software stack) functionality. It is like having a database with no schema: no validation, nothing.

Second, it is rather non-trivial and whole lotta work to create a DTD that will validate independent of the physical order of tags. As evidence to this fact, you already have a lot of DTDs out there that are order dependent. And if you think otherwise please sketch an algorithm that will take a DTD as input and create the order independent DTD.
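Not the general algorithm asked for, but a sketch of the simplest case, assuming every child element must appear exactly once (the function names are made up). It writes out a DTD content model that accepts the children in any order by branching on which name comes first. Each alternative starts with a different name, and the size of the result grows at least factorially with the number of children, which is one way to see why hand-written DTDs tend to fix an order instead.

```python
def order_free_model(names):
    """Content model fragment accepting each name exactly once, in any order."""
    names = list(names)
    if len(names) == 1:
        return names[0]
    branches = []
    for i, first in enumerate(names):
        rest = names[:i] + names[i + 1:]
        # Pick one name to come first, then recurse on the remaining names.
        branches.append(f"({first},{order_free_model(rest)})")
    return "(" + "|".join(branches) + ")"


def element_decl(parent, names):
    """Wrap the fragment in an <!ELEMENT ...> declaration for the parent."""
    model = order_free_model(names)
    if not model.startswith("("):
        model = f"({model})"  # a lone name still needs parentheses
    return f"<!ELEMENT {parent} {model}>"


print(element_decl("person", ["first_name", "last_name"]))
for n in range(2, 6):
    names = [f"e{i}" for i in range(n)]
    print(n, len(order_free_model(names)))  # length of the generated model
```

Optional or repeated children are not handled here, and they make the problem harder rather than easier.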

Third, even if you complete the second step, that only means that documents will "validate" independently of the order. However physical order of tags will still be pervasive. It's enough to read the DOM API to realize that. The only thing you can do is work around the order dependency of XML. That's a major flaw in XML design.
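To make the DOM point concrete, here is a small sketch using Python's standard xml.dom.minidom (the helper child_text is hypothetical). The child list, firstChild and nextSibling are all defined in terms of document order; lookup by name is the workaround.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<person><last_name>L</last_name><first_name>F</first_name></person>"
)
person = doc.documentElement

# childNodes is an ordered list: children come back in document order.
tags_in_document_order = [node.tagName for node in person.childNodes]

# firstChild / nextSibling only make sense relative to that order.
second_tag = person.firstChild.nextSibling.tagName


def child_text(parent, tag):
    # Workaround: find the child by name and ignore where it sits.
    node = parent.getElementsByTagName(tag)[0]
    return node.firstChild.data


first, last = child_text(person, "first_name"), child_text(person, "last_name")
print(tags_in_document_order, second_tag, first, last)
```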


The reason for all of this is that XML wasn't invented to serialize general data structures. It was invented to add computer-readable tags to natural language documents. The idea was that you could mark certain words and phrases in a document and a computer would know something special about them. A news story, for instance, could contain markup that let a data retrieval system know that <personName>Bill Clinton</personName> was a person's name. From that perspective it's obvious that order matters.

It's not obvious to me. What does data retrieval have to do with order?

Data retrieval has nothing to do with order. XML was created as a way to mark up natural language text. Order matters in natural language text, so XML has to preserve it.

So you are talking about a particular application of XML in which order is required. XML doesn't have to preserve order. An app needs to read the XML and do the right thing. First name and last name can be in any order and then be correctly ordered when applied.

Yes, I'm talking about a particular application of XML. It's the particular application for which XML was created. XML was created to add schema tags to natural language text. Natural language text is ordered, therefore XML documents are ordered. Using XML to represent serialized data structures is a hack, so the fact that XML orders unordered data structures is entirely understandable.


 <sentence>
  <word>Permutation</word>
  <word>Xml</word>
  <word>Dependence</word>
  <word>Sucks</word>
 </sentence>

:-)

Might that be taken to imply that XML has poor support for turning permutation dependence on and off at need?

How about this?

 <sentence>
  <word position=2 value="Permutation"/>
  <word position=1 value="Xml"/>
  <word position=3 value="Dependence"/>
  <word position=4 value="Sucks"/>
 </sentence>

This is awful. How can an application reading this handle duplicate positions? And you forgot the double quotes around the positions.

[Also, how can a schema specify that a valid document has no duplicate positions within any given sentence?]
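The schema question stays open here, but the application-level check is short. A minimal sketch, assuming Python's standard xml.etree.ElementTree, quoted position values, and a made-up function name assemble_sentence; it chooses to reject duplicates rather than guess.

```python
import xml.etree.ElementTree as ET


def assemble_sentence(xml_text):
    """Order <word> elements by their position attribute, rejecting duplicates."""
    sentence = ET.fromstring(xml_text)
    by_position = {}
    for word in sentence.iter("word"):
        position = int(word.get("position"))
        if position in by_position:
            # The document gives no rule for ties, so refuse to guess.
            raise ValueError(f"duplicate position {position}")
        by_position[position] = word.get("value")
    return " ".join(by_position[p] for p in sorted(by_position))


good = """<sentence>
  <word position="2" value="Permutation"/>
  <word position="1" value="Xml"/>
  <word position="3" value="Dependence"/>
  <word position="4" value="Sucks"/>
</sentence>"""

print(assemble_sentence(good))
```

With the positions left unquoted, the document is not well-formed and the parser raises an error before this function ever sees a word.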


If you use DOM-based access then order won't matter. The whole file is sucked up and put into data structures. If you use an event-based parser then order can easily matter, because you are depending on something to set up the context for something later. It certainly is nothing XML-specific. As the earlier parsers were event-based, you might find order mattering a lot more than it should. Where order matters I use index attributes rather than depend on file layout.
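A sketch of the event-based trap, assuming Python's standard xml.sax and the first_name/last_name elements from the example further up (NaiveHandler and full_name are made-up names). The handler builds the full name as soon as last_name closes, so it silently depends on first_name having arrived first.

```python
import xml.sax


class NaiveHandler(xml.sax.ContentHandler):
    """Builds the full name when last_name ends; assumes first_name came first."""

    def __init__(self):
        super().__init__()
        self.text = ""
        self.first = None
        self.full_name = None

    def startElement(self, name, attrs):
        self.text = ""  # start collecting character data for this element

    def characters(self, content):
        self.text += content

    def endElement(self, name):
        if name == "first_name":
            self.first = self.text
        elif name == "last_name":
            # Order-dependent step: self.first may not have been set yet.
            self.full_name = f"{self.first} {self.text}"


def full_name(xml_text):
    handler = NaiveHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.full_name


in_order = "<person><first_name>F</first_name><last_name>L</last_name></person>"
swapped = "<person><last_name>L</last_name><first_name>F</first_name></person>"
print(full_name(in_order))
print(full_name(swapped))
```

Storing both values and assembling the name when the person element ends removes the dependency, which is the same thing the DOM approach does for you.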

It always matters if the parser validates. That's why Tomcat won't start or a web app won't deploy if you put one unrelated option in front of another unrelated option. It is very XML-specific, because XML established from the get-go that order matters.
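Python's standard library has no validating parser, so the following is only a simulation of what a validator does with a sequence content model. The content model is hypothetical and is not the real web-app DTD. A DTD sequence is effectively a regular expression over child element names, which is why a reordered document fails even though nothing is missing.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model, written the way a DTD sequence would be:
#   <!ELEMENT web-app (display-name?, servlet*, servlet-mapping*)>
# Each child name is followed by a comma so the regex matches whole names.
WEB_APP_MODEL = re.compile(r"(display-name,)?(servlet,)*(servlet-mapping,)*")


def children_match_model(xml_text):
    """Check the root's child names, in document order, against the model."""
    root = ET.fromstring(xml_text)
    child_names = "".join(child.tag + "," for child in root)
    return WEB_APP_MODEL.fullmatch(child_names) is not None


in_order = "<web-app><display-name/><servlet/><servlet-mapping/></web-app>"
reordered = "<web-app><servlet/><servlet-mapping/><display-name/></web-app>"

print(children_match_model(in_order))
print(children_match_model(reordered))
```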

You don't need to write a DTD so that order matters.

The DTD is already written (for example: the JakartaTomcat configuration file, web-app.dtd, etc.). And if a DTD or schema is not written, that means that expensive checks are implemented by hand at the application level. You lose. So XML is quite inconvenient because of this. It is much easier to write a configuration file for Apache, for example.

You lose whenever anyone does something stoopid, XML or whatever. Having edited many, many Apache configuration files, I don't think it is any easier. And XML isn't about making it easier. It's about making it standard.

Would it be too much to ask to have things be easy, as standard? (Er, what am I saying? Examine almost any standard: apparently it is too much)

I think learning to live in an imperfect world is a hard won skill :-)


(last edited November 17, 2003)