Xml Abuse

There are countless examples of XML abuse out there. Many of these examples seem to motivate people to come here and argue that XmlSucks, when what they are really arguing is that that particular use of XML sucks. Clearly, those examples do not belong in a discussion on whether or not XML sucks, but they are interesting nonetheless, and this page is a good place to discuss them.

Common abuses:


This page ended up more or less empty after refactoring the problems into patterns; should we keep it nonetheless?

Yes, please. We are still seeing plenty of folk (primarily Lispers, but there are others) who are arguing that XML is a bad technology because it doesn't do this or it doesn't do that or because the hype is harmful or because it is difficult to parse or because of some other characteristic. In point of fact, almost all of the objections to XML - as well as to most technologies that draw fire on this Wiki - are based either on a lack of understanding of the proper use of XML or on some Wikizen's bad experience in the application of the technology. It's good to have a central clearinghouse to debunk some of this noise.


Using XML where a database would be more suited

One hint that you may be better off with a relational database is that there is no obvious hierarchy to your data. In particular, XML does not lend itself very well to representing many-to-many relationships. It can be done, of course, but prepare to
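To make the many-to-many point concrete, here is a minimal Python sketch. The element names (library, author, book, by) and the id/ref attributes are hypothetical, invented only for this illustration. It shows that the links have to be resolved by hand, since a tree has no join of its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: authors and books linked many-to-many through id references.
DOC = """
<library>
  <author id="a1"><name>Ann</name></author>
  <author id="a2"><name>Bob</name></author>
  <book id="b1"><title>First</title><by ref="a1"/><by ref="a2"/></book>
  <book id="b2"><title>Second</title><by ref="a1"/></book>
</library>
"""

def books_by_author(root):
    """Resolve the id references manually; the tree itself offers no join."""
    names = {a.get("id"): a.findtext("name") for a in root.iter("author")}
    result = {name: [] for name in names.values()}
    for book in root.iter("book"):
        for by in book.iter("by"):
            result[names[by.get("ref")]].append(book.findtext("title"))
    return result

if __name__ == "__main__":
    print(books_by_author(ET.fromstring(DOC)))
```

In a relational database the same question would be a single join across a link table, which is the hint the paragraph above is giving.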


Using XML as a procedural language, the way Ant does, strikes me as XmlAbuse.

I disagree, first of all because Ant is ideally meant to be declarative, not procedural. A build file is a collection of named actions with parameters and possible sub-actions and/or sub-parameters. Coming up with a new syntax (like, one that requires a tab as the first character, for example) for this when XML is perfectly applicable would be silly.

The problem is that Ant is not yet smart enough to figure out what you want it to do. Also, often you just want to do something, like copying files, and a declarative approach is, at best, unintuitive here. The XML gets shoehorned into some poor make imitation. I think this example is abuse of XML. ~~ Jan Seeger


(moved from a poorly named page)

Rendering datafiles human-readable helps no human read them because for even trivial datasets no human can keep track of all the complexity in the file.

So ExtensibleMarkupLanguage is really nothing but another IllusionOfControl. It's not only not a silver bullet, but for most practical purposes it increases system fragility and decreases system performance.

Anyone telling you otherwise is selling something.

Please offer examples or links to research backing up this claim. Otherwise, you are just HandWaving with no proof.


XML is certainly more readable to humans, and displayable by more tools, than the Windows registry binary format. Nor is any special-purpose tool necessary; see PowerOfPlainText.

That simply depends on the tool you use to read it.

Doesn't everything? You have to use a tool to do anything with a computer; why is that a problem?

And with XML (or plain text) I don't need a different tool per format.


I'm afraid this is missing the point. Yes, of course, you can read text with any tool. The trouble is the text doesn't mean any more to you than the original binary did. No one ever pages through an XML text file trying to understand what's in it. You don't, I don't, your customers don't - no one does.

So the data may as well be a binary standard. XML is hyped as some kind of universal data language, but it doesn't do anything for the data but formalize its alphabet a little. The ontologies represented by the data are still hidden, implicit in the combinatorial relationships between each element.

And for this imaginary benefit of "readable data" we're going to retool everything yet another time? What's the bloody point?


I think that's an underestimation of the final users.

Anyway, you can't expect XML (or any other input method) to do all the work itself. If you want users to be able to use it, just describe to them how your file's elements and attributes work; even a reference specification file would help.


There are 6 dubious BenefitsOfXml.


text doesn't mean any more to you than the original binary did

Disagreed. Which is more meaningful to the developer who has to update this file?

 <book>
  <ID>3049</ID>
  <year>1992</year>
  <ISBN>83948301982</ISBN>
  <title>Nanotechnology</title>
  <author>B. D. Thispin</author>
  <edition>1</edition>
  <pages>266</pages>
 </book>
...or...

 BE9199283948301982NanotechnologyB. D. Thispin1266
Depends on what tool the developer is using. But see the real BenefitsOfXml to understand why your format is preferable.

Also, XML has parsers that actually work, unlike everyone's almost but not quite correct CommaSeparatedValues parsers, or the endless binary formats that get messed up when moving from system to system.
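A small Python sketch of the "almost but not quite correct" CSV parser complaint. The sample line is made up for illustration. The naive version splits on every comma, while the standard library's csv module honours quoted fields.

```python
import csv
import io

# Made-up record: the author field contains a comma, so it is quoted.
LINE = '3049,1992,"Thispin, B. D.",Nanotechnology\n'

def naive_parse(line):
    """The parser everyone writes first: split on every comma."""
    return line.rstrip("\n").split(",")

def careful_parse(line):
    """The standard library reader, which understands quoted fields."""
    return next(csv.reader(io.StringIO(line)))

if __name__ == "__main__":
    print(naive_parse(LINE))    # the quoted field is torn into two pieces
    print(careful_parse(LINE))  # four fields, quotes removed
```

The naive parser is not wrong on simple data, which is exactly why it survives until the first quoted comma turns up.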

You know, there is a third option ...
  :  book{
  :ID{3049}
  :year{1992}
  :ISBN{83948301982}
  :title{Nanotechnology}
  :author{B. D. Thispin}
  :edition{1}
  :pages{266}
  :  }
This is readable, parseable, and weighs substantially less than XML.
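The "weighs less" claim can be checked with a rough Python sketch. It assumes both notations are written with no indentation or line breaks, and that the leading colons are part of the brace notation as shown above. The helper names as_xml and as_braces are invented for this illustration.

```python
# The book record from the examples above, as (name, value) pairs.
FIELDS = [
    ("ID", "3049"), ("year", "1992"), ("ISBN", "83948301982"),
    ("title", "Nanotechnology"), ("author", "B. D. Thispin"),
    ("edition", "1"), ("pages", "266"),
]

def as_xml(tag, fields):
    """Serialize as XML elements without any whitespace."""
    body = "".join(f"<{k}>{v}</{k}>" for k, v in fields)
    return f"<{tag}>{body}</{tag}>"

def as_braces(tag, fields):
    """Serialize in the colon-and-brace notation without any whitespace."""
    body = "".join(f":{k}{{{v}}}" for k, v in fields)
    return f":{tag}{{{body}:}}"

if __name__ == "__main__":
    payload = sum(len(v) for _, v in FIELDS)
    print("payload characters:", payload)
    print("XML characters:    ", len(as_xml("book", FIELDS)))
    print("brace characters:  ", len(as_braces("book", FIELDS)))
```

The saving comes from naming each field once instead of twice, so it grows with the length of the element names rather than with the data.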

We currently face a bandwidth crisis in an embedded context, and "the word" from the top is that XML shall be used. The ConventionalWisdom (current ManagementFad?) is that XML is a "standard" and that therefore there are plentiful tools and lots of industry support, and somehow the fact that this will more than triple the load on the wire as well as stress the embedded resources is lost in the HypeStorm.

It gets better: it's not one embedded device, it's a group of related devices that have to talk to one another as well as upstream to a larger system that houses the bulk of the business processes and data. The big iron has no problem with the extra load - hey, it's just more memory and disk - and the CPU loading is incremental.

We currently use a protocol with carefully optimized binary structures, whose processing times are well defined, and the system is acceptably deterministic. Converting our protocols to an XML transport will pretty much - by definition, actually - destroy determinism.
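A hedged Python sketch of one part of that argument. The message layout (a 16-bit device id, a 32-bit sequence number, a 16-bit reading) and the element names are hypothetical, not the actual protocol discussed above. It shows only that a packed structure has the same size whatever the values, while the XML form grows and shrinks with them; it says nothing about processing time.

```python
import struct

# Hypothetical layout: device id (uint16), sequence (uint32), reading (int16),
# little-endian with no padding.
RECORD = struct.Struct("<HIh")

def encode_binary(device, seq, reading):
    """Fixed-layout encoding: always RECORD.size bytes."""
    return RECORD.pack(device, seq, reading)

def encode_xml(device, seq, reading):
    """Text encoding: the length depends on the digits of each value."""
    return (f"<msg><device>{device}</device><seq>{seq}</seq>"
            f"<reading>{reading}</reading></msg>").encode("ascii")

if __name__ == "__main__":
    small = (1, 1, 5)
    large = (65535, 4000000000, -32768)
    for args in (small, large):
        print(len(encode_binary(*args)), len(encode_xml(*args)))
```

A receiver of the binary form knows how many bytes to read before the message arrives; a receiver of the XML form has to scan for the closing tag.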

It occurs to me that I'm ranting, so I'll let it be for now. There's probably a better page for this. XmlSucks, perhaps?


Granted, there isn't any reason to switch to XML if you have a protocol that works much better for your purposes. But the problem of XML being a language that takes much more space than it should is only half of the issue.

The other half is the lack of initiatives to actually beat it. And that it is seriously becoming a standard.

About that alternative: it seems awful, mostly because of those ":" characters. It also seems that there is no noticeable difference between an element and an attribute.

While we think about alternatives:

<alternative>

this is a comment, not content
<elementWithChildElements>
<child color="red"/>
<child color="green"/>
</>
<elementWithoutChilds/>
<elementWithoutChilds numberattribute=3333 literalattribute="K"/>
<explanation text=

"Elements can't have contents, just attributes and child elements (unless they are elements whose tags end with />) you can use any character between \" \" of attributes besides \" , \" , and \\, no other quote kind is allowed for attributes (for example '). Every character inside \" \" counts and is considered when parsing

Info can only be considered if it is inside < and > (you can freely use them for attribute text though and they are ignored because they are between \" \" ). Characters that are not inside < and > (for example indent spaces and tabs) are considered comments. No other way to write comments exists.

Inside element opening tags, white space is ignored unless it is inside \" \" (things inside \" and \" are the attributes' values and are handled in different ways).

No, no ending tag is specific to the element. It is nonsense to have those in XML anyway ( <s><a></s></a> is not allowed, so there is no reason to specify which tag you are closing because it is always the most deeply nested one).

And let's specify a difference between elements and attributes: elements may have attributes and child elements, attributes are value containers.

It is just a dream, although I would like to make a parser for this. XML is the standard and there is little the \"little ones\" can do to stop it.

"/> </>

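Since a parser for this notation is wished for above, here is a minimal Python sketch of one. It is one reading of the rules as described, not a definitive implementation. The assumptions: element and attribute names look like identifiers, unquoted values are integers, \" and \\ are the only escapes inside quotes, and everything outside < and > is a comment. The function names are invented for this illustration.

```python
import re

NAME = re.compile(r"[A-Za-z_][\w.-]*")   # assumed shape of element/attribute names
NUMBER = re.compile(r"-?\d+")            # assumed: unquoted values are integers

def skip_ws(text, i):
    """Advance past white space, which is ignored inside tags."""
    while i < len(text) and text[i].isspace():
        i += 1
    return i

def read_value(text, i):
    """Read a quoted string (with backslash escapes) or a bare integer."""
    if text.startswith('"', i):
        out, i = [], i + 1
        while i < len(text):
            ch = text[i]
            if ch == "\\" and i + 1 < len(text):
                out.append(text[i + 1])   # keep the escaped character as-is
                i += 2
            elif ch == '"':
                return "".join(out), i + 1
            else:
                out.append(ch)
                i += 1
        raise ValueError("unterminated attribute value")
    m = NUMBER.match(text, i)
    if not m:
        raise ValueError(f"expected a value at offset {i}")
    return int(m.group()), m.end()

def parse(text):
    """Return the top-level elements as (name, attributes, children) tuples."""
    top = []
    stack = [("#top", {}, top)]
    i = 0
    while True:
        i = text.find("<", i)             # anything before the next < is a comment
        if i < 0:
            break
        i += 1
        if text.startswith("/>", i):      # "</>" closes the innermost open element
            if len(stack) == 1:
                raise ValueError("</> without an open element")
            stack.pop()
            i += 2
            continue
        m = NAME.match(text, i)
        if not m:
            raise ValueError(f"expected an element name at offset {i}")
        node = (m.group(), {}, [])
        stack[-1][2].append(node)
        i = m.end()
        while True:                       # attributes, until "/>" or ">"
            i = skip_ws(text, i)
            if text.startswith("/>", i):
                i += 2
                break
            if text.startswith(">", i):
                stack.append(node)
                i += 1
                break
            m = NAME.match(text, i)
            if not m:
                raise ValueError(f"expected an attribute name at offset {i}")
            i = skip_ws(text, m.end())
            if not text.startswith("=", i):
                raise ValueError(f"expected '=' at offset {i}")
            node[1][m.group()], i = read_value(text, skip_ws(text, i + 1))
    if len(stack) != 1:
        raise ValueError("unclosed element at end of input")
    return top

if __name__ == "__main__":
    sample = '<a> a comment <child color="red"/> <n numberattribute=3333/> </>'
    print(parse(sample))
```

Because the closing tag carries no name, the parser needs only a stack; the price is that a forgotten </> is reported at the end of the input rather than where it was forgotten.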

Open the simplest Excel spreadsheet you have. Now do Save As type Xml spreadsheet. Then Save As type Csv (Comma Separated values). Now compare the output and decide whether you want to spend 3 weeks using Xml in your project versus 1 hour using Csv.

Okay, now take that XML file and rearrange the child elements of any element. Save it. Take that CSV and rearrange any of the comma separated crap. Save it. Parse both. See what happens.

If you are going to use silly examples then I can, too.
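The rearrangement experiment can be sketched in Python. The two-field book record and the reader functions are made up for illustration, and the CSV reader is assumed to be the common positional kind with no header row.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The same record twice, with the fields in a different order.
XML_A = "<book><title>Nanotechnology</title><year>1992</year></book>"
XML_B = "<book><year>1992</year><title>Nanotechnology</title></book>"
CSV_A = "Nanotechnology,1992\n"
CSV_B = "1992,Nanotechnology\n"

def read_xml(text):
    """Look fields up by name, so their order does not matter."""
    book = ET.fromstring(text)
    return book.findtext("title"), int(book.findtext("year"))

def read_csv(text):
    """Look fields up by position: column 0 is the title, column 1 the year."""
    title, year = next(csv.reader(io.StringIO(text)))
    return title, int(year)

if __name__ == "__main__":
    print(read_xml(XML_A) == read_xml(XML_B))
    try:
        read_csv(CSV_B)
    except ValueError as err:
        print("CSV reader failed:", err)
```

In fairness to the other side, a CSV file with a header row read through csv.DictReader would survive the same rearrangement, so the difference is between named and positional fields more than between XML and CSV.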


CategoryXml, CategoryRant


Last edited May 27, 2011