Xml And Soap Are Good For What

Sometimes we lose sight of what kinds of problems our solutions are searching for. The question here is: what can you do with XML and SOAP that you couldn't do just fine with binary data and function calls? What are the patterns of application and use they enable that were disabled without 'em?

Now don't go blathering about serialization, caching, and tunneling. That's more implementation bumf. We serialized and cached and tunneled when XML and SOAP were nothing but twinkles in eyes. And don't go telling us about which corporate plumbers are playing which buzzwords in which marketing campaigns. It simply doesn't matter.

We want to operate on the basis that there is actually some good in these endless Babel syntaxes -- so we want to know here what that good is.

Or are XML and SOAP just more artifacts of academic and corporate pud-pulling -- just this year's fad? Should we just stick to RPCs and let this storm blow over?


A quick summary, XML is good for

See also TheRealStrengthOfXml


XML is a mechanism for describing data in an environment where a human and an application both can determine what the data is and what it represents. An XML data file can have additional junk put into it or massive deletions hacked out of it and still maintain some data integrity. Any data that is stored in a fixed binary format must have everything the application needs to read the data. With XML even a dumbass like myself can look at

  <Bloofta>
        Goobers!
  </Bloofta>
and realize that the Bloofta element of this data collection has a value of Goobers! Does that have any value?
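
A minimal sketch of the "extra junk or massive deletions" claim, using Python's standard xml.etree.ElementTree. The Data root element and the Unexpected element are made up for illustration; only Bloofta comes from the snippet above.

```python
import xml.etree.ElementTree as ET

# The Bloofta snippet, wrapped in a hypothetical <Data> root, plus an
# element this reader has never heard of.
doc = """
<Data>
  <Bloofta>
        Goobers!
  </Bloofta>
  <Unexpected flavor="junk">ignore me</Unexpected>
</Data>
"""

root = ET.fromstring(doc)

# The reader asks for the one element it cares about and ignores the rest.
bloofta = root.findtext("Bloofta", default="").strip()

# A document with Bloofta hacked out still parses; the reader falls back
# to a default instead of choking.
missing = ET.fromstring("<Data/>").findtext("Bloofta", default="(missing)")

print(bloofta)
print(missing)
```

Whether a fallback value counts as "data integrity" is up to the application; the parser only guarantees that the surviving elements can still be found by name.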

For further discussion I direct your attention to the excellent book, "XML Bible, Gold Edition," by Elliotte Rusty Harold, published by Hungry Minds (now Wiley).

-- MartySchrader

Another good book worthy of purchase is "Java Web Services Architecture" by James McGovern, published by Morgan Kaufmann -- Steve Graham


XML is fine for transferring information between diverse systems, but not as a persistence mechanism or a database, at least beyond the trivial. It might complement or supplant things like EDI and CommaSeparatedValues, but the idea of an XmlDatabase is utterly ridiculous. A true sign of something being overhyped is when somebody tries to turn the concept into a new database paradigm or product. Thus, if you keep hearing about Glop, and then you see that a vendor is announcing a Glop Database Management System, time to close the curtains.


To expand on Marty's point a bit: XML is "better" than binary formats for the same reasons that high-level languages are "better" than assembly language. It provides file and message formats that are more easily readable by humans, because it uses symbols that more closely match the problem domain rather than the implementation domain.

XML and SOAP don't provide any capabilities that RPC, CORBA, DCOM, RMI, etc. don't already have. But the fact that XML/SOAP are text-based and human-readable means that it is generally easier to implement programs in a variety of programming languages and for different operating systems, because it is generally easier to generate and parse text than to do the same with binary formats.

You can argue that binary formats are just as easy to use, but with XML you don't have to worry about issues like byte ordering, structure packing, word size, etc. Binary formats tend to be tightly coupled with specific heavy implementations (for example, very few people implement just IIOP -- they implement a whole CORBA ORB system), whereas XML is just a file/message format that can be parsed by any XML parser and can be trivially generated in any programming language that supports text strings.
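
To make the byte-ordering point concrete, here is a small sketch using Python's struct module. The count element name is invented for the example; the point is only that the binary reader must know the byte order in advance, while the text reader does not.

```python
import struct
import xml.etree.ElementTree as ET

value = 1025  # 0x00000401

# The same 32-bit unsigned integer packed with two different byte orders.
big = struct.pack(">I", value)     # bytes 00 00 04 01
little = struct.pack("<I", value)  # bytes 01 04 00 00

# A reader that assumes the wrong byte order gets a wrong number, silently.
misread = struct.unpack("<I", big)[0]

# The text form carries digits, so there is no byte order to agree on.
elem = ET.Element("count")  # "count" is a made-up element name
elem.text = str(value)
wire = ET.tostring(elem)
reread = int(ET.fromstring(wire).text)

print(misread, reread)
```

The price is the usual one: the text form is larger and has to be converted back to a number on every read.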

I think XML is well beyond the "this year's fad" stage. Like it or not, it is being supported by major industry forces and is going to be around for a while (of course "a while" in computer industry time may be only a few years).

There are plenty of reasons not to use XML. If you are happy with your RPCs and binary files, there is no compelling reason to throw them away and re-implement everything with XML.

-- KrisJohnson

Thank you, Kris. To expand a little bit, let me just say that XML provides an industrial-strength mechanism for describing aspects of a system that previously had to be described using specialized mechanisms. One project I'm diddling with right now uses a small GUI front panel with a handful of fixed-position buttons along the sides like an ATM display. The setup for the entire GUI can be stored as an XML file. Colors, button assignments, sounds, actions, everything. Without XML we would still need some way of describing this good stuff; maybe a Windows type .INI file? Whatever. XML provides a means for describing everything I need to set up.

And it's flexible, too. My GUI setup file may contain enough information (elements and attributes) to set up everything the instrument shows the user. Or, it may only change the functionality of one button. It may contain stuff that the GUI doesn't even understand, but the parser/interpreter will just ashcan that noise. Setup files can be forward and backward compatible between versions and even between completely different instruments. (All instruments have a background GUI color, a main button color, etc.) All of this good stuff makes XML a highly valuable tool in the embedded systems designer's arsenal, believe me.

-- MartySchrader
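
The setup-file idea above might look something like the following sketch. Every element and attribute name here (panel, background, button_color, button, position, action, hologram) is hypothetical, as is the loader; it is not Marty's actual format.

```python
import xml.etree.ElementTree as ET

# Built-in settings used when the setup file says nothing.
DEFAULTS = {"background": "black", "button_color": "gray"}

def load_panel(xml_text):
    """Return (settings, buttons), starting from the defaults."""
    settings = dict(DEFAULTS)
    buttons = {}
    for child in ET.fromstring(xml_text):
        if child.tag in settings:
            settings[child.tag] = (child.text or "").strip()
        elif child.tag == "button":
            buttons[child.get("position")] = child.get("action")
        # Anything else is skipped, so a newer file still loads.
    return settings, buttons

# A full setup file, including an element this loader does not understand.
full = """<panel>
  <background>navy</background>
  <button_color>white</button_color>
  <button position="left1" action="start"/>
  <hologram mode="3d"/>
</panel>"""

# A file that only changes the function of one button.
partial = '<panel><button position="left1" action="stop"/></panel>'

print(load_panel(full))
print(load_panel(partial))
```

Note that the tolerance for unknown elements lives in the loader, not in XML itself; a stricter application could just as easily reject them.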

What I think is, most programmers must consider parser writing difficult indeed nowadays, seeing that XML-based formats are deemed less of a burden to handle than everybody's own cleartext formats (like SMTP, HTTP, and friends). Maybe it's just me but I think that XML-based formats are very messy compared to handcrafted cleartext formats specifically designed for the purpose at hand.

I wonder what this "major industry support" is when all XML gives you is one specific representation of trees...

An excellent example of why having a standard way to express tree structures is not much help is the pair of configuration files used by ApacheHttpd and ApacheTomcat. ApacheTomcat has much more structured configuration files but the structure is twisted and does not have much to do with the actual requirements of the user's (i.e. the administrator's) domain. ApacheHttpd's configuration files mostly consist of a series of declarations and are easy to read, write, and group in the way the administrator sees fit.

-- PanuKalliokoski

If you want to represent lists and nested data then Apache's format is far from ideal. With XML you get a universal format with interpreters available anywhere that can be used immediately. That's nice. You don't need to reinvent the wheel for something that really should be a minor issue. XML isn't perfect but it is GoodEnough and WhoReallyCares?. Better than trying to understand yet another version of someone's ideal configuration file format.

--AnonymousDonor

For most things, you don't need to represent lists and nested data structures. On the other hand, for e.g. acyclic directed graphs, XML structure is not even sufficient per se.
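
A sketch of the graph point: element nesting gives each node exactly one parent, so a node shared by two parents has to be expressed with references layered on top of the tree. The graph, node, and edge names and the id/to attributes are an ad hoc convention invented for this example.

```python
import xml.etree.ElementTree as ET

# Diamond-shaped DAG: a -> b, a -> c, b -> d, c -> d.
# Node d has two parents, so it cannot simply be nested inside both.
doc = """<graph>
  <node id="a"><edge to="b"/><edge to="c"/></node>
  <node id="b"><edge to="d"/></node>
  <node id="c"><edge to="d"/></node>
  <node id="d"/>
</graph>"""

root = ET.fromstring(doc)

# Rebuild the adjacency list from the id/to attributes.
edges = {n.get("id"): [e.get("to") for e in n.findall("edge")]
         for n in root.findall("node")}

# The parser does not check that every "to" names a real node; the
# application has to do that itself.
dangling = [t for targets in edges.values() for t in targets if t not in edges]

print(edges)
print(dangling)
```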

And lastly, trying to understand yet another XML file is in no way easier than trying to understand yet another custom-language configuration file.

My point is just that XML doesn't really add value to a plain text file, and that actual applications of XML seem to get messy and difficult, so it's plain text that is IMO GoodEnough and who really cares (about XML).

-- PanuKalliokoski

"For most things, you don't need to represent lists and nested data structures." You can use RDF for that.

I often need lists and nested data structures. I suppose it depends how complex your data are. Apache may not have a more complex structure simply because it's not easily expressible. Having written many LittleConfigurationFileLanguages? I find XML a big win simply because it is common and there are lots and lots of tools for it and it's easy to write in any text editor. XML is not God's gift to anything, it is just useful. The fact that you don't find it useful doesn't mean it's horrible. It means you have a different definition of value than those of us who find it valuable (not perfect/ideal). -- AnonymousDonor

Writing a parser isn't really hard, but it's nice to have it taken care of. It's especially nice if you're working in insanely heterogeneous environments where your data might be parsed by code written in 20 different languages. If you were only writing your text data for internal use in one language, the gains from XML might not be so great. But if, say, you were writing your RichSiteSummary file using a CMS written in RubyLanguage, to be slurped by 20 different aggregators using Python, Java, VisualBasic, ObjectiveCee and everything else you can think of, the net savings to all developers is a little more significant. -- francis
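
For a sense of how little the consuming side has to write, here is a sketch of one of those aggregators in Python. The feed content and the example.com URLs are made up; the rss/channel/item/title/link layout follows the usual RSS 2.0 shape.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed.
feed = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

channel = ET.fromstring(feed).find("channel")

# Pull out (title, link) pairs without writing any parsing code of our own.
items = [(i.findtext("title"), i.findtext("link")) for i in channel.iter("item")]

for title, link in items:
    print(title, link)
```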

Now if only XML would give you a parser for free... but it doesn't. SAX is just a lexer, you still write the recursive descent parser yourself. DOM is a parser, but a primitive one, you still have to transform the resulting tree into something sensible, and it also effectively prevents you from working with large inputs.
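
The SAX complaint can be seen in a short sketch with Python's xml.sax. The parser only reports start, end, and text events; keeping track of nesting is the handler's job. The PathCollector class and the cfg document are invented for the example.

```python
import xml.sax

class PathCollector(xml.sax.ContentHandler):
    """Rebuilds nesting by hand from SAX start/end/text events."""

    def __init__(self):
        super().__init__()
        self.stack = []   # our own record of where we are in the tree
        self.text = []    # text chunks seen since the last tag
        self.values = {}  # "a/b/c" path -> text content

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.text = []

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        value = "".join(self.text).strip()
        if value:
            self.values["/".join(self.stack)] = value
        self.stack.pop()
        self.text = []

handler = PathCollector()
xml.sax.parseString(
    b"<cfg><net><port>8080</port></net><name>box</name></cfg>", handler)
print(handler.values)
```

The DOM route trades this bookkeeping for memory: the whole tree is built before the application sees any of it.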

XML is a quick answer for a number of design questions. It is a handy way to pass around hierarchical data between HetrogenousSystem?s if you are not worried about bandwidth. Many platforms (e.g., JavaLanguage) have libraries to serialise and deserialise ValueObjects? into an XML representation. Combined with XmlSchemas and automatic validation, many otherwise thorny issues become a breeze to deal with. -- PaulMurray
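
A rough Python analogue of that kind of value-object round trip, assuming a flat object with string fields only. The Customer class and the to_xml/from_xml helpers are hypothetical, and no schema validation is done here (the Python standard library does not include an XML Schema validator).

```python
from dataclasses import dataclass, fields
import xml.etree.ElementTree as ET

@dataclass
class Customer:  # hypothetical value object
    name: str
    city: str

def to_xml(obj):
    """Serialise a flat dataclass: one child element per field."""
    elem = ET.Element(type(obj).__name__)
    for f in fields(obj):
        ET.SubElement(elem, f.name).text = str(getattr(obj, f.name))
    return ET.tostring(elem, encoding="unicode")

def from_xml(cls, text):
    """Rebuild the dataclass; assumes every field is a string."""
    elem = ET.fromstring(text)
    return cls(**{f.name: elem.findtext(f.name) for f in fields(cls)})

original = Customer("Ada", "London")
wire = to_xml(original)
restored = from_xml(Customer, wire)

print(wire)
print(restored == original)
```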


SOAP is RPC on Twinkies. It doesn't do anything better than RPC. The only advantage I can see is that SOAP messages are easier to identify if they somehow get misdirected to a printer.
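
To see the "RPC on Twinkies" claim side by side, here is a toy sketch: the same call made as a plain function call and as a SOAP-style envelope that is built, parsed, and dispatched. The envelope namespace is the SOAP 1.1 one; the urn:example:prices namespace, the GetPrice method, the price table, and both helper functions are invented, and a real SOAP stack does far more (headers, faults, typing, transport).

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
APP_NS = "urn:example:prices"                           # made-up namespace

def get_price(item):
    """The plain function call; the price table is made up."""
    return {"widget": 42}[item]

def soap_request(method, **params):
    """Wrap a method name and its arguments in an Envelope/Body."""
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{method}")
    for key, value in params.items():
        ET.SubElement(call, key).text = str(value)
    return ET.tostring(env, encoding="unicode")

def dispatch(xml_text):
    """Unwrap the envelope and make the ordinary function call."""
    body = ET.fromstring(xml_text).find(f"{{{SOAP_ENV}}}Body")
    call = body[0]
    method = call.tag.split("}")[1]
    args = {child.tag: child.text for child in call}
    return {"GetPrice": get_price}[method](**args)

request = soap_request("GetPrice", item="widget")
result = dispatch(request)

print(request)
print(result == get_price("widget"))
```

Both paths end in the same function call; the envelope adds a self-describing wrapper and nothing else.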

XML is a mechanism for describing data in an environment where a human and an application both can determine what the data is and what it represents.

That's a common misconception. XML mixes schema with data, but it doesn't help humans or computers understand what the data means. All it can do is trick humans into thinking they understand the meaning. The principal benefit of XML is that it decouples the sender and receiver from the schema, letting the sender add data that the receiver doesn't expect.

XML is not supposed to let humans or computers know what the data is. XML is supposed to get the data from one place to another without garbling it up too badly. It is up to the receiver and sender to agree on what is going to be sent and what meaning that data has. XML just guarantees that the data gets there.


Pro SoapProtocol views

When a vendor pushes the use of SoapProtocol to do things you need and no alternatives exist, then it starts to look real good.

A usenet link to using SoapToolkit to integrate heterogeneous environments

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=eTnX%23X%249DHA.3616%40TK2MSFTNGP10.phx.gbl&rnum=9


A Thought

What XML gives you is a decent default appearance. Look at a file in binary format that you're not familiar with. Look at it in a hex editor. Reeeeeally look at it. Figure it out (I'd suggest a reasonably simple format such as xBase DBF myself), so you could do some basic parsing of it. Now, do the same exercise with an XML file.

Don't get me wrong... I'm not a huge fan of XML. But it's a standard. And it's a standard which is easier to read than raw binary. I truly hope this isn't the best ITdomkind can do, but it's a start.

--WilliamUnderwood


Just having a standard format that includes encoding information is a godsend for a multilingual environment. You don't have to worry about what encoding (ASCII, GB, etc.) the text will use; just use a standard parser and get the string you need.

--OliverChung
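
A small sketch of the encoding point, using Python's xml.etree.ElementTree. The same word is stored as two different byte sequences, and the parser picks the right decoding from the XML declaration. The word element is made up, and which encodings a given parser accepts is an assumption to check per parser; this example sticks to ISO-8859-1 and UTF-8.

```python
import xml.etree.ElementTree as ET

text = "caf\u00e9"  # "café", with one non-ASCII character

# The same document encoded two ways; each declares its own encoding.
latin1_doc = ('<?xml version="1.0" encoding="ISO-8859-1"?><word>'
              + text + '</word>').encode("iso-8859-1")
utf8_doc = ('<?xml version="1.0" encoding="UTF-8"?><word>'
            + text + '</word>').encode("utf-8")

# The bytes on disk differ, but the caller never has to care which is which.
from_latin1 = ET.fromstring(latin1_doc).text
from_utf8 = ET.fromstring(utf8_doc).text

print(latin1_doc != utf8_doc)
print(from_latin1 == from_utf8 == text)
```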


KrisJohnson: "To expand on Marty's point a bit: XML is "better" than binary formats for the same reasons that high-level languages are 'better' than assembly language. It provides file and message formats that are more easily readable by humans..."

Having to debug your application by decoding the on-the-wire format of your messages is a CodeSmell. It indicates that you have not properly tested your PresentationLayer. Indeed, why are you even writing your own PresentationLayer when many already exist, some of which are international or industry standards? Why not just reuse an existing technology that has been around for a very long time? Ask yourself when was the last time you examined the wire format of calls when using RPC servers, out-of-process COM components, CORBA objects or remote Java objects by RMI (to name just a few examples)? If the answer was "never", and it was for me, then why should the same not be the case for web services?

Spoken like a true plug-and-play kinda guy. When you are dealing with non-standard formats in non-standard protocols because That's What The Customer Asked For then you'll understand. And remember -- it's okay; we bill by the hour.


That XML is readable by humans is a myth that has long needed debunking here on Wiki. For anything other than trivialities, XML is not conveniently readable by humans. You have to be pretty desperate to read XML, as when debugging some over-the-wire protocol. Has anyone tried to read a data definition in WSDL or XSDL? How does it compare with data definitions in Pascal, SQL, CORBA IDL, Microsoft IDL, ASN.1, RELAX NG (an alternative for defining XML schemas) or any other programmer-oriented language that can define data? I had to do it recently and XML simply sucks.

XSDL and WSDL are, simply put, monstrosities, and reading or editing them with anything but advanced XML tools is a waste of development time and nerves. BeautyIsOurBusiness, and that's why languages used in this business have to adhere to some minimal standard of elegance and beauty, but for defining metadata (schemas, web services, etc.) the XML-based proposals are butt-ugly.

This is more than just a personal observation. Important books on XML like DataOnTheWeb?, as well as theoretical articles, avoid using the XML-based XML Schema Definition Language for defining schemas, replacing it with either a home-grown language or RELAX NG, precisely because XSDL is unreadable in a book or any other setting where you try to communicate to humans. It simply contains too much garbage that detracts from the content.

Amen. I've tried to make this point on several pages here. Textbook examples of XML give false hope of readability. Real world usage quickly leads to custom data viewers and editors, removing the need for all those text identifiers in each message.

How about RssFeeds ?

What about them? [DeleteWhenCooked]

Experts were discussing relative merits too. See, for example, "What do Soap and XML-RPC buy you" thread at http://lists.xml.org/archives/xml-dev/200104/threads.html#00534

[Yeah, we're back to this line of argument again. <sigh> XML is a transport mechanism, the facility to get the data from source to sink without worrying about how the data is represented on the network or if the byte order is correct or any of that drivel. It is nice that people can read the format also, but that doesn't ensure that people without specific domain knowledge will be able to accurately interpret XML data representations.

However, anydamnbody who really knows what a specific XML element is will be able to read its value and tell you what's going on by looking at the XML. Ditto for any applications that have been put in the know. XML is a powerful tool for conveying this data without worrying about secondary transmission effects.]


Note: I (MartySchrader) removed the CategoryRant label from this page because the statements here are based on fact, not just personal opinion. Let's please be careful how we apply potentially troublesome labels to Wiki pages.


See: MythOfMetadata

CategoryXml


Last edited October 24, 2013