Note: This is not a "pattern" as much as it is a "refactoring". I've used the common form for pattern descriptions because I'm not sure how to lay out a refactoring description.
Problem: An XmlLanguage (or similar) document format is too hard to process.
Context: Even though each datum is well-isolated (see IsolateEachDatum), the data is still unwieldy. Cumbersome and unnatural parsing must be performed and non-obvious conventions must be applied in order to make heads or tails of the data structure. In particular, there are flat lists that contain data with implicit internal structure.
Solution: Make sure there are enclosures for all groups of logically connected data. However, avoid multiple layers of unnecessary wrapping; once your data is well-structured, leave it at that. The deeper you make it from that point on, the more annoying it will be to work with -- at little to no benefit.
It may appear as though adding structure would just mean adding bloat/noise/weight or whatever. Nevertheless, if in doubt, go with more structure. It is easy to transform well-structured data to poorly-structured data; going the other way is harder. If still in doubt, maybe you'll have to take other factors into account, such as expected amount of hand-editing and the effect on that, and whether or not size is absolutely critical (if you're using XML, now is a good time to reconsider).
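A minimal Python sketch of that asymmetry, using the polygon data from the example further down this page. The names `flatten` and `regroup` are made up for illustration. Flattening needs no knowledge of the data, whereas regrouping only works if you already know the unwritten convention that coordinates come in x, y pairs.

```python
# Structured form: each vertex is its own group.
vertices = [(1, 6), (3, 5), (4, 6)]


def flatten(vertices):
    """Well-structured -> flat: trivial, no knowledge of the data needed."""
    return [coordinate for vertex in vertices for coordinate in vertex]


def regroup(coordinates, size=2):
    """Flat -> structured: the group size is a convention the data never states."""
    if len(coordinates) % size != 0:
        raise ValueError("coordinate list does not divide into whole vertices")
    return [tuple(coordinates[i:i + size])
            for i in range(0, len(coordinates), size)]


flat = flatten(vertices)
print(flat)           # [1, 6, 3, 5, 4, 6]
print(regroup(flat))  # [(1, 6), (3, 5), (4, 6)]
```

Note that `regroup` has to guess `size` and has to decide what to do with a leftover coordinate; neither question comes up when the grouping is explicit in the data.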
Resulting Context: This is primarily from an XML point of view, but most of this applies to, e.g., EssExpressions as well:
Author: DanielBrockman 2004-01-20
We start out with an abstract example that quickly gets to the core of the issue. Where given, the EssExpression equivalent follows the XML.
Polygon example
(See IsolateEachDatum for beginning of story.)
Bad -- hard to process (does not group related data):
  <polygon>
    <coordinate>1</coordinate>
    <coordinate>6</coordinate>
    <coordinate>3</coordinate>
    <coordinate>5</coordinate>
    <coordinate>4</coordinate>
    <coordinate>6</coordinate>
  </polygon>

  (polygon 1 6 3 5 4 6)

Good -- easily processable (provides good structure by grouping related data):
  <polygon>
    <vertex x="1" y="6" />
    <vertex x="3" y="5" />
    <vertex x="4" y="6" />
  </polygon>

  (polygon (1 6) (3 5) (4 6))

Dictionary example
Bad -- hard to process (has implicit structure in an element list):
  <dictionary>
    <term>ROFL</term>
    <definition>rolling on the floor laughing</definition>
    <term>LMAO</term>
    <definition>laughing my arms off</definition>
  </dictionary>

  (dictionary "ROFL" "rolling on the floor laughing" "LMAO" "laughing my arms off")

Good -- easy to process (has just the right amount of structure):
  <dictionary>
    <entry>
      <term>ROFL</term>
      <definition>rolling on the floor laughing</definition>
    </entry>
    <entry>
      <term>LMAO</term>
      <definition>laughing my arms off</definition>
    </entry>
  </dictionary>

  (dictionary ("ROFL" "rolling on the floor laughing") ("LMAO" "laughing my ass off"))

Also good (effectively equivalent):
  <dictionary>
    <entry term="ROFL">rolling on the floor laughing</entry>
    <entry term="LMAO">laughing my arms off</entry>
  </dictionary>

  (no direct S-expression equivalent)

Wrong. Here it is:
  (dictionary
    (entry :term "ROFL" "rolling on the floor laughing")
    (entry :term "LMAO" "laughing my ass off"))

Right, but I think that one doesn't really make sense. First, what purpose do the seemingly dead-weight copies of "entry" serve? Second, why use ":term x" instead of the more obvious "(term x)"? Third, why mention "term" at all? It seems all those things are just there to create a literal translation of a piece of non-ideal XML, and that's kind of silly. I mean, in that case, you might as well emulate the end tags as well, yielding
  (dictionary
    (entry :term "ROFL" "rolling on the floor laughing" /entry)
    (entry :term "LMAO" "laughing my ass off" /entry)
  /dictionary)

Mind you, I certainly didn't intend this to be yet another XML vs. S-expressions page; rather, I wanted to show how to refactor XML, and while at it, also roughly corresponding S-expressions. However, I can see that the "no S-expression equivalent" looks kind of provocative. Maybe we should instead stick the alternative XML solutions right next to each other and show the single sane S-expression solution below them? Thanks for your input.
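To make the processing cost in the dictionary example concrete, here is a rough Python sketch that reads both the flat and the grouped XML with the standard `xml.etree.ElementTree` module. The function names `read_flat` and `read_grouped` are invented for this page. The flat reader has to pair children by position and police the alternation itself; the grouped reader just asks each entry for its parts.

```python
import xml.etree.ElementTree as ET

FLAT = """<dictionary>
  <term>ROFL</term> <definition>rolling on the floor laughing</definition>
  <term>LMAO</term> <definition>laughing my arms off</definition>
</dictionary>"""

GROUPED = """<dictionary>
  <entry> <term>ROFL</term> <definition>rolling on the floor laughing</definition> </entry>
  <entry> <term>LMAO</term> <definition>laughing my arms off</definition> </entry>
</dictionary>"""


def read_flat(xml_text):
    """Pair children by position; relies on strict term/definition alternation."""
    children = list(ET.fromstring(xml_text))
    if len(children) % 2 != 0:
        raise ValueError("dangling term or definition")
    entries = {}
    for term, definition in zip(children[0::2], children[1::2]):
        if term.tag != "term" or definition.tag != "definition":
            raise ValueError("children out of order")
        entries[term.text] = definition.text
    return entries


def read_grouped(xml_text):
    """Each entry carries its own term and definition; no positional convention."""
    root = ET.fromstring(xml_text)
    return {entry.findtext("term"): entry.findtext("definition")
            for entry in root.findall("entry")}


# Both readers recover the same mapping; only the effort differs.
assert read_flat(FLAT) == read_grouped(GROUPED)
```

Both functions return the same mapping, but every check in `read_flat` exists only to reconstruct structure that the grouped document states outright.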
Real-world examples follow.
Apple's new plist format
(See IsolateEachDatum for the beginning of the story.)
Bad -- hard to process (uses an ungodly mix of both too much and too little structure):
  <plist version="1.0">
    <dict>
      <key>AnimateSnapToGrid</key>
      <true />
      <key>EmptyTrashProgressWindowLocation</key>
      <point x="79" y="44" />
      <key>FileViewer.LastWindowLocation</key>
      <rectangle x1="228" y1="140" x2="1091" y2="826" />
    </dict>
  </plist>

Good:
  <plist version="1.0">
    <property>
      <key>AnimateSnapToGrid</key>
      <true />
    </property>
    <property>
      <key>EmptyTrashProgressWindowLocation</key>
      <point x="79" y="44" />
    </property>
    <property>
      <key>FileViewer.LastWindowLocation</key>
      <rectangle x1="228" y1="140" x2="1091" y2="826" />
    </property>
  </plist>

Also good (but note that this variant would not have been an option had I chosen to go with structured keys):
  <plist version="1.0">
    <property key="AnimateSnapToGrid">
      <true />
    </property>
    <property key="EmptyTrashProgressWindowLocation">
      <point x="79" y="44" />
    </property>
    <property key="FileViewer.LastWindowLocation">
      <rectangle x1="228" y1="140" x2="1091" y2="826" />
    </property>
  </plist>

How about:
  <plist version="1.0">
    <property key="AnimateSnapToGrid">
      <boolean value="true" />
    </property>
    <property key="EmptyTrashProgressWindowLocation">
      <point x="79" y="44" />
    </property>
    <property key="FileViewer.LastWindowLocation">
      <rectangle x1="228" y1="140" x2="1091" y2="826" />
    </property>
  </plist>

Here, the boolean is validated by type, the type being the 'boolean' element (meaning a property may contain a 'boolean' element rather than a 'true' or 'false' element).
Yes, you're absolutely right. I like this better too. In your version, ElementNamesAreTypeNames, which strikes me as highly intuitive.
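For what it's worth, the refactoring from the flat plist to the `<property key="...">` variant can be done mechanically. Below is a rough Python sketch using the standard `xml.etree.ElementTree` module; the function name `group_properties` and the shortened sample document are my own, and the code assumes the simplified plist shown on this page (a single `<dict>` with strictly alternating `<key>` and value elements), not the full real-world format.

```python
import xml.etree.ElementTree as ET

FLAT_PLIST = """<plist version="1.0"> <dict>
  <key>AnimateSnapToGrid</key> <true />
  <key>EmptyTrashProgressWindowLocation</key> <point x="79" y="44" />
</dict> </plist>"""


def group_properties(xml_text):
    """Rewrite the <dict> key/value run as <property key="..."> wrappers."""
    old = ET.fromstring(xml_text)
    new = ET.Element("plist", dict(old.attrib))
    children = list(old.find("dict"))
    if len(children) % 2 != 0:
        raise ValueError("key without a value")
    for key, value in zip(children[0::2], children[1::2]):
        if key.tag != "key":
            raise ValueError("expected <key>, found <%s>" % key.tag)
        prop = ET.SubElement(new, "property", {"key": key.text})
        value.tail = None  # drop the whitespace that followed the value
        prop.append(value)
    return new


grouped = group_properties(FLAT_PLIST)
for prop in grouped.findall("property"):
    print(prop.get("key"), "->", prop[0].tag)
```

The positional pairing and the sanity checks happen once, in the converter; anything that reads the grouped output afterwards no longer needs to know about them.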
See also: GroupRelatedInformation