Group Related Data

Note: This is not a "pattern" as much as it is a "refactoring". I've used the common form for pattern descriptions because I'm not sure how to lay out a refactoring description.

Problem: An XmlLanguage (or similar) document format is too hard to process.

Context: Even though each datum is well-isolated (see IsolateEachDatum), the data is still unwieldy. Cumbersome and unnatural parsing must be performed and non-obvious conventions must be applied in order to make heads or tails of the data structure. In particular, there are flat lists that contain data with implicit internal structure.

Solution: Make sure there are enclosures for all groups of logically connected data. However, avoid multiple layers of unnecessary wrapping; once your data is well-structured, leave it at that. The deeper you make it from that point on, the more annoying it will be to work with -- at little to no benefit.

It may appear as though adding structure would just mean adding bloat/noise/weight or whatever. Nevertheless, if in doubt, go with more structure. It is easy to transform well-structured data to poorly-structured data; going the other way is harder. If still in doubt, maybe you'll have to take other factors into account, such as expected amount of hand-editing and the effect on that, and whether or not size is absolutely critical (if you're using XML, now is a good time to reconsider).

Resulting Context: This is primarily from an XML point of view, but most of this applies to, e.g., EssExpressions as well:

An XSLT stylesheet can now straightforwardly match all elements of a certain type and do whatever it needs to with those and their children. This is the way XSLT was designed to work. With poorly-enclosed data, the stylesheet was required to essentially do some nasty tree/list parsing to figure out where the data it needed was located.
Writing readable and modular schemas to validate the data is now much easier.
The data is much easier to read, since quickly scanning the left edge of the markup now reveals its overall structure in a glance.

Known Uses: This problem is very common and often occurs when people inexperienced with processing data is responsible for designing data formats. It is relevant to XML as well as S-expressions, although it's more common in the XML world -- if only because the XML world is less mature.

Author: DanielBrockman 2004-01-20

We start out with an abstract example that quickly get to the core of the issue. EssExpression examples are in italics.

Polygon example

(See IsolateEachDatum for beginning of story.)

Bad -- hard to process (does not group related data):

 <polygon>
   <coordinate>1</coordinate>
   <coordinate>6</coordinate>
   <coordinate>3</coordinate>
   <coordinate>5</coordinate>
   <coordinate>4</coordinate>
   <coordinate>6</coordinate>
 </polygon>

 (polygon 1 6 3 5 4 6)

Good -- easily processable (provides good structure by grouping related data):

 <polygon>
   <vertex x="1" y="6" />
   <vertex x="3" y="5" />
   <vertex x="4" y="6" />
 </polygon>

 (polygon (1 6) (3 5) (4 6))

Dictionary example

Bad -- hard to process (has implicit structure in an element list):

 <dictionary>
   <term>ROFL</term>
   <definition>rolling on the floor laughing</definition>
   <term>LMAO</term>
   <definition>laughing my arms off</definition>
 </dictionary>

 (dictionary "ROFL" "rolling on the floor laughing" "LMAO" "laughing my arms off")

Good -- easy to process (has just the right amount of structure):

 <dictionary>
   <entry>
     <term>ROFL</term>
     <definition>rolling on the floor laughing</definition>
   </entry>
   <entry>
     <term>LMAO</term>
     <definition>laughing my arms off</definition>
   </entry>
 </dictionary>

 (dictionary
   ("ROFL" "rolling on the floor laughing")
   ("LMAO" "laughing my ass off"))

Also good (effectively equivalent):

 <dictionary>
   <entry term="ROFL">rolling on the floor laughing</entry>
   <entry term="LMAO">laughing my arms off</entry>
 </dictionary>

 (no direct S-expression equivalent)

Wrong. Here it is:

 (dictionary
   (entry :term "ROFL" "rolling on the floor laughing")
   (entry :term "LMAO" "laughing my ass off"))

Right, but I think that one doesn't really make sense. First, what purpose do the seemingly dead-weight copies of "entry" serve? Second, why use ":term x" instead of the more obvious "(term x)"? Third, why mention "term" at all? It seems all those things are just there to create a literal translation of a piece of non-ideal XML, and that's kind of silly. I mean, in that case, you might as well emulate the end tags as well, yielding

 (dictionary
   (entry :term "ROFL" "rolling on the floor laughing" /entry)
   (entry :term "LMAO" "laughing my ass off" /entry) /dictionary)

Mind you, I certainly didn't intend this to be yet another XML vs. S-expressions page; rather, I wanted to show how to refactor XML, and while at it, also roughly corresponding S-expressions. However, I can see that the "no S-expression equivalent" looks kind of provocative. Maybe we should instead stick the alternative XML solutions right next to each other and show the single sane S-expression solution below them? Thanks for your input.

Real-world examples follow.

Apple's new plist format

(See IsolateEachDatum for the beginning of the story.)

Bad -- hard to process (uses an ungodly mix of both too much and too little structure):

 <plist version="1.0">
   <dict>
     <key>AnimateSnapToGrid</key>
     <true />
     <key>EmptyTrashProgressWindowLocation</key>
     <point x="79" y="44" />
     <key>FileViewer.LastWindowLocation</key>
     <rectangle x1="228" y1="140" x2="1091" y2="826" />
   </dict>
 </plist>

Good:

 <plist version="1.0">
   <property>
     <key>AnimateSnapToGrid</key>
     <true />
   </property>
   <property>
     <key>EmptyTrashProgressWindowLocation</key>
     <point x="79" y="44" />
   </property>
   <property>
     <key>FileViewer.LastWindowLocation</key>
     <rectangle x1="228" y1="140" x2="1091" y2="826" />
   </property>
 </plist>

Also good (but note that this variant would not have been an option had I chosen to go with structured keys):

 <plist version="1.0">
   <property key="AnimateSnapToGrid">
     <true />
   </property>
   <property key="EmptyTrashProgressWindowLocation">
     <point x="79" y="44" />
   </property>
   <property key="FileViewer.LastWindowLocation">
     <rectangle x1="228" y1="140" x2="1091" y2="826" />
   </property>
 </plist>

How about:

 <plist version="1.0">
   <property key="AnimateSnapToGrid">
     <boolean value="true" />
   </property>
   <property key="EmptyTrashProgressWindowLocation">
     <point x="79" y="44" />
   </property>
   <property key="FileViewer.LastWindowLocation">
     <rectangle x1="228" y1="140" x2="1091" y2="826" />
   </property>
 </plist>

Here, the boolean validation is according to type, type being 'boolean' element (meaning a property may contain 'boolean' element and not 'true' or 'false' element).

Yes, you're absolutely right. I like this better too. In your version, ElementNamesAreTypeNames?, which strikes me as highly intuitive.