Group Related Data

Note: This is not a "pattern" as much as it is a "refactoring". I've used the common form for pattern descriptions because I'm not sure how to lay out a refactoring description.


Problem: An XmlLanguage (or similar) document format is too hard to process.

Context: Even though each datum is well-isolated (see IsolateEachDatum), the data is still unwieldy. Cumbersome and unnatural parsing must be performed and non-obvious conventions must be applied in order to make heads or tails of the data structure. In particular, there are flat lists that contain data with implicit internal structure.

Solution: Make sure there are enclosures for all groups of logically connected data. However, avoid multiple layers of unnecessary wrapping; once your data is well-structured, leave it at that. The deeper you make it from that point on, the more annoying it will be to work with -- at little to no benefit.

It may appear as though adding structure just adds bloat, noise, or weight. Nevertheless, if in doubt, go with more structure. It is easy to transform well-structured data into poorly structured data; going the other way is harder. If still in doubt, you may have to take other factors into account, such as how much hand-editing you expect and how the extra structure affects it, and whether size is absolutely critical (if you're using XML, now is a good time to reconsider).
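The asymmetry between the two directions can be sketched in a few lines of Python. This is only an illustration under assumed names (flatten and regroup are hypothetical helpers, and the vertex data is made up): flattening needs no knowledge of the data, while regrouping needs a group size that the flat form does not record.

```python
from itertools import chain

# Hypothetical sample data: polygon vertices as (x, y) pairs.
grouped = [(1, 6), (3, 5), (4, 6)]


def flatten(groups):
    """Structured -> flat: works without knowing what the groups mean."""
    return list(chain.from_iterable(groups))


def regroup(flat, size):
    """Flat -> structured: the caller must supply the group size,
    because the flat list does not say where one group ends."""
    if len(flat) % size != 0:
        raise ValueError("flat list does not divide into groups of %d" % size)
    return [tuple(flat[i:i + size]) for i in range(0, len(flat), size)]


flat = flatten(grouped)
as_pairs = regroup(flat, 2)    # the intended reading
as_triples = regroup(flat, 3)  # also accepted, but means something else
```

Both calls to regroup succeed here, which is the point: nothing in the flat data tells a reader which grouping was intended.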

Resulting Context: This is primarily from an XML point of view, but most of this applies to, e.g., EssExpressions as well.

Known Uses: This problem is very common and often occurs when people inexperienced with processing data are responsible for designing data formats. It is relevant to XML as well as S-expressions, although it's more common in the XML world -- if only because the XML world is less mature.

Author: DanielBrockman 2004-01-20


We start out with an abstract example that quickly gets to the core of the issue. EssExpression examples are in italics.


Polygon example

(See IsolateEachDatum for beginning of story.)

Bad -- hard to process (does not group related data):

 <polygon>
   <coordinate>1</coordinate>
   <coordinate>6</coordinate>
   <coordinate>3</coordinate>
   <coordinate>5</coordinate>
   <coordinate>4</coordinate>
   <coordinate>6</coordinate>
 </polygon>

(polygon 1 6 3 5 4 6)
Good -- easily processable (provides good structure by grouping related data):
 <polygon>
   <vertex x="1" y="6" />
   <vertex x="3" y="5" />
   <vertex x="4" y="6" />
 </polygon>

(polygon (1 6) (3 5) (4 6))
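To make the difference in processing effort concrete, here is a rough Python sketch using the standard xml.etree.ElementTree module. The function names are made up for illustration. The reader of the flat form has to know the unwritten convention that coordinates alternate between x and y; the reader of the grouped form just takes what each vertex states.

```python
import xml.etree.ElementTree as ET

FLAT = """
<polygon>
  <coordinate>1</coordinate>
  <coordinate>6</coordinate>
  <coordinate>3</coordinate>
  <coordinate>5</coordinate>
  <coordinate>4</coordinate>
  <coordinate>6</coordinate>
</polygon>
""".strip()

GROUPED = """
<polygon>
  <vertex x="1" y="6" />
  <vertex x="3" y="5" />
  <vertex x="4" y="6" />
</polygon>
""".strip()


def vertices_from_flat(xml_text):
    """Pair up coordinates by position: x, y, x, y, ...
    This convention lives only in the reader's head, not in the data."""
    values = [int(c.text) for c in ET.fromstring(xml_text).findall("coordinate")]
    if len(values) % 2 != 0:
        raise ValueError("odd number of coordinates")
    return list(zip(values[0::2], values[1::2]))


def vertices_from_grouped(xml_text):
    """Each vertex carries its own x and y; no pairing logic is needed."""
    return [(int(v.get("x")), int(v.get("y")))
            for v in ET.fromstring(xml_text).findall("vertex")]
```

Both functions return the same list of (x, y) tuples for the documents above, but only the first has to guess at, and validate, the pairing.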

Dictionary example

Bad -- hard to process (has implicit structure in an element list):

 <dictionary>
   <term>ROFL</term>
   <definition>rolling on the floor laughing</definition>
   <term>LMAO</term>
   <definition>laughing my arms off</definition>
 </dictionary>

(dictionary "ROFL" "rolling on the floor laughing" "LMAO" "laughing my arms off")
Good -- easy to process (has just the right amount of structure):
 <dictionary>
   <entry>
     <term>ROFL</term>
     <definition>rolling on the floor laughing</definition>
   </entry>
   <entry>
     <term>LMAO</term>
     <definition>laughing my arms off</definition>
   </entry>
 </dictionary>

(dictionary ("ROFL" "rolling on the floor laughing") ("LMAO" "laughing my arms off"))
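A small Python sketch of reading the flat and the grouped dictionary, again with xml.etree.ElementTree and made-up function names. In the flat form the reader has to check that terms and definitions really alternate; in the grouped form each entry can be read on its own.

```python
import xml.etree.ElementTree as ET

FLAT = """
<dictionary>
  <term>ROFL</term>
  <definition>rolling on the floor laughing</definition>
  <term>LMAO</term>
  <definition>laughing my arms off</definition>
</dictionary>
""".strip()

GROUPED = """
<dictionary>
  <entry>
    <term>ROFL</term>
    <definition>rolling on the floor laughing</definition>
  </entry>
  <entry>
    <term>LMAO</term>
    <definition>laughing my arms off</definition>
  </entry>
</dictionary>
""".strip()


def entries_from_flat(xml_text):
    """Walk the children two at a time and verify the implied pairing."""
    children = list(ET.fromstring(xml_text))
    entries = {}
    for i in range(0, len(children), 2):
        pair = children[i:i + 2]
        if [child.tag for child in pair] != ["term", "definition"]:
            raise ValueError("expected a term followed by a definition")
        entries[pair[0].text] = pair[1].text
    return entries


def entries_from_grouped(xml_text):
    """Each entry is self-contained, so it can be read in isolation."""
    return {entry.findtext("term"): entry.findtext("definition")
            for entry in ET.fromstring(xml_text).findall("entry")}
```

The grouped reader also keeps working if entries are reordered or filtered by a generic tool, whereas the flat reader depends on nobody disturbing the sibling order.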
Also good (effectively equivalent):
 <dictionary>
   <entry term="ROFL">rolling on the floor laughing</entry>
   <entry term="LMAO">laughing my arms off</entry>
 </dictionary>

(no direct S-expression equivalent)
Wrong. Here it is:
 (dictionary
   (entry :term "ROFL" "rolling on the floor laughing")
   (entry :term "LMAO" "laughing my ass off"))
Right, but I think that one doesn't really make sense. First, what purpose do the seemingly dead-weight copies of "entry" serve? Second, why use ":term x" instead of the more obvious "(term x)"? Third, why mention "term" at all? It seems all those things are just there to create a literal translation of a piece of non-ideal XML, and that's kind of silly. I mean, in that case, you might as well emulate the end tags as well, yielding
 (dictionary
   (entry :term "ROFL" "rolling on the floor laughing" /entry)
   (entry :term "LMAO" "laughing my ass off" /entry) /dictionary)
Mind you, I certainly didn't intend this to be yet another XML vs. S-expressions page; rather, I wanted to show how to refactor XML, and while at it, also roughly corresponding S-expressions. However, I can see that the "no S-expression equivalent" looks kind of provocative. Maybe we should instead stick the alternative XML solutions right next to each other and show the single sane S-expression solution below them? Thanks for your input.


Real-world examples follow.


Apple's new plist format

(See IsolateEachDatum for the beginning of the story.)

Bad -- hard to process (uses an ungodly mix of both too much and too little structure):

 <plist version="1.0">
   <dict>
     <key>AnimateSnapToGrid</key>
     <true />
     <key>EmptyTrashProgressWindowLocation</key>
     <point x="79" y="44" />
     <key>FileViewer.LastWindowLocation</key>
     <rectangle x1="228" y1="140" x2="1091" y2="826" />
   </dict>
 </plist>
Good:
 <plist version="1.0">
   <property>
     <key>AnimateSnapToGrid</key>
     <true />
   </property>
   <property>
     <key>EmptyTrashProgressWindowLocation</key>
     <point x="79" y="44" />
   </property>
   <property>
     <key>FileViewer.LastWindowLocation</key>
     <rectangle x1="228" y1="140" x2="1091" y2="826" />
   </property>
 </plist>
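As a rough illustration of what the regrouping buys, the following Python sketch reads both layouts with xml.etree.ElementTree. The function names are hypothetical, the sample documents are shortened versions of the ones above, and for brevity the value of each property is reduced to the name of its value element.

```python
import xml.etree.ElementTree as ET

FLAT = """
<plist version="1.0">
  <dict>
    <key>AnimateSnapToGrid</key>
    <true />
    <key>EmptyTrashProgressWindowLocation</key>
    <point x="79" y="44" />
  </dict>
</plist>
""".strip()

GROUPED = """
<plist version="1.0">
  <property>
    <key>AnimateSnapToGrid</key>
    <true />
  </property>
  <property>
    <key>EmptyTrashProgressWindowLocation</key>
    <point x="79" y="44" />
  </property>
</plist>
""".strip()


def properties_from_flat(xml_text):
    """Keys and values are siblings; their pairing is implied by position."""
    children = list(ET.fromstring(xml_text).find("dict"))
    if len(children) % 2 != 0:
        raise ValueError("key without a value")
    props = {}
    for key, value in zip(children[0::2], children[1::2]):
        if key.tag != "key" or value.tag == "key":
            raise ValueError("expected keys and values to alternate")
        props[key.text] = value.tag
    return props


def properties_from_grouped(xml_text):
    """Each property element encloses exactly one key and one value."""
    props = {}
    for prop in ET.fromstring(xml_text).findall("property"):
        key = prop.find("key")
        values = [child for child in prop if child.tag != "key"]
        if key is None or len(values) != 1:
            raise ValueError("a property needs one key and one value")
        props[key.text] = values[0].tag
    return props
```

With the grouped layout, a malformed property can be reported and skipped by itself; with the flat layout, one missing value shifts the pairing of everything that follows.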
Also good (but note that this variant would not have been an option had I chosen to go with structured keys):
 <plist version="1.0">
   <property key="AnimateSnapToGrid">
     <true />
   </property>
   <property key="EmptyTrashProgressWindowLocation">
     <point x="79" y="44" />
   </property>
   <property key="FileViewer.LastWindowLocation">
     <rectangle x1="228" y1="140" x2="1091" y2="826" />
   </property>
 </plist>
How about:
 <plist version="1.0">
   <property key="AnimateSnapToGrid">
     <boolean value="true" />
   </property>
   <property key="EmptyTrashProgressWindowLocation">
     <point x="79" y="44" />
   </property>
   <property key="FileViewer.LastWindowLocation">
     <rectangle x1="228" y1="140" x2="1091" y2="826" />
   </property>
 </plist>
Here, the boolean is validated by type, the type being the 'boolean' element. That is, a property may contain a 'boolean' element rather than a 'true' or 'false' element.

Yes, you're absolutely right. I like this better too. In your version, ElementNamesAreTypeNames, which strikes me as highly intuitive.
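One practical consequence of element names being type names is that a reader can pick a decoder by looking up the element name. A minimal Python sketch follows; the DECODERS table and load_properties are invented for illustration, and the choice to decode points and rectangles as integer tuples is an assumption.

```python
import xml.etree.ElementTree as ET

PLIST = """
<plist version="1.0">
  <property key="AnimateSnapToGrid">
    <boolean value="true" />
  </property>
  <property key="EmptyTrashProgressWindowLocation">
    <point x="79" y="44" />
  </property>
  <property key="FileViewer.LastWindowLocation">
    <rectangle x1="228" y1="140" x2="1091" y2="826" />
  </property>
</plist>
""".strip()

# One decoder per element name; the element name doubles as the type name.
DECODERS = {
    "boolean": lambda e: e.get("value") == "true",
    "point": lambda e: (int(e.get("x")), int(e.get("y"))),
    "rectangle": lambda e: tuple(int(e.get(name))
                                 for name in ("x1", "y1", "x2", "y2")),
}


def load_properties(xml_text):
    """Decode every property by dispatching on its value element's name."""
    props = {}
    for prop in ET.fromstring(xml_text).findall("property"):
        (value,) = list(prop)  # exactly one value element per property
        props[prop.get("key")] = DECODERS[value.tag](value)
    return props
```

With separate 'true' and 'false' elements, the table would need two entries for one type; with a single 'boolean' element, there is one entry per type.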


See also: GroupRelatedInformation


CategoryXml, CategoryRefactoring, CategoryInfoPackaging


Last edited January 30, 2005