Fear Of Hierarchies

In LimitsOfHierarchies, it is well-established that there are many problems for which hierarchical data arrangments (and the canonical hierarchical data structure, the tree) are inappropriate. However, there are many other problems for which a hierarchy is appropriate. Some people argue that because hierarchies are inappropriate in some contexts, they should be avoided in all contexts. This is, in my opinion, an AntiPattern.

Some things that are naturally modeled by hierarchies. Please don't put tautological things on this list (for example, "XML documents", "single inheritance", or "HierarchicalFilesystem?s"...we don't want to look silly. Also don't list the biological classification system; that's an arbitrarily-imposed hierarchy--see LimitsOfHierarchiesInBiology).

The advantages of hierarchies (since much time has been given to their limits) include:

           use tableX
           copy to myFile.txt type delimited

{please clarify what is being compared for "easier"}

Please note: It is not argued here that hierarchies should be used for everything; it is simply noted that they are an appropriate choice for some tasks. And it is quite readily acknowleged that hierarchies are used in many places where they are clearly inappropriate.


Half-ass Trees

By the way, my "fear of trees" is not due to lack of exposure. I had a professor in college who was a "tree evanglist". He sold us all on the idea of trees and originally I did not question it. It looked great on paper. It was dealing with real-world issues such as org-charts, product classifications, variation management, etc. that made me realise they were too simplistic a structure, turning me to focus on sets instead.

If you want to sell me back to trees, you need to find better ways to deal with the exceptions. Often things may fall 85% into a tree pattern, but dealing with the 15% non-tree-fits is a mess. Sets seem to deal far better with that 15%. I will agree that sets are not as "clean" for the 85%, but make up for it with the rest.

I don't see a lot of tips for dealing with the problems identified in LimitsOfHierarchies and RealWorldHierarchies. The advice appears to be "buck up and live with the problems of trees". I could counter and say, "buck up and grow comfortable with sets". -- top

FalseDichotomy. We don't have to pick one over the other. We can use any data structure we like. No one wants to "sell" you trees. If you don't need them, don't use them. -- EricHodges

Huh? The title of this topic is not "fear of using any data structure we like".

Above you write as if using sets precludes using trees. That's a false dichotomy. You can make tree views of sets, set views of trees, or use them independently. Just because you use sets 15% of the time doesn't mean you can't use trees the other 85% of the time. -- EH

In theory almost any graph can be viewed in a tree-wise way. However, I haven't seen a way to easily flip back and forth between tree-view and set-view. Until our tools allow easy flipping, I'd suggest using sets instead of trees if it looks like the domain won't be so tree-friendly over time. And if it was easy to flip, then its mostly a moot issue since the "penalty" for picking the wrong one would be small. --top


A lot of this overlaps with RealWorldHierarchies.


Another technique is to have a "hasBeenVisited" column to mark.

I really dislike this one (I've seen similar things in other contexts). It confuses the static (the data) with the dynamic (the executing program). What happens if you want to do two traversals at once?

Keep a separate (local) list of visited primary keys. Whether marking or list-keeping is needed or not depends on the "traversal" requirements. I don't think "traversal" has a precise definition. Does it only cover visiting every node once, or does it cover *how* they are visited? And if so, what is the pattern it specifies?

I didn't mean I couldn't think of an alternative algorithm :) And exactly because to depends on the traversal requirements is why marking in the data structure is a bad idea. It mixes structural ideas with processing ideas.

I am still not clear on why you are bothered by the alleged "mixing"? What is an example problem it can create that cannot be fixed with altneratives?

It's just an AntiPattern, that's all.

The reason for the hierarachy is normally that it models some real/long lasting property of whatever you're modelling. Whether or not storing a "visited" boolean in that data structure will or won't cause problems is an artifact merely of the current processing you're trying to do, which is much more transitory. If as a developer you note that the data structure implementor has implemented such a flag, you may, or may not get into problems if you use it. In this way, it's rather like a global variable.

So I think using such a flag is bad practice (and avoiding such things is one reason for the VisitorPattern)

BertrandMeyer's assertion that list data structures should contain cursors is another example of the AntiPattern

One can create a separate "links" table to avoid modifying the domain table. I'm just not one to create tons of "extra" tables unless there is a known or likely need. I suppose some will disagree, taking us to into a FearOfAddingTables debate.


Another thing to consider is the possible need to traverse a tree in non-tree ways. Different users have different needs.


EditText of this page (last edited July 8, 2008) or FindPage with title or text search