Real World Hierarchies

From NavigationalDatabase:

"Some solution domains have a very strong hierarchical structure as an intrinsic part of them."


Isn't this moving goal-posts? The original request was for "real-world" hierarchies. Several of these having been provided, it seems that the criteria are changing to something like "naturally occurring (whatever that's supposed to mean) single, definitive, pure tree structures that apply exactly to all cases under discussion and are the only view of that case required by all persons interested in it."

There seems to be some confusion about whether we are studying hierarchies here or modeling them. There are two semi-orthogonal issues here: whether real-world things are "tree enough", and then what the best computer model for them is. When designing a system, the question needs to be asked: is the domain inharently tree-shaped (or how tree-ish is it), and if not, are trees being used as a UsefulLie anyhow because of their simplicity? And further, how does one deal with the non-tree parts of domain "near" trees if modeled as trees? For example, if the domain is 95% tree, then how does one work the 5% into the model without making too big of a mess? (Or perhaps decide to skip trees as the modeling technique.)

[reworded this to make it more diplomatic]


Postal addresses

You mean like country->state->city->street->building->room ?

Perhaps. But, it does tend to have a lot of variation below the building level. Plus government institutions sometimes have odd setups below the city level.

Oh, there's a great deal of variation in the detail of the tree in all sorts of cases. Nonetheless, I've never seen a non-hierarchical postal address anywhere I've been in Europe or North America. Nowhere is it claimed that the same one canonical hierarchy applies to all postal addresses.

ArePhysicalPostalAddressesArchaic

Another issue to consider is that any given point on the map may have multiple ways to locate it. For example, political districts often correspond poorly with Zip-codes (US) or counties due to slick gerrymandering. Thus, the political map may not resemble the postal map. Each domain thus has its own independent or semi-independent hierarchy.


IP domain names (See RFC 1591)

"In the Domain Name System (DNS) naming of computers there is a hierarchy of names." ... "As an example of a country domain, the US domain provides for the registration of all kinds of entities in the United States on the basis of political geography, that is, a hierarchy of <entity-name>.<locality>.<state-code>.US."

RFC 1591 references RFC 1480, which provides a detailed exposition of the domain name tree in the US, parts of which have five levels, eg. <school>.<school-district>.K12.<state>.US

Similar to postal address issues. If you mean that they are both perfectly valid examples of hierarchical structures in the world, then yes.

Actually, it is not really a tree by itself. Those can all be considered independent attributes. For example, you might be a mail-order salesperson for kindergarten products. To find all kindergarten teachers to mail spam to, you might only want to look at the "grade" attribute. You are not thinking "tree" when doing that. One can think of the URL as a kind of tuple. However, the hierarchical view is based on the political division of the schools, and thus moves the issue to human organizational ("org chart") sub-topic.

So a tree is only a tree if every mail order salesperson thinks "tree" when they search it? Come on, be reasonable. It's a tree by definition. How it is searched doesn't change that.

By definition? Trees are mostly in one's head.

Your head isn't that big, grasshopper.

They are often used similar to how file hierarchies (folders) are used, and I find hierarchical file systems highly flawed as a categorization system for non-trivial stuff. Generally they are used because people readily understand them, not because they are the ideal classification system. It is the leftish side of the easy-to-learn-versus-power spectrum. See FileSystemAlternatives.

I would like to point out that Yahoo, if I remember correctly, tried to make a web search engine by building a hierarchy of categories. The tree grew too long and messy, and Google ended up kicking its butt in the marketplace using its free-association approach.

The example here, DNS names, has been evaded by examples of a subset of DNS names where you have attribute information on each field. Like it or not, many IT systems implement addressing using trees (IP addresses, domain names) - often for valid efficiency-of-implementation reasons. Like it or not, your software must be able to interface with these trees, and must be aware of the concept of the tree. The validity of the tree as a model for whatever they choose to model is irrelevant - you may as well argue that the sky should not be blue.

You will not have an attribute name for each field. When you have foo.bar.example.com, you don't know what entry "Foo" and "bar" represent.... you just know that "bar" is owned by "example.com" and "foo" is owned by "bar". That's it. And even if you do remap this tree (and it is a tree) into attribute names, you run into problems with context, just like the popular "Fire" problem with DynamicTyping. So you assign every attribute a "scope" attribute, and in turn the "scope" attribute it's own "scope" attribute, so you can clarify the difference between a military tank and a septic tank... but then your scopes become a BigMessyGraph?, which makes for an implementation nightmare since a scope can have no "root scope" - if it did, it'd be a tree (or at least a DirectedGraph).

In the end, almost everything is a BigMessyGraph? if you look at it hard enough... but sensibly traversing the BigMessyGraph? is nearly impossible, so we whittle them down into structures that are conceptually manageable. Sometimes that's a DirectedGraph, and sometimes that's a tree. And implementing those most modern databases sucks, compared to how easy it is in general-purpose programming languages.


genealogies

Marriages and chromosome exchange are un-tree.

So what? Do you know what a genealogy is? It's the thing that traces the decent of an individual. It's a family tree. I'll type that again, family tree. The clue is in the word tree.

Well, I am claiming that "tree" is a bit misleading because it is not a pure hierarchy.

Wrong again. You are the child of two genetic parents, who are each the child of two genetic parents, who are each the child of... etc., etc. Until such time as human cloning becomes a practicality, your heredity is strongly hierarchical. Sure, in some isolated communities the same individual can appear multiple times in the tree but this is 1) rare, 2) besides the point.

Pure trees don't have 2 parents.

How hard can this be? Here is a partial genealogy, in abstract:

  .                           person
  .             ................/\................
  .           mother                           father 
  .         /        \                       /        \       <- [wiki doesn't like terminal backslashes]
  .   maternal      maternal           paternal      paternal
  .  grandmother   grandfather        grandmother   grandfather
See the tree?

Okay. I see what you mean. But beyond a few generations there will likely be common ancestors. IIRC, that is called a "directed graph". (It is impossible for a (biological) family tree to contain a cycle; that would imply that an individual is simultaneously the ancestor and descendent of another.)

You're joking, right? I don't know about your family, but I only have to go back a few generations to find ancestors who lived so far apart they didn't even speak languages from the same language group. (Oh, and by the way, language families form a hierarchy, too). Sure, in some ultimate sense we all have common ancestors, but,..., well you're clearly just trolling now, so I give up.

The branch width of such trees are far wider than the total number of languages. Thus, obviously there are link-backs. Populations cycle back into one another. That is a fact of biology. There is almost certainly somebody on your mom's side that is related to somebody on your dad's side. How far back one has to go to find it on average, I don't know. Even across species there is sometimes DNA exchange through infectious agents. See LimitsOfHierarchiesInBiology.

BTW, perhaps "treeness" is a continuous measurement rather than a Boolean declaration. Although family "trees" have a high tree-ness factor, it is not 100%. Perhaps the metric can be based on the number of cycles or cross-overs divided by the number of nodes, or such like.

I fully agree that there is some degree of "treeness" in almost all of these, but it is far from being perfect trees or the usage patterns (queries) are not always trees.

Examples with perfect "treeness" were not requested. Neither was the request for structures for which the queries had tree structure (whatever that means). It's time to raise the TrollFlag on you.

{You seem to view this as a contest. I view this topic as a catalog of example things in the real world that have some noticeable tree-ness to them, and a *study* of how "tree-ish" they are. The "tree query" comment is about how often the users of info from the alleged tree structures care about the tree-ness. For example, "show me all yellow nodes" does not depend on tree-ness. "Show me all yellow nodes that have green parents" does.}

I think most will agree that the trees mentioned are imperfect in some ways. The disagreement is probably whether to use a tree as an internal (computer) model despite such flaws. In other words, the big question is: is a formal tree structure a UsefulLie (or useful-enough lie). Do we use sets and graphs to handle the "deviations", or do we somehow manage to live with them in a tree structure with some cross-branch links and/or duplication? In other words, tree-proponents will suggest that the benefits of tree structures outweigh the problems of a few non-tree cross-links. I lean toward the first (sets, graphs). Perhaps if I saw specific examples of how to deal comfortably with the deviations when using a tree structure, I might warm up to it. The larger something is (more nodes), the less tree-ish it tends to be in my observation. Thus, trees have a bigger penalty under scale-ups.

I see only one contributor who agrees that the trees mentioned are imperfect in some way.

Are you saying that there are no cross-links in family trees?

No.

Then they are imperfect. Case closed, it appears.

[Absolutely everything in the real world is imperfect, not just trees. So what?]

On the larger scale, genealogies appear to be a DAG - directed acyclic graph, if I am not mistaking; not a tree (barring some extreme incest).

SouthernCulturalAssumption?? ;-)


Library (Book) Classification Schemes

The DeweyDecimalSystem seems to be falling out of favor as more modern techniques are introduced. It does not handle orthogonal categories very well (see examples below).

All of which is true, but none of which addresses the issue at hand.

Well, I am suggesting that it was a rather artificial hierarchy because it had to cater to physical constraints of the time. IOW, it is not an "intrinsic" tree.

It's a completely artificial hierarchy (by the way, what have "physical constraints" got to do with it?), it is also a hierarchy widely met with in the real world.

(Mightn't "physical" constraints refer to the fact that single copy of a book can't be placed in multiple locations at the same time?)

Until now, because something better is replacing it for most searching.

Well, that is because we as humans still do book browsing at the physical level. Tree taxonomies are indeed a UsefulLie at this level. You do agree that tree taxonomies are not perfect, correct? For example, a Bioinformatics book could go under both the computer section and the biology section but we cannot put it under BOTH without duplication or cross-links, which violates pure tree rules. Oreilly's Bioinformatics even contains an introduction to UNIX file commands. Similarly, a book titled "The Psychology of Computer Programming" could go under both the computer section and the psychology section. "The Business of Bioinformatics" could be put under "Business", "Computers", and "Biology". But Dewey can only put it under one major section; and using branches for minor sections can risk a CartesianJoin (combinatorial explosion).

I suppose another way of saying this is that trees are relatively easily linearized. Thus, library taxonomy trees can be "flattened" into a numerical system that can be used to shelf books. It is tougher to do this with sets. Of course, this does not mean that sets are bad, only that they lack some features that trees have, such as easy serialization. It can still be done with sets fairly well if we also store weighting factors. It might even return the same answer that a tree does. For example, the Biz Bioinfo book above may have the highest weight on the "Biology" factor. If we want to browse "Biology" books, it would likely appear fairly high in that category. With Dewey, the highest factor, "Biology" in our example, becomes the primary factor. Weighted sets do that same thing by picking the highest factor's weight when we ask our query system to put everything in one and only one "main" category.

The big issue is that the real world is limited, but SoftwareGivesUsGodLikePowers. We can index and browse in a gajillion dimensions inside the computer. We don't have to constrain our classification system to fit physical bookshelves better. It will be interesting to see if DeweyDecimalSystem and similar coding systmes eventually dies off now that physical libraries are also dying off. A list of key-words may serve our purposes better. However, the concept of "next to" gets more difficult to compute, but may also lead to better results. -- top


Legal Documents

I have no experience in this domain. Any expert wish to comment? Don't they have a lot of cross-references?


algebraic expressions

I often like to break things down into named units:

  a = foo(x,y)
  b = bar(a,z)
  c = blah(a,b)
Still parsable as a tree, perhaps, but less so conceptually.

Nope. All you've done is taken a tree and explicitly labled some of the nodes.

And indirect recursion? Getting nesting-happy is one of the reasons behind SqlFlaws, I would note.

Related: AbstractSyntaxTree


Personnel and Institution Organization Charts

Examples: the org chart of almost every corporation, the executive branches ("branches", what a give-away) of every national government, every military organization, the Roman Catholic Church, etc.

I once tried to use a tree for budget summaries at a company of about 1600 people. It did not work. I ended up using sets. Another time it turned out that personell management wanted a different structure than operations/production managers. There was no single "right" hierarchy. And, some managers/users wanted it tweaked yet differently just for their own report. I considered making it a data-entry item for them so that they could maintain their own hierarchy using the lowest-level codes. However, that was vetoed. We just had to live with multiple trees or hacky trees. Another time they wanted certain kinds of items to add up ("roll up") to a mid-level category, but others kinds not to add up (ignored). The difference was not organization-related, but is roughly comparable to "product type". Thus, the low-level nodes had the numbers, but they didn't flow to higher branches. A footnote was included on reports to explain this. (I didn't make up those biz rules and couldn't change them. PoliticsWithClassification)

Good for you. How does this affect whether or not the org-chart is a tree?

The politics of situations tend to bust trees.

Yep. That doesn't stop the trees being there, though.

Semi-tree? It's kind of the EightyTwentyRule for trees: They are "mostly trees", but that's not good enough for computer algorithms. Either we have to abandon trees (using perhaps sets instead), or deal with all kinds of exceptions to the rules (deviations), making for fudgey-looking conventions and/or code.

Plus, there is the "multi-boss" matrix organization common in engineering organizations, where you have project-specific bosses and discipline specialty bosses.'' See MatrixManagement

Yes there is. The existence of non-tree organization structures in no way demonstrates the non-existence of tree ones, however.

(Re the first sentence in this section: That argument can just as easily be turned back on your point: "I tried to find out everybody Bob is responsible for, but I didn't have that set already, so I had to generate it from the tree." A tree may not be the be-all, end-all data structure, but sets have problems, too: You can't keep all possible sets like that around, and the moment you start talking about generating them, you're going back to the tree structure. Likely as not, your sets are implemented as a type of tree internally anyhow; trees have an obvious data representation in our modern machines but sets generally must be implemented on top of something else, as there is no obvious direct implementation of them. You seem to be arguing as if there are Sets and there are Trees and ne'er the twain shall meet, but there is a very smooth continuum between them in the sense of 'possible data structures'. Each has its own advantages and disadvantages, and it's virtually impossible to draw a line delimiting the two. Even if you think that you'll recurse on the sets and take the union of the sets of people Bob is the direct boss of, then the people they are the direct boss of, and so on until you're done, you're doing a tree operation with children implemented as sets.

IMHO, one of the signs of programming maturity is recognizing that the Standard Data Types (tree, set, list, etc.) are actually almost entirely orthogonal to one another, sometimes even to themselves, and can be mixed and matched almost arbitrarily. I've had trees that were also indexed with a list of all nodes, heaps that were also hash tables, even multi-trees where two different hierarchy dimensions were represented, and I mention this explicitly because I think this is perfectly normal, not as bragging that I'm doing anything outstanding. Part of the reason you can do anything with Sets is that Sets, and more precisely Sets of Sets, used in aggregate can implement almost any organization-based data structure. Thus, showing that Sets can do everything is not interesting. The question is, are they the best for everything? And the answer to that is clearly "no". If you want to ask for everybody who reports to Bob, directly or indirectly, you will use the tree; said tree may be implemented as a set but it's still a tree.)

I am not sure what your point is. As a RelationalWeenie, I see it trivial to represent such information in a database. The most general form is probably some kind of many-to-many table, but not the only.

(I'm saying that you've got a tree in your database, and like I said probably a database in your tree, so this "either tables or trees must be better then the other" (with a strongly implied "in all cases" on your part!) line is kinda bunk. Tables are flexible "out of the box" but just as tables are more then just rows of data once you add relational algebra to it, there's more to "trees" than what a graph theorist would call a "tree", and there are many things more naturally thought of as tree manipulation, such as manipulating an HTML document in arbitrary ways with JavaScript, then as tables.

I guess you could say that at the level of abstractness this entire page seems to be arguing at, that tables and trees are isomorphic in terms of capabilities. So constant assertions that "I can represent that relationally!" do not impress me. I can represent your database as multiple interleaving trees, and it quite likely is in reality. Unless you're willing to argue a much more specific case, no meaningful discussion can be had. Handling imperfect-but-real hierarchies has almost identical implications, whether you're dealing with a multi-tree or a table implementation.)

I agree with the TuringEquivalency comment. However, combinations of trees, lists, and maps (NavigationalDatabase) can get pretty messy as changes accumulate. In relational there are fewer ways to do the same thing, and this results in consistency. Trees of maps and maps of trees results in a shanty-town like mess in my opinion. Everyone will do it differently and there are no guiding principles to navigate the mess other than "car driver beware". They are the Goto's of data structures. See OoLacksConsistencyDiscussion. Perhaps "navies" work for certain personalities, but night mine. I find them chaotic. -- top

I encountered a case for different institutional views of trees. Group A wanted to break out a given department (segment of company) by many detailed branches, while group B did not. The depth of some branches of A's and B's trees were thus different. It is not that the levels in general were limited, but the levels of that one deparment were different between trees. Group A's dealings happened to be more detailed with the department in question, and thus they wanted more detail in their hierarchy. For example, if you and I wanted to see an org chart, we would probably want more details on the IT branches than say Marketing because we are curious about the IT aspects of a company. To pull this off, either we need to store different trees for each group, or at least a partial tree(s) in order to list how deep to show the levels. (In an interactive system, we could probably use drill-down, but interactivity was not an option here.)

For a set-based locational/organizational approach, see examples in FearOfAddingTables.

March 2011 - I've just been burned again by using hierarchies instead of sets in an organizational summary structure tracking and reporting system having about 3000 leaves. It had been hierarchical for at least 75 years in this organization. I dug through dusty old age-tinted binders to check. (I think I found a drawing by Da Vinci :-) I thus thought it was a safe bet to continue with hierarchies for a recent overhaul to prepare for a web-based system. Well, just after completion, Murphy showed up and mucked it all to hell with funny overlaps, creating a big fudgey mess to make "ghost trees" to fake it because we don't have time and staff to convert it to sets now. (I still say hierarchies were the professionally "proper" decision based on analysis of past history at the decision time, but the DiscontinuitySpike for getting it "wrong" is huge regardless.) It's kind of like buying stock in the twin towers on Sept. 10th, 2001. The Great Recession appears to have been at least part of the cause for such a big change. The organization took on "rogue" sub-organizations in order to keep revenue coming in. During good times they probably wouldn't have wanted the complexity and ghost-ish nature of these rogues. --top


MulticellularOrganisms

Each cell nucleus in a multicellular organism is a product of mitosis. This forms a binary tree with the zygote's nucleus at the root. Every node (except gametes) contains the same DNA, but different sections are "executed" based on a node's position in the tree.

Is there any cross-contamination due to viruses, etc? See LimitsOfHierarchiesInBiology.

Why would that change the fact that the cells form a binary tree? Yes, viruses can move genes between cells, but we're not talking about genes. We're talking about cells. Cells split into two siblings, regardless of viruses.

But usually it is the genetic nature that one is interested in, not mere splitting partner.


Entities in an engineering plan.

Spacecraft = orbiter + fuel tank + SRB + SRB; orbiter = airframe + insulation + main engine + OMS + ..., OMS = tanks + pumps + chamber + ..., etc.

Even these are not "pure" trees. Sure they are. Multiple things often intersect multiple other things. Vague to the point of meaninglessness. Also, the accounting department may want other views of parts besides physical connections. True, but irrelevant.

Example: Where do the bolts that tie assembly A to assembly B belong on the hierarchy?

In Assembly C, which contains A, B, et al.

IOW, a contrived unit. If we want it to be more realistic, then it would be a graph, not a tree. For example, if you made a structure based on how the parts actually *touch*, in most cases it would not be a tree.

None-the-less, hierarchies are used for this purpose. Can you say "BillOfMaterials?"?

Hierarchies have been long known to be imperfect WRT BillOfMaterials?. If you try it, you will soon see that any tree based on the connections of most three-dimensional machine parts is artificial. A UsefulLie at best, but not a perfect description by any means. There will be multiple different ways to tree-ify many assemblies.


I would note that having a "strong hierarchical structure" does not mean that all access flavors (queries) will be tree-based. Whom do you think said it did?


Text Documents

Text data (collections of memos, web pages, letters, articles, etc) can be classified into multiple tree structures. Deriving the intersections is essentially a trees-to-sets transformation.

Yes, the indexes for the words that appear within the documents will be often rendered as trees, allowing the atomic operations needed to satisfy a set-based query: "find all {letters/articles/etc} documents containing references to (camping | hiking | packing) & (horses | mules | donkeys | burros)" is accomplished, atomically, by traversing trees and constructing a set view.

The indexed terms are much more granular than the documents being served, which can belong to hundreds - thousands - of trees.

"Can be" and "best as" are two different things. See "engineering plan" above for related comments. The bottom of http://www.geocities.com/tablizer/sets1.htm has a possible relational document classification schema.

Is this talking about text elements such as paragraphs, chaptures, etc.; or about classifying documents.

Possibly related: FileSystemAlternatives


GUI's

While much of GUI's can be represented by nesting components, there are issues to this also. For example, take a data grid (spreadsheet-like panel). One can represent a given cell with the hierarchy: table -> row -> cell. However, there are often times where it's more fitting to have: table -> column -> cell. No one nesting is always better than the other. The problem is that trees by themselves are an insufficient abstraction for 2D grids. Both views can be offered, but they are still only views on top of something non-hierarchical. -t


Much of this page is attacking heirarchies with the implicit belief that the data-centric approach is inherently better - the RelationalWeenies are creating a FalseDichotomy. The idea is that "trees suck, so use relational". The relational approach is, however, similarly flawed. Most relational systems barf all over themselves at tree traversal - and while many systems that look like trees turn out to be directed graphs, directed graphs are generally just fine for tree-like operations. One can still query a list of all the children of X from a directed graph - if you use pure-tree operations, you may get repeated results. Either way, a database query for this will be hideous, unless using Oracles not-very-relational CONNECT BY operation.

I thought the issue is whether to use trees as a model, not whether to model the trees in RDBMS (implementation of trees). HOW to impliment trees and IF to implement a tree are separate issues. There are other topics on implementing trees in SQL/relational, if one decides to go the tree route. And limits of a given query language or RDBMS brand is not the same as relational in general being flawed.

As far as things being natural DAG's, that is a different claim than natural trees. I am skeptical DAG's are the magical structure because they lack much of the simplicity of trees yet are not as flexible as full-on graphs (which sets can generally handle/emulate). But if you wish to make a DAG case (in a dedicated topic), I am all ears. Perhaps call it RealWorldDags? or DagsCoverLots??


Contributors: MartinZarate


See Also: LimitsOfHierarchies, LimitsOfHierarchiesInBiology, EmployeeTypes, ClassificationIsTough, RealWorldHierarchyExample


CategoryHierarchy


EditText of this page (last edited December 4, 2014) or FindPage with title or text search