Relational Tree Equality Example

Example Relational Join on Tree Equality using SMEQL

A "tree" in our tree-kit will be defined by convention as any table, virtual or actual, that has at least a numeric column called "parent_id" and a primary key called "key". (If the columns are not actually named that, then we can use the Calc operator to rename them. Also, some relational engines can automatically find or require a known key, such that the "key" specification may not be needed. However, we cannot rely on that feature across vendors.)

To identify any tree, we supply the table name and the root ID. This also implies that the same table can potentially "hold" multiple trees, and multiple over-lapping trees. Thus, we don't need a different table for each tree.

Let's introduce a domain function called "treeEqual" that compares the equality of two trees and returns 1 if a match or zero if not a match. It uses the convention we've established for trees here.

(A variation could be a query-level operation, not just a domain operation, but that is not necessary for our illustration. Also, "treeEqual" does not compare on the ID, only the values of matching column names. If we wanted to compare on ID, we could alias a copy of the ID using SMEQL's Calc first. Same with value column matching. SMEQL makes aliasing easy.)

Example "treeEqual" domain expression:

   treeEqual(tableA, 47, tableB, 333)

Here are the layouts for our test tables:

 bigTreeA  // table name
 --------
 key
 parent_ID 
 myStringA

bigTreeB -------- key parent_ID myStringA // same name as prior table to simplify example

treeListA // table with list "A" of trees ---------- parent_id descriptA

treeListB // table with list "B" of trees ---------- parent_id descriptB

SMEQL Join:

 r = join(treeListA, treeListB, (treeEqual(bigTreeA, 
          a.parent_id, bigtreeB, b.parent_id)))

Internal join processing table (abbreviated):

 a.pid|b.pid|descrA|descrB| (join clause)              | (result)
 ----------------------------------------------------------------
   4  |  3  | zog  | gleep| treeEq(bt_A, 4,  bt_B, 3)  | 1 
   4  | 19  | mig  | drog | treeEq(bt_A, 4,  bt_B, 19) | 0
  23  |  3  | foo  | bar  | treeEq(bt_A, 23, bt_B, 3)  | 0
  23  | 19  | foo  | blah | treeEq(bt_A, 23, bt_B, 19) | 1
  Etc...

(The "A" and "B" table naming happens to match Join's alias system of "a" and "b" left/right convention.)

One of the most generic (but slow) ways to implement joins is to create a CartesianProduct of the two+ tables, and let the condition (such as a WHERE clause) select which sub-set to use. (One could emulate a left join by appending a null row to right table and putting an "or null" in the condition...I think.)

Old style SQL joins, before the JOIN clause was introduced, had this kind of feel. Although, under the hood shortcuts are taken for simpler expressions to avoid having to do actual complete cartesion products of the tables involved.

Here, the relational engine lets the "domain math engine" execute the expression for each row. But before the relational engine hands off a given row to the domain math, the requested row values are subtituted in the expression, as shown in the "join clause" portion of our internal table. (It may not actually need to store each intermediate expression; for it's only shown in table form for illustration purposes here.)

Only those rows with a True result value (non-zero) would be returned to the result set. The result would be only those trees that match, something like:

 descripA descriptB
 -------- ---------
 zog      gleep
 foo      blah
 zork     blah
 etc...

(Actually, "parent_id" from the first table of the join would also be returned, per SMEQL rules; but it's not shown here to avoid confusion.)

--top

Where is treeEqual coming from? You've waved a magic wand and introduced it, but it's an important question.

Relevantly, if I have a tree where some of the elements are trees of a different type in another table (e.g. a tree that contains integers and trees of strings), how would I modify treeEqual to work?

That would largely depend on the "domain math engine". In tight-typed system, it would probably generate a "type mismatch" error. In others it may try to convert per type conversion rules, based on the types defined in the schemas. I don't see how this question is a DB-oriented question.

Ummm... I think you failed to grok the problem. This isn't a "type mismatch" error (if it were merely a type mismatch, I'd agree with you that the problem here is contrived). Instead, I meant a properly typed tree of integers containing trees of strings. In a functional language, the an example type might look similar to:

   define Tree A B = tree A (Tree A B) (Tree A B) | leaf B
   define TreeOfIntegersAndTreesOfStrings = Tree Integer (Tree String Unit)

In your approach, this would result in a couple different tables:

 intTreeTable 
 ------------
 node_id
 parent_id         //(refers to intTreeTable.node_id)
 nSequenceNumber   // 0 for left tree, 1 for right tree
 bLeaf             // true for leaf, false for tree node
 nMyInteger        // integer associated with tree node (unless this item is a leaf)
 strtree_node_id   // parent of tree of strings (ref. to strTreeTable.node_id)

strTreeTable ------------ node_id parent_id //(refers to strTreeTable.node_id) nSequenceNumber // 0 for left tree, 1 for right tree sMyString //(string associated with a node) // nothing needed to support leaves since they contain only 'Unit'

I hope the above better explains/clarifies the situation to you. My question was: how do I apply or modify 'treeEqual' to work for the intTreeTable, since I don't simply want to compare one 'strtree_node_id' to another - I need to know whether those are 'treeEqual' as well. I do not consider this an irrelevant question; at the moment, I see it as a significant flaw in your approach, made even more difficult since you're working with 'temporary' tables and such. It seems your approach only supports "shallow" trees, where the columns don't contain references to other DomainValue tables.

What does this have to do with comparing trees in the original example? Are you changing the example? First you wanted to see it do X, now you want it to do Y and complain that I didn't illustrate Y. And Y is still ill-defined. -t

The "original example" (tree of integers) is one instance of a "problem" (performing joins for equality between trees in relational). Solving the example but failing to solve the problem is a failure on your part, and is not an attempt at deceit or moving goal posts on mine. I also believe your case is hurt by your conjuring "treeEqual" out of whole cloth. I believe that providing you another example of the same problem will help you see why your original solution was simplistic and insufficient, and I suspect that you'll resist creating new domain functions every time you're presented with an example, which will help you focus on a real solution.

Are you asking how a given node can participate in being parts of different trees at the same time? -t

No, that really isn't what I'm asking. I asked how to perform treeEquals when one tree contains another tree (or graph, or matrix, or other complex value). By your question, are you implying a belief that strTreeTable should be folded into intTreeTable, perhaps to fit it into the mold supported by 'treeEqual'?

If you mean "how do we represent and compare a tree of mixed types"? In practice, we often don't have a choice of how the source data is represented. The user of the tables is quite often not the original builder. We must perform our operations on existing structures, or make copies/views of them and reshape them to our fitting. Thus, I would first ask: how is the original source data represented? One approach for comparing is converting the integers into strings, something like an AttributeTable with a Parent-ID. Another approach is to have a string column and a separate integer column in our virtual table node. One or the other may not be populated (null or empty) in a given node. -t

I'm not asking "how do we represent and compare trees of mixed types" - that's implied, since in the general case all trees are of mixed types (node vs. leaf, value held in node vs. value held in leaf). You can presume the original structure was a tree-form DomainValue that contained no explicit 'node id' or 'parent id', and that someone took your advice and destructured it into tables. Possibly, they could use one table for each 'type' of tree, so a tree containing trees might contain two tables. But it seems you're favoring TheMostComplexWhichCanBeMadeToWork approach of stuffing everything into one big table (FearOfAddingTables?) - an approach that may seem viable (if brutishly ugly) until I start putting references to Matrices and other complex types into the tree.

Where did I say anything about stuffing things into one-big-table? You mean for converting diverse sources?

As far as measuring complexity, we'd have to check it against real-world situations. If you think all the input sources will be from a pristine consistent type system, I'll suggest you are dreaming. People providing info don't live just to make YOUR life easier. The conversions will likely be ugly either way.

It doesn't matter whether sources are "from pristine consistent type systems". People will provide an AdapterPattern either way - which means that cannot be argued as a point in favor of your argument. And you're wrong about people providing information: people provide information based on the formats that make processing that information easier. That's why you have "forms" in businesses, and "fill in the bubble" tests.

Adapters can always be made with enough effort. And views that provide a single table viewpoint to multiple diverse structures *is* a form of adapter.

Correct. Which means we both need to use adapters, and therefore you can't argue one approach worse than the other on that account.

I'm arguing that RDBMS are a longer-tested (road-tested), more familiar, and better understood tool. I'm just proposing extending them, while you are basically starting your "type DB" from scratch. Maybe a mature type-DB would be powerful and flexible, but it's economically logical to attempt to extend what already exists rather than start from scratch (unless a big show-stopper is discovered). We could do both, but the relational-based approach is likely to be ready first because it's not starting at the starting line and people are already familiar with RDBMS.

Saying your approach is "relational-based" as though the other is not is a lie, or at least a gross misrepresentation of which you should be ashamed. Use of structured DomainValues in an RDBMS still results in an RDBMS. What does need changed is the DataManipulation language - SQL isn't designed for extension to structured DomainValues. The one advantage you might have is some ability to compile your SMEQL to SQL - one-time implementation costs are lower for you.

By "structured domain values" do you mean domain values that are structures? The "ed" versus "es" needs to be cleared up here. Structure "types" are generally "opaque" to relational, as you would say, and making them more relational would make them less typish.

I refer to DomainValues constructed of more primitive values, or equivalently whose type is constructed of more primitive types and may be recursive. Examples include records, tagged unions, and even relations (a relation as an attribute value), etc.. Most of them, with the unique exception of relation, are 'opaque' to relational operators, though use of a function that returns some aspect of a value as a relation would allow profitable use of relational operators on arbitrary values (especially useful as multiple return values, like the square root of a number). Not certain I'd say relations are less 'type-ish', but I suppose one might call it more or less so based on the degree to which constraints are enforced.


Re: "And I'll suggest that type-centric approaches tend to create controlled, pristine environments" - I suspect that if your ideas made their way into practice, you'd find that people view "types" different enough from you that the consistency you seek to make things play together nicely will not really be there after the first developer leaves. You are assuming your philosophies will be taken up all or nothing. Once buzzwords wear off, people tend to mix-and-match according to personal internal psychology.

I'm making no such assumptions. You're imagining something very different from what I said. I did not say "global" pristine environments, for example, and I fully believe that there will be AdapterPattern operations between environments. I do not require "all or nothing" adoption any more than any introduction of new standards requires it.

See above about adapters.


MarchZeroNine


EditText of this page (last edited July 8, 2010) or FindPage with title or text search