There seem to be two "thought camps" with regard to how to extend relational query languages in order to better deal with graph processing so that one does not have to write as much procedural "pointer hopping" and looping app code when dealing with graphs, DAG's, and trees.
One approach is to add recursion to the query language, and a second is to add "traversal" operation(s) that are declarative in nature. The recursive approach may be more flexible, but turns the query writer into a functional programmer. Although FP proponents will be happy with this, some argue it takes away from the declarative nature of query languages.
Traveral Operation Proposal
Here is a rough draft of a generic "traversal" operation. (Roughly based on RelationalAndTrees.) Basically we are assuming a many-to-many table with a matching "from" key set and a "to" key set. I use the word "set" because the keys may be compound keys. Some many-to-many tables don't fit this structure, but most can be converted/projected to this structure via various relational operators (UNION, JOIN, Etc.) via nested or referenced queries.
Inputs
- Table - real or virtual (calculated)
- Starting Record (it's primary key)
- "From" and "To" key column(s) used.
- Max-Depth - Maximum traversal depth. Zero or omitted for no limit.
- Ordering (optional) - Used to determine which path to take if multiple branches per node (does this ordering violate relational?) The to-key is always used to settle ties even if ordering is given.
- Distinct-Flag - To indicate if we only want distinct matches. Minimum depth (shortest path) and Ordering (above) are used if depth info is returned (see below).
Outputs
- The Row(s) found during traversal
- The depth, that is distance from starting node. This is optional and can be a generated column or a reserved function DEPTH().
- Parent (optional row or function) - Indicates shortest-path "parent" node (parent's "from-key").
- Sequence (optional row or function) - Indicates the traversal sequence. Because of the key rule above, there is always an ordering.
Issues
- If the supplied keys are not really unique, it could create problems for those items dependant on ordering. Perhaps it should generate an error. Needs pondering.
Note that any non-traversal-related filtering (WHERE clause) would be done before this operation, although for convenience we may want to include it. But for simplicity it will not be considered here.
Hypothetical SQL Example:
Table: roomstouch
---------
buildingID
roomID // unique only within a given building
buildingLink
roomLink
otherStuff
SELECT *, depth() AS depth, parent() AS parent
TRAVERSE FROM roomstouch
FROMKEY buildingID, roomID
TOKEY buildingLink, roomLink
START AT 23, 7
MAXDEPTH 5
Other possible optional features:
- Time limits. Often it is practical to limit the time of processing. Example cases may be missile guidance (land analysis) and a time-based games like chess. This mirrors the general human pattern of producing better answers if given more time.
- In practice, time limits and timer-signals are useful for a number of things. However, I'm not certain I'd limit it to QueryTraversalVersusRecursion. In RT decision making on data of any sort, you'll often need to limit the time spent collecting data.
- Agreed. It is similar to the need for a "Top N records" feature, found in most dialects.
Perhaps a better title would be "DeclarativeTraversalVersusRecursion?".