Magic Everything Machine

"It's easier to turn features on than add them from scratch."

[insert reference]

Answering that with rigor would require a large volume of usage surveys, which I cannot afford. But in general it boils down to how many collection-oriented features/operations (CollectionOrientedVerbs or COV) one needs versus scalar or single-cell kind of features/operations. Let's call this CTSR - Collection-To-Singular Ratio.

If I am doing heavy processing on strings as collections of characters or phrases, I tend to load them into tables. I've built a fair number of special-purpose table-oriented parsers over the years. However, most of the time we don't need many COV on strings, so we can treat them as single things without too many problems. The CTSR for strings is usually low in practice. If it grows high or possibly will be high in the future, then it may be better to table-ize the string, or at least use a collection system more powerful than strings and string operations.

Now let's take trees. If we are only doing tree-oriented things and will likely stay only doing tree-oriented things, then a dedicated tree may be satisfactory. But in my experience in my domain, no non-trivial structure of lasting use will escape the need for usages not originally anticipated. This is a huge fact of life that always hangs over any developer. The ability to adapt to non-anticipated uses is one of the strongest features of existing RDBMS, even with their current limits.

Why limit yourself to just a tree if you don't have to? If you pre-limit yourself to just a tree and then later need non-tree features, it's often labor-intensive to convert the info and software to use something else. IS-A has a high cost if you pick the wrong IS-A. I'd rather toggle features on and off as I need them.

The answer is performance. Single-purpose structures or collection-systems can generally out-perform RDBMS. But, this is a "physical" reason, not a logic one. If horsepower was not an issue, why not take something that has every collection-oriented feature in the world so that we get any feature we want without abandoning our tool? If you want to turn a feature off, you either don't use it or change some config setting to disallow it. Turning features on or off is a lot easier than adding them from scratch or abandoning our structure for a new "kind". (An exception may be if the volume of features is so large as to overwhelm us. Feature catalogs can perhaps help here so that we can disable banks of features at a time.)

But this MagicEverythingMachine''' does not exist. We'd all use the MEM if it existed, agreed? The problem though is its lack of existence. However, the closest collection-oriented thing on the market is a RDBMS of some form. Thus, if you can get a Tree System AND a Relational System all-in-one, you should take it. The only excuses left not to would be:

Now back to strings. If I had a new need to capitalize the first letter of each word in a string in order to generate titles and my string functions don't have such an operation, it would be nice to write something like:

 update myString set letter=Ucase(letter) where position_in_word = 1

If we had our MagicEverythingMachine we could perhaps do that without much hassle.

--top


"But in my experience in my domain, no non-trivial structure of lasting use will escape the need for usages not originally anticipated."

I think this is a crucial sentence, and one that possibly highlights the differences between:

Since the domain you work in involves creating end-user oriented business applications -- which I presume are driven by CrudScreens and are used to generate reports -- I can easily see how the inevitably-dynamic nature of business would impinge on development decisions. Indeed, having worked in that domain myself, I know that anything in the business assumed to be fixed and immutable today will change tomorrow. In that context, the RelationalModel is ideal -- I know of no other implemented model that provides the same positive flexibility-to-effort ratio for implementing and maintaining business applications.

However, outside of that domain -- or even within that domain using a development approach that focuses on creating invariant computational abstractions, in order to more effectively handle end-user requirements than existing tools currently support -- various container requirements may be identified. E.g., we need to represent an abstract syntax tree, or represent a hierarchy of nested function calls, or model dependencies in order to avoid deadlock. These require structures that do not map neatly onto tables -- the first example is a tree, the second is a stack, and the third is a graph. These are also invariants. They will only change if there are such wholesale changes in the application requirements that altering the container representation is the least of our worries. For example, we don't need to represent an abstract syntax tree if we're no longer implementing an embedded scripting language. However, the changes needed will far exceed the relatively minimal effort to switch from one data structure to another, which in such contexts, does not occur anyway. In implementing an embedded scripting system, the likelihood of needing to change an abstract syntax tree into an abstract syntax graph is vanishingly small.

In other words, the MagicEverythingMachine already exists for CrudScreen-oriented end-user general business applications, and it's currently represented by a SqlLanguage DBMS combined with (typically) a 4GL language designed for the purpose. For domains outside of CrudScreen-oriented end-user general business applications, a MagicEverythingMachine may simply not be appropriate, though AdaptiveCollection and ExtendedSetTheory certainly imply a greater degree of integration and generalisation than are currently typical, and define unifying structures that are more general-purpose than tables.

By the way, I would hardly agree that the average RDBMS is the closest thing we have to a MagicEverythingMachine. In terms of implemented models, I'd be more inclined to vote for virtually any OO language and its container/collection classes, plus (and this is the important part) whatever interfaces and other abstractions are used to distinguish abstract collections from concrete container representations.

OOP out of the box has no collection-oriented abilities, and encapsulation makes adding them difficult. In OO you have to add features to just about every class that wants to use them. In relational you subtract features, not add them. OOP is designed around encapsulation and tight association between operators and entities. These get in the way of collection-oriented thinking.

Production OOP environments certainly provide collection-oriented abilities, in the form of collection and/or container classes, interfaces, and iterators. Whilst it is certainly true that a typical general-purpose OO language is more of a do-it-yourself construction set than the average SQL DBMS, you can write an operating system (or a DBMS) in a general-purpose OO language, but you can't write an operating system in any DBMS that I've seen. Thus, a general-purpose OO language is more of a MagicEverythingMachine than a DBMS. A DBMS is, at best, a MagicDataProcessingMachineForBusiness.

If you add sufficient collection-oriented features to OO, it will then just about *be* a database, a navigational database of sorts. Thus, with your extensions, we are not comparing OO to relational, but rather OODBMS to RDBMS.

I'm not talking about what can be added to OO; I'm talking about features that already exist in popular OO languages like Smalltalk, Java, C#, C++/STL, and so on. These are not considered databases, and good OO programmers don't focus on modelling domain objects, anyway.

And I would indeed looove to have a relational-oriented OS. It would make it easier to track down mystery processes and files. Whether the *entire* OS would be relational though is a sticky matter. We could make relational TuringComplete by adding recursion, but in my opinion it would cause more problems than it solves. The ability to add domain-specific relational operators is sufficient. Relational shouldn't entirely replace app languages, but can do a lot of the heavy lifting.

I don't know what you mean by "[i]n relational you subtract features, not add them", nor do I know what you mean by "encapsulation and tight association between operators and entities [...] get in the way of collection-oriented thinking."

In relational you generally provide operations that can operate on ANY table/tuple-set. In some cases there may need to be limits on this, such as requiring some kind of "parent key" for tree operations, but in general any table that fits the needed characteristics for a operation can participate in that operation. In OOP you have to give it to each class one at a time. Multiple inheritance gets messy when used en-mass, and is not a requirement of OOP.

Multiple inheritance is not required. The mechanisms to support containment vary from language to language, but generally involve -- at a minimum -- either identifying, or defining, the mechanism by which one class instance can be compared to another. This is necessary in order to determine duplicates, find values, etc., in a given container. This can, of course, be described via inheritance but there are other mechanisms as well. None necessarily require that you "give it to each class one at a time" unless each class requires a distinct comparison mechanism. Implementations of the RelationalModel that support user-defined types require precisely the same explicitly-defined comparison mechanism, for precisely the same reason.

Keep in mind, by the way, that classes and tables/relations are not equivalent or even rough analogues. This is a common blunder (common enough that DateAndDarwen refer to it as the FirstGreatBlunder) and a notable source of confusion and misunderstandings. Classes are, however, roughly equivalent to attribute type definitions; instances are roughly equivalent to attribute values; and container class instances are roughly equivalent to tables/RelVars.


I think I see what you're aiming for: You'd like a DynamicallyTyped general-purpose procedural language that provides self-optimising (in the AdaptiveCollection sense) persistence-capable tables as its sole container type. Indeed, you might like these tables to be the primary mechanism by which all abstractions are defined, including types, wrappers around system resources (e.g., tables of files, tables of email messages, tables of mountable volumes, tables of socket messages, tables of lines in a text file, tables of characters in a string, tables of bits in an integer, etc.), and anything else that can be represented as a table. Yes?

If so, that's fine, but please don't call it "relational", because it isn't.

Where specifically does it "violate" relational (any more than SQL)? And the dynamic thing is not true because I'd separate the "domain math" from the relational math for the most part. I wouldn't dictate typing (or lack of), at least on the "cell" level.

It doesn't violate the RelationalModel any more than SQL, but SQL isn't an implementation of the RelationalModel. SQL was inspired by the RelationalModel. Similarly, your ideas are clearly inspired by the RelationalModel, but as they are often not in keeping with it, it would be helpful (for the sake of accuracy) if you'd refer to it as TableOrientedProgramming or some such. Also, you'd probably draw much less ire from some of us, as your inaccuracies about the RelationalModel would no longer be inaccuracies as such, but merely characteristics of Top's TableOrientedProgramming model.


Top: Why limit yourself to just a tree if you don't have to? If you pre-limit yourself to just a tree and then later need non-tree features, it's often labor-intensive to convert the info and software to use something else. IS-A has a high cost if you pick the wrong IS-A. I'd rather toggle features on and off as I need them.

A choice of schema also has a high cost if you pick the wrong schema, it's often labor intensive to convert the data to use something else. Your line of argument depends upon two fallacies:

Perhaps these are valid for SQL. They might even be valid for your SMEQL. But that's just proof that SQL and SMEQL are far from the pinnacle of query languages.

Converting data structures to other domain types including relations is the task of user-defined functions. Generalized relational aggregation ((A->A->A)->A->(T->A)->{Rel T}->A, first argument commutative and associative) allows one to convert relations - it's effectively the FoldFunction of relational programming. (Similar to FoldFunction, one can implement relational 'map' function, filtering, etc. using 'relational union' and the 'empty set' as the first two arguments to the aggregator.)

There is no need for a MagicEverythingMachine to get both flexibility and performance. Much more important is having symmetry - avoiding the sort of arbitrary limitations that TopMind proposes. Arbitrary limitations only succeed in forcing complex workarounds (violating simplicity) that in turn are prevented from taking efficient transitions (violating performance). Representing tree-values with surrogate IDs in a dedicated relation violates both simplicity and performance.

The complications and inefficiencies surrounding TopMind's approach are detailed in CrossToolTypeAndObjectSharing wherein TopMind proves incapable of even meeting the simplest and most critical of challenges: comparing structural equality between two tree values.

Nothing prevents a tree-comparison operator(s) from being added to a relational toolkit.

In CrossToolTypeAndObjectSharing you suggested the same, but you have thus far proven incapable of defining such a tree-comparison operator for any relational toolkit. You keep saying that nothing prevents it from being added, but that seems like TopMind's usual HandWaving hullabaloo to me. In any case, suppose you did add dedicated comparison operators for ordered trees, unordered trees, graphs, digraphs, and everything else we can imagine. That still doesn't help you with the simple problem of performing a query to find all entries in a relation that match structural equality to a tree described for that particular query.

I'm not sure what you mean. But first, I suggest you make your tree exercise a separate topic, since you seem to believe that it exposes some important flaw and keep mentioning it. And please, let's not start with the name-calling again. I already know you think I'm poop's evil brother. OnceAndOnlyOnce with this opinion. -t

It is already described in CrossToolTypeAndObjectSharing, Mr. poop's evil brother.


See also AdaptiveCollection


MarchZeroNine


EditText of this page (last edited March 5, 2009) or FindPage with title or text search