Domains Not Records Or Tables Are Objects

A key viewpoint expressed by HughDarwen and ChrisDate in TheThirdManifesto (and repeated by the latter author in recent editions of AnIntroductionToDatabaseSystems):

When mapping "objects" into a RelationalDatabase (for some definition of "object"), objects should be mapped into what are known as domains in the RelationalModel. Treating a relation/table as a "class" and treating a record/row as an "object" (or as an isomorphism of an "object") is, in their view, a fundamental mistake which they call the FirstGreatBlunder.

But the FirstGreatBlunder page says:

The first great blunder, is the treatment of relations (in the RelationalDatabase sense) and classes (in the ObjectOriented sense) as equivalent.

This is not at all the same thing. What do DarwenAndDate? actually say? In particular, where do they say that tuples cannot have behaviour and act polymorphically? This does not appear to be any such proscription in the paper version of TheThirdManifesto (http://citeseer.ist.psu.edu/darwen95third.html).

On the other hand, see FirstGreatBlunderRefuted.


"... treating a record/row as an "object" (or as an isomorphism of an "object") is, in their view, a fundamental mistake which they call the FirstGreatBlunder."

Do they say why? Clearly, if you're going to use an object-oriented language at all, then both records/rows and relations/tables must be objects (unless they are built-in, which doesn't seem sensible; or not first-class, which is definitely not sensible).

[Only if you think every datum in a program must be an object,

I was postulating an object-oriented language.

and then a tuple is an object in only a trivial sense - it's data, and has no behavior other than trivial data-like trivial actions such as copy/construction/attribute-selection.

Nevertheless it is an object. And even if mutable object state is problematic (I don't think it is), there is no reason why it can't be an arbitrary ValueObject; a row doesn't have to be a [primitive] tuple.

And what's not sensible about having database concepts built-in? (See below for more give-and-take on this.)

Part of what makes database programming more difficult than it needs to be is the fact that the concepts are not built in to most languages.

They should be built-in to standard libraries, IMHO - not languages.

But you can't build relational language into a library. Imagine putting relational into the C language. Not possible! C language is closed and finished. Putting it into a C library how? All you can do with C is something like mysql_query() functions, or call_tutorial_dee("strings here") which deal with dumb strings, not relational functionailty built into C itself. I suppose some languages that support domain specific language extensions might be able to build it into the libraries. There are also products (crap?) on the market like MicrosoftLinq that seems to try to address the problem, but likely fails at actually implementing relational properly.

[Making "rows" something other than tuples leaves you with a model that is not the relational model, so you would be inventing a new data model, and presumably an algebra to go with it. The point behind TheThirdManifesto is that the relational model has proven to be a very good and consistent model for working with data, with a complete and well-understood set of primitive operations, but that implementations to date have been lacking in their fidelity to the model. The authors also try to reconcile to some extent the object oriented models popular for application programming with the relation model - without doing violence to the latter.]

Yes relational is great for working with data, but objects work with data too! The advantage of OOP and procedural programming is you can modify the object data, or in C the STRUCT, and then run code tightly integrated with that data. With a database, the data is further away from you, not like a struct or object. You may be thinking: aha! separation of concerns! Well, let me ask: why is it people still use INI/XML files instead of SQLite or oracle databases? Because the database is too far away to be useful. It's still not built into your tool. SQLite comes close as being a replacement for INI/XML files, but is SQLite a replacement for associative arrays, hashlists, and lists? Not really, we still use lists and associative arrays, and even regular arrays. The databases are too hard to integrate with our tools, they are too separated and far away. What could be done, is merging the database right into the code, so that people stop reinventing the database with lists, associative arrays, hashlists, INI files, XML files. It would be nice to just use relations and create an INI file automatically, without actually worrying about the code that stores the physical INI file.

[Along the same lines, since the relational model is well-defined and stable, there is no particular reason to prefer libraries over built-in primitives, although either approach can work. In many languages, though, it's hard or impossible to write libraries that provide the same convenience of use that built-in concepts do, hence the preference for concepts being built-in; for other languages, the choice is a non-issue. By all means, let tuples be represented by objects, so long as those objects have the characteristics required by the relational model of tuples.]

I don't see there as being any contradiction between "a complete implementation of the relational model", and letting tuples have polymorphic behaviour.


What DarwenAndDate? mean in the above is that arbitrary objects (things like EmployeeId? or Currency, etc.) ought to be domains; and that there ought not be a canned isomorphism between any arbitrary object and some relation or tuple. Some ObjectDatabase products create such an isomorphism, and declare that if I have a class that looks like this in the code:

 class PersonsName? {
     string prefix;    // Mr, Mrs, etc
     string surname; 
     string givenName;
     string middleName;
     string suffix; // Jr., III, Esquire, etc.
 };

there must be a table somewhere whose columns correspond to the data fields in the above class. This is what Darwen and Date are objecting to.

Consider the following approach: model tables as indexable multisets of objects. In a standard library, provide implementations of these multisets (ordered for efficiency), with associated relational algebra operations. Wrap the objects that are put into these multisets in such a way that changes can be detected and any table invariants maintained.

There is no need to model different kinds of collection with inheritance, since multisets are general enough. The value of the object-oriented features of the language are in providing multiset implementations with different characteristics (persistence, transaction support, interfacing with other systems, etc.), and in modelling behaviour associated with the rows/objects.

(Alternatively, use sets and relations in place of multisets and multirelations. This choice is independent of the ObjectVsRelational? issues.)

What, if anything, is wrong with this approach? I really want to know, because I intend to use it.

In a pure-OO language, it might make sense to model relations and tuples as objects - but of type Relation and type Tuple. A Relation object would act like any other collection class in the language (Array, List, Dict) and have methods for iteration, access, and (possibly) mutation (insert, delete, etc.). In addition, it might have methods to support the RelationalAlgebra - join, project, select, etc. Likewise, tuple/record might be a class (database tuples are little different than the standard associative container found in most languages - whether its called "map" (C++), "hash" (Perl), "dict", or something else) with appropriate operations.

But having class PersonsName? inherit from class Relation is, to DarwenAndDate?, a no-no.

Why would it? PersonsName? is a row object; Relation would be a special case of a table - a table of (domain, codomain) pairs with no duplicates, to be precise.

[That's fine, but in a sense it's an implementation detail. To conform to the essentials of TheThirdManifesto, you would consider only the parts of your object model that correspond to relational concepts - relation, tuple, attribute, etc. Consider whether the operations you provide for these classes fulfill the requirements of the TheThirdManifesto - that's enough to keep anyone busy for a year or two. I'm not sure what you'd do with your "Table" base class and other descendants thereof - perhaps invent new operations on them that you find useful - but that would have little or nothing to do with the portion of your system that is concerned with the relational model. I suspect that they would likely add little of use that you couldn't get out of a complete implementation of the relational model - but you'd have to convince yourself of that by studying the RM in more depth.]


Can anyone give an example of arbitrary objects being domains, preferably in SQL?

Not in SQL; the fact that SQL has a crippled type system is one of the many SqlFlaws. TutorialDee and other "advanced" database languages, OTOH, do support what Darwen and Date propose.

Ah, that wasn't clear. So before we can apply DarwenAndDate?'s advice, we have to convince the powers that be to stop using SQL. Piece of cake. :-)

That's why they called their book the TheThirdManifesto, and not TheThirdHumbleSuggestion? or TheThirdPoliteRecommendation?. :) What they propose is a somewhat radical transformation to key parts of the IT infrastructure; a proposal bordering on LetsBlowUpTheUniverse. (Which isn't to say that they don't have some good ideas.)

I agree that they have good ideas. I just wish they had good ideas I could use today.

[You can, with some effort or expense. See for instance http://www.alphora.com/.]

I can't use that today. It only works on .NET. This reminds me of Catalysis. I wish the authors would spend their energy writing software I can use instead of manifestos.

[Yes, that criticism seems to be frequently made in one form or another. But I see the Manifesto as a step in that direction, just like many theoretical research papers or doctoral theses. People are using it as a basis for new implementations, and maybe some of the ideas will need to be revisited by them or by others as experience accrues. These things take time.]

There is also DuroProject? and RelProject


Likewise, use of TableInheritance and the like to implement polymorphism is an extension of this particular "blunder".

This viewpoint has quite a few interesting consequences:

There are other ways to prevent database invariants from being violated by object mutators; for example, use a HandleBodyPattern to ensure that the database always "knows" when an object has been mutated.

[The relational model defines database constraints declaratively; they're much simpler to define this way.]

Yes, but preventing invariants from being violated is an implementation-level problem. Using a HandleBodyPattern to detect mutation solves that problem, while being transparent to both the object and users of the table. The effect of mutating an object is then just the same as if the row was updated in an RDBMS.

[That's fine, but this isn't a discussion about implementation, but rather about the facilities presented to a programmer that uses a relational database system - about the logical model that the programmer uses. Issues of implementation are an entirely different ball of wax.]

The implication was that it was necessary to use immutable ValueObjects to prevent database invariants from being violated by object mutators invoked "behind the database's back". This is purely an implementation issue. In the logical model, invoking a mutator method is just like a record update. The database maintains its invariants somehow; the programmer doesn't care how. I mentioned one way to implement this simply as an existence proof that it is possible.


"[Making "rows" something other than tuples leaves you with a model that is not the relational model, so you would be inventing a new data model, and presumably an algebra to go with it.]"

No new model or algebra is required. You can treat an object as a tuple by using no-argument methods in place of field selectors; unlike a primitive tuple, however, the fields can be computed rather than stored, and update methods can be used instead of updating the tuple fields directly.

[In many languages, this would be impractical or at least very cumbersome, unless the concepts were built-in. Relational expressions (queries) generate new relations (thus giving the algebra the important characteristic of closure, which is missing in SQL), with new tuple "types" -- i.e. tuples that are made up of new, possibly unique groupings of attributes. Query expressions thus act as type generators. In some languages, this can indeed be done; in others, it can be done only with great difficulty (e.g. templates in C++). In statically typed languages, you have the added complication that the implied type of a particular attribute in such a derived relation need not be exactly the same as its declared type in the input relations.]

Yes, this does map more easily to dynamic OO languages than to languages where all types need to be known at compile-time. OTOH it has been done in C++; there was an article recently in C/C++ User's Journal about a system fairly similar to this.

[Also, it must be possible to reference the attributes of a tuple in the context of a relational expression, where you are not addressing a particular tuple's value but rather are using the reference as a placeholder to represent the values of all tuples of a referenced relation. (Consider something as simple as SQL's WHERE clause.) Treating tuple types as unique classes with attribute setters and getters would make this very difficult, or would at least require a very different syntax for attribute references in relational expressions.]

That's not difficult in any language with FirstClass closures.

[Applying the closures would be difficult in the general case. In a networked DBMS, they'd be supplied by the client, but would have to run on the server in order to work at all efficiently. Not impossible, but it seems to add a lot of complexity -- for what benefit? The relational model already includes an extension operation, which serves as the basis for calculating attributes.]

[Really, to discuss this further, it would be helpful if you identified the specific host language that you have in mind.]

Any reasonably dynamic language with some support for MetaProgramming would do; for example SmalltalkLanguage, RubyLanguage, PythonLanguage, CommonLisp, etc. (DynamicTyping isn't required, although it would make some things easier.)

[And what's not sensible about having database concepts built-in?]

If they are implemented in libraries, they can be more easily changed, or alternative implementations provided in application code. Things that are part of the language semantics are more difficult to change or extend. The raison d'�tre of object-oriented systems can be thought of as allowing functionality to be moved to libraries rather than having to be built-in to the language; that's certainly what they are most successful at. (Then again, maybe I just have a bias toward SmallLanguage?s.)

[Whether a library can provide convenient access to a full relational model really depends quite a bit on the host language. E.g. in a language like LISP, a library is a no-brainer; in C++, difficult but perhaps not impossible. I've been working on the latter, and have had to make many compromises, mainly in forfeiting static type checking. The comments above should give you a glimpse of the issues, but I can provide more detail if you like.]

I'll try to track down that C++-based system (I was borrowing someone else's copy of CUJ). It used templates to compute the result types of relational expressions.

Found it: http://lists.boost.org/MailArchives/boost/msg59731.php

[Thanks, sounds interesting. I considered this type of API in C++ myself, but my application programmers prefer a dynamically typed API, so it wasn't worth the bother of wrestling with the templates. See the somewhat dated NotesOnaCeePlusPlusRdbmsApi.]


EditText of this page (last edited November 26, 2014) or FindPage with title or text search