First Great Blunder Refuted

This is an easy refutation of the alleged FirstGreatBlunder as combated by TheThirdManifesto. It is also a refutation of DomainsNotRecordsOrTablesAreObjects.

To recap briefly, ChrisDate considers that if you have an "object type" A, say implemented by ClassA, with components int fieldA1 and string fieldA2, the typical way OO developers handle this is to reflect it in a table TableA1 along the lines of

 create table TableA1 (
  fieldA1 integer,
  fieldA2 varchar2(100)
 );
Now this would establish a mapping or correspondence between classes and tables in relational databases. According to DateAndDarwen this would be a blunder, even a great blunder at that, the FirstGreatBlunder.

The right solution, they say, would be to map ClassA to a "domain" (in D&D relational parlance), i.e. a type that can be used for columns in the database. Let's say that we name it DomainA; we'll then have some kind of table:

 create table TableA2 (
   fieldA DomainA
 );
Or even have no table TableA2 at all, but instead have columns in other tables whose domain is DomainA. The discussion goes the same for this other case, so we'll focus on the first case for the sake of simplicity.

Now D&D would have it that the first design, which is currently practiced on a large scale by OO developers and the current RelationalWeenie, is the FirstGreatBlunder, while the second would be OK. The refutation is trivial on two grounds (actually there are two refutations).

I double checked again and there's no misquotation. Date prescribes that classes=domains is the right equation, therefore TableA1 above, or a Users table with the same attributes as there are fields in a User class, is an embodiment of FirstGreatBlunder. Furthermore, in my current systems I have three subtypes of users, so I have a table for each subtype: UsersTypeA, UsersTypeB, UsersTypeC. According to the theory of D&D that would be precisely the first great blunder, especially since they are adamant about forbidding subtables and supertables - a useful feature that is available in PostgreSql and can be emulated in other databases. -- CostinCozianu

One thing wrong with TableA1 is exactly what's wrong with the following C++ code.

 class ClassA1 {
 public:
     int memberA1;
     char memberA2[100];
 };

If nothing else, the internals of the class are exposed. But it's the three user tables that clearly show why it's a blunder. Chances are, the three user tables do not have the same shape, and the list of all users is no longer available.


There are other refutations:

The above discussions with Maurice and Darwen are probably related to the fact that Maurice may see classes as "class variables", not class definitions. Darwen may see classes as the definition (type) specification of an object.

Until there is a rigorous definition of OOP, battles like this will continue to be fought - people need to clearly define what they mean by "class" because many people confuse classes with objects (instances). People can also be confused about what "instances" are, since those aren't clearly defined. Classes, objects, and types need to be clarified as to how they are distinguishable, if they even are.

If objects are the same as classes and objects are not instances of classes, then objects/classes is redundant and one of the words needs to be removed from all writing about OOP (which is hard to do since people all over the internet use the words "object" and "class" already). If an object is an instance of a class, that helps distinguish matters, but it still raises the question: why didn't they choose better terminology than this confusing rubbish? This is why some consider OOP to be snake oil, since it isn't clear and precise (but make no mistake, OOP is useful from an engineering perspective, even though it isn't clearly defined). Just as in engineering we reuse other people's designs, OOP is useful for reusing (inheriting) existing designs.


The first ground to dismiss the theory of the FirstGreatBlunder is an empirical one. We do that every day and we do not suffer the evils we are supposed to suffer from. Most of the business systems out there have a Users class in Java/C#/C++/Python/Perl and a Users table. There's nothing wrong with it. DateAndDarwen fail to actually prove that anything goes immediately wrong as a result of this great blunder. If it were such a great blunder, the proof should have been obvious.

The second ground is very formal and very immediate. If TableA1 is a great blunder, then the solution cannot be better than the disease; in other words, TableA2 is perfectly equivalent to TableA1 (in the imaginary perfect relational database that D&D would have). It has to do with TypeTheory: once you have constructors and destructors (or accessors), say new DomainA(fieldA1, fieldA2) to construct a new object from its components and accessors obj.fieldA1() and obj.fieldA2() to access the information inside a DomainA object (corresponding to operators THE_fieldA1(obj) and THE_fieldA2(obj), which are mandated by TheThirdManifesto), then:

The following view based on TableA1 is perfectly equivalent to TableA2:

 create view ViewA1 as select (new DomainA(fieldA1,fieldA2)) from TableA1
And the following view based on TableA2 is perfectly equivalent to TableA1:
 create view ViewA2 as select (fieldA.fieldA1(), fieldA.fieldA2()) from TableA2
Both views have a one-to-one mapping of all behaviour to their originating table in terms of updates, inserts, deletes, selects. That is, TableA1 and TableA2 are duals of each other with respect to constructor/destructor and perfectly equivalent in terms of behaviour. Whatever unacceptable behavior is discovered about TableA1 will also characterize TableA2. So the FirstGreatBlunder is much ado about nothing. -- CostinCozianu
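
The claimed duality can be sketched roughly in C++. This is only an illustration under assumptions: RowA1 and RowA2 are hypothetical stand-ins for rows of TableA1 and TableA2, toA2 and toA1 are made-up names playing the role of ViewA1 and ViewA2, and the accessors stand in for the THE_ operators. It shows that a single row survives the round trip; it does not say anything about joins or projections.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the type DomainA: a constructor plus one
// accessor per component, in the role of the THE_ operators.
class DomainA {
public:
    DomainA(int a1, std::string a2) : a1_(a1), a2_(std::move(a2)) {}
    int fieldA1() const { return a1_; }
    const std::string& fieldA2() const { return a2_; }
private:
    int a1_;
    std::string a2_;
};

// A row of TableA1: the components are separate columns.
struct RowA1 {
    int fieldA1;
    std::string fieldA2;
};

// A row of TableA2: a single column of type DomainA.
struct RowA2 {
    DomainA fieldA;
};

// Counterpart of ViewA1: build a TableA2-shaped row from a TableA1 row.
RowA2 toA2(const RowA1& r) {
    return RowA2{DomainA(r.fieldA1, r.fieldA2)};
}

// Counterpart of ViewA2: take a TableA2 row apart into a TableA1 row.
RowA1 toA1(const RowA2& r) {
    return RowA1{r.fieldA.fieldA1(), r.fieldA.fieldA2()};
}
```

Composing toA1 with toA2 gives back the original components, which is the whole content of the "duals of each other" claim at the level of individual rows.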

The first mistake here is that D&D do not disagree with the notion of having a User class in an application, as in the example given above. They disagree with the notion of equating relvars with classes; in an OO system, if it's convenient for the application program to package a particular type of tuple as an object, then by all means do so. They don't proscribe this.


The second mistake is here: "Both views have a one to one mapping of all behaviour to their originating table in terms of updates, inserts, deletes, selects." But they do not have such a mapping with respect to joins, nor with respect to projections. (If by "select" you intend to also include "projection", then your error is one of commission and not just omission. One cannot, even in D&D's system, "project away" attributes of a class!)

By equating classes to relvars, you have a problem with interpreting the results of any relational operation that combines relations with different headings, or even different domains for its attributes, or that produces a relation whose heading differs from that of its input relations. These operations would implicitly define new classes. Since OO classes are typically more than just their structure, it's not usually adequate to simply infer a new structure. Constructors, destructors, and other operators of the input classes may or may not apply to the output classes, and nobody has, to my knowledge, tried to come up with a systematic description of how to infer these. In fact, in many host languages, this would be a terribly difficult problem to solve.
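
A small C++ sketch of the projection problem, under stated assumptions: the tuple aliases RowA1 and RowA1Only, the function projectFieldA1, and the describe() operation are all hypothetical names chosen for illustration. Projecting a tuple yields a perfectly good new tuple type; nothing comparable tells us which operations of ClassA carry over to the projected result.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// A row with heading {fieldA1, fieldA2}, treated as a plain tuple.
using RowA1 = std::tuple<int, std::string>;

// Projection onto {fieldA1}: a different heading, hence a different type.
using RowA1Only = std::tuple<int>;

RowA1Only projectFieldA1(const RowA1& r) {
    return RowA1Only(std::get<0>(r));
}

// A class with the same two components plus behaviour that needs both.
class ClassA {
public:
    ClassA(int a1, std::string a2) : a1_(a1), a2_(std::move(a2)) {}
    // Hypothetical operation; it has no obvious meaning for RowA1Only.
    std::string describe() const { return std::to_string(a1_) + ":" + a2_; }
private:
    int a1_;
    std::string a2_;
};

static_assert(!std::is_same<RowA1, RowA1Only>::value,
              "projection changes the heading, and so the type");
```

Whether describe() should exist on the projected type, and what it should do there, has to be decided by hand for every such operation, which is the inference problem described above.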

It is for exactly this reason that OODB systems or APIs impede access to the database; they generally make it very difficult to access the inherent flexibility of a relational database by requiring you to do a lot of additional work to "wrap" a query, or by requiring you to completely sidestep the OO interface for ad hoc queries.

-- DanMuller

I'm not sure I understand what you intended here. Are you saying that in OODB systems, it is typically mandatory that a user explicitly define a class to represent the output of every query? This sounds quite onerous. In fact, the in-house OODB system I inherited at work was quite like this. This greatly inhibits the creation of ad-hoc queries, and either slows down development tremendously or encourages programmers to avoid customized queries, thus slowing the application down. -- DanMuller


However, in the spirit of LawOfDemeterRevisited, just as behind the (IMHO) flawed LawOfDemeter lies a grain of truth, such is also the case with the FirstGreatBlunder. What may lead to less than good relational designs is following the mapping of objects to tables on auto-pilot rather than on a case-by-case basis.

While it is OK in 99.9999% of the cases to map the User class to a Users table, the same cannot be said about blindly following an algorithm that says for every class in the OO system that needs to be persisted create a corresponding table.


D&D claim that OO is poorly defined (OoLacksConsistencyDiscussion). If that is the case, then there is no single "right" mapping approach.


To DougMerritt: quoting Date is not enough. Thinking critically about what Date wrote is different. I just proved that it matters not at all whether you store object information as tuples or whether you store it inside attributes, and now you're criticizing the support for UDTs in databases of 5 years ago, which has nothing to do with FirstGreatBlunder. If you actually read TheThirdManifesto you'll also discover a huge invalid critique of the super-table/sub-table issue, which they criticized as a proposal in some paper but which has actually been an (arguably useful) feature of PostgreSql for a long time now. How good the support for UDTs in Oracle was 5 years ago has absolutely nothing to do with whether FirstGreatBlunder is actually a blunder. FirstGreatBlunder is not about a critique of UDTs in commercial systems; the first great blunder posits that objects should only be stored in attributes, period. And that's just plain wrong. -- CostinCozianu

On the FirstGreatBlunder page, the blunder is very clearly defined to be the claim that "relvar = object class", while Date believes the non-blunder should be "domain = object class". I believe that that sentence is not only true, but irrefutable. Because of that, I then wished to point out that "domains" must exist before Date's advice can be followed. For them to exist means at minimum that users must be able to define new types with associated operations, and that those operations must be usable in queries. I believe this is also true and irrefutable. I furthermore believe (this belief is not irrefutable, but seems true to me) that many widely used RDBMSes today still do not allow users to do so. This sentence might potentially be refuted, but I believe it to be true, and that your statements about Oracle are the exception, not the rule, and thus that many RDBMS systems still do not support "domains" in the Date sense, and therefore one cannot follow Date's advice when using them...but that potentially one could follow this part of Date's advice when using Oracle, since you seem to claim that version 8 and above supports "domains", if I understood you correctly.

These true things that I am saying may well be different than what you view as the central issue of your argument, but so what? Am I not allowed to make a true point on the topic? -- DougMerritt

Can you explain in English what "relvar = object class" means? Because Mathematics, it ain't.

I just refuted what you consider irrefutable... by constructing two tables, one of which follows Date's prescription while the other violates Date's proscription. And they are obviously equivalent; there's no deficiency that you can show in one that isn't also in the other. According to Jan Hidders from comp.databases.theory, it's so simple it's boring. That's your refutation right there.

In an imaginary perfect database system, there can be a useful partial function from relvars to object types (and that, unlike "relvars=classes" is a mathematical statement) without any adverse software engineering effect on the system. The database system can support this mapping (make it easy for the database designer to realize this mapping) with appropriate operators and typing rules.

Yes, Oracle has some workable form of "domains", as Oracle is under no obligation to follow the letter of TheThirdManifesto. TheThirdManifesto in contrast has none as it is not a working formal system, not even a formal system at all.


I just reread this page, and I'm still left with the strong impression that Costin misunderstands the target of TheThirdManifesto. At the time it was written, there were efforts underway to create object-oriented database systems. These were not mappings between relational databases and classes; they were attempts to replace relational databases with something that was thoroughly object-oriented, only borrowing what they could from relational theory. (Or perhaps more generously, they were trying to merge the two paradigms.) In those systems, the attempt was made to replace relations with classes; that is the equivalence which is rejected by D&D. (I have a copy of a book describing a proposed standard for such systems, although I can't lay my hands on it right now; it's referenced at least a couple of times in the appendices of TTM, if someone has a copy of that book handy.)

Support for this interpretation of D&D's thesis can be seen in the SecondGreatBlunder, which is described as the introduction of object identity to databases. Indeed, the OODBMS designs of this time reintroduced pointers to databases. As I recall, they made no attempt to map between keys and object identity; keys became irrelevant.

Costin seems to misinterpret application-specific mappings between (apparently only some) relations and classes as the target of DateAndDarwen's criticisms. I thoroughly agree that such mappings can be useful and pragmatic. But they are a far cry from a merging of the concepts of relations and classes, which would have to include ways of merging concepts such as polymorphism, object identity, and object containment with relational theory in order not to lose the benefits of the latter.

Here is some fairly direct support for my assertion regarding DateAndDarwen's intentions: http://www.dbdebunk.com/page/page/1706754.htm

-- DanMuller


An important point to be made here (again) is that the supposed refutation is based on the equivalence of objects and relvars as data structures, which is what D&D are really arguing against. The 'refutation' completely ignores an important property of both objects and attributes: opacity. Neither attributes nor objects are data structures. They are represented by data structures, internally, but those details should not be visible, unlike the details of a relation variable, which must be visible. It should be possible to substitute the representation of a given class or domain with any other representation of equivalent behavior (to the extent that this is possible, given the Law Of LeakyAbstraction) without altering the way in which any attribute or object of that type is used.

In other words, what people are really confusing are the concepts of data type and data structure. They are not at all the same thing, even though all data types in a given program (even so-called 'primitive' types) are represented by data structures. (Remember, all types are contextual to how the variable is used; underneath it all, the actual 'type' of all variables is 'signals' in the InformationTheory sense, and 'primitive' types are no more fundamental than any other. (That may be true if you limit yourself to digital types that possess a character basis... in the InformationTheory sense, 'primitive' types (that we know of) are: digital, analog, quantum, and void. But I digress. -- db) Types exist for our benefit, because we as humans need to chunk information in manners which we can use to form larger mental models; all the typing support in our programming languages are just tools to simplify that process, and if a given tool does not do so, one should use another tool. But this is a digression...)

Getting back on track: the actual representation of a data type as actual variables shouldn't be important. If I have a type 'Personal-Name', with a corresponding operation which gives the name as a zero-delimited array of ASCII characters, it shouldn't matter to the ClientProgrammer if the type is actually represented as an ASCII array, a list of separate ASCII arrays representing the individual parts of the name, a length-encoded UNICODE string, a linked list of EBCDIC characters, or a two-dimensional array of booleans representing the Morse Code for the name, so long as the operation gives the required ASCII array.
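
As a rough C++ sketch of that point (the class names, the asAscii() operation, and the nameLength helper are hypothetical, chosen only for illustration): two different representations of 'Personal-Name' expose the same operation, and client code written against that operation cannot tell them apart.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Representation 1: the whole name stored as one string.
class PersonalNameFlat {
public:
    explicit PersonalNameFlat(std::string full) : full_(std::move(full)) {}
    std::string asAscii() const { return full_; }
private:
    std::string full_;
};

// Representation 2: the name stored as a list of separate parts.
class PersonalNameParts {
public:
    explicit PersonalNameParts(std::vector<std::string> parts)
        : parts_(std::move(parts)) {}
    // Joins the parts with single spaces to produce the same result.
    std::string asAscii() const {
        std::string out;
        for (std::size_t i = 0; i < parts_.size(); ++i) {
            if (i != 0) out += ' ';
            out += parts_[i];
        }
        return out;
    }
private:
    std::vector<std::string> parts_;
};

// Client code depends only on the operation, not on the representation.
template <typename Name>
std::size_t nameLength(const Name& n) {
    return n.asAscii().size();
}
```

Swapping one representation for the other changes nothing for nameLength, which is the sense in which a type is not its data structure.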

For that matter, it is important to remember that relations themselves are not data structures - the familiar tabular form is only a representation of the relation, which abstractly is a correlation between two or more interdependent but still separate data. It is a mistake to see relations as data structures, for the opposite reason that it is a mistake to see types as data structures: whereas a type represents a single, coherent concept, a relation represents a connection between two or more concepts but not those concepts themselves. The attributes of a relvar are not themselves part of the relvar, but are referred to by the relvar; they can and in many cases should be represented separately from the relvar. If this were not the case, normalization would be impossible; conceptually, a sub-key is a reference, not a datum in and of itself, even if it also carries semantically meaningful information.

By way of analogy, a Lisp atom which is referred to by a list is not a part of the list itself; the list only refers to the atom, but if the list is garbage collected, it does not necessarily mean that the atom is as well, since other lists may refer to it. That analogy has some weaknesses, admittedly, as it involves a certain amount of meta-level confusion, but that's been rife throughout this argument, on both sides.

In some ways, the term 'relational database' is unfortunate, since relations are not about the data themselves, but about relations between data (and other relations). The storage of a given datum is less important than the context in which that datum exists. -- JayOsako

I think you use the word 'data' confusingly towards the end, there. I believe relations are about 'data' in the 'WhatIsData' sense (DatabaseIsRepresenterOfFacts). A relation is essentially a predicate; a system like DataLog, MercuryLanguage, or PrologLanguage is fairly ideal for describing relations to their greatest extent, while SQL and such do the job for more primitive finite relations (where speed and efficient storage are of greater concern than flexibility). However, relations are not about data structures or data types.

DataStructures and DataTypes are not data; they are simply representations for a more abstract concept traditionally called 'values', which also are not data; e.g. you can represent a string or a number in a DataStructure, but the string or number says nothing at all about the world... not until you put it into some sort of relational context. Similarly, Objects are also not 'data' (in the WhatIsData sense). An object is not a fact. An object isn't even a collection of facts. Rather, by representing an object by use of a DataStructure and some sort of identifier (reference, pointer, primary key, etc.), you represent facts about the object and its behavior. And in an ObjectOriented system, this set of facts is projected into reality as an actual object by the runtime environment. But the object is not the facts about the object; it is only represented by them, which are represented in a DataStructure. Objects themselves truly are little pieces of reality, not reflections or models of it. See ObjectVsModel.

But one can model objects in a relational database in a manner that is conceptually consistent with the modeling approach (no blunders involved) by exposing a relation for every exposed attribute and behavior of that object; an exposed attribute for an object is essentially a 'fact' about the object that is meaningful. (And make no mistake, a 'class' is simply an identifier for a shared (and often invariant) set of behaviors and additional facts about an object. It should not be the case that each 'class' gets its own table.) E.g. if some objects have a 'name' attribute, you could create a relation NAME(ObjectID, Name-Value (string?)) that represents the fact that some objects have names. One can further denormalize lots of different binary-predicate tables (with one of the components being ObjectID) into one big-arse table for efficiency reasons, so that you don't need to touch ten tables to get ten facts that you often need all at once, and you get what looks very much like what D&D call "The FirstGreatBlunder". (Of course, a good, optimizing DBMS would denormalize for you when it sees it can do so and gain efficiency. There is no good reason to do it by hand.) In addition, one might deal with 'private' data via some sort of amalgamation of hidden Markov-style variables collected into a 'private_state' (which might itself be a relation or micro-database), which is mostly only read or modified by the behavior specs for the object. This would essentially correspond to D&D's proposal that objects be given 'domains'.

If you can get that close to the FirstGreatBlunder without actually blundering, one does wonder how much a blunder the FirstGreatBlunder really is. It seems the only thing missing from Costin's original spec is the 'ObjectID', and the fact that there was a table for just that class rather than a table for objects with an identifier for the class.

 class ClassA1 {
  public:
   int    attribute1;
   string attribute2;
   void   behavior1() { /* code for behavior1 */ }
 };
    ... normalized model (expecting many more classes) ...
 create table Objects (
   ObjectID   oid_value unique
 );
 create table Class (
   ObjectID   oid_value unique,
   ClassName  string
 );
 create table Class_Hierarchy (
   ClassName  string,
   SuperClass string
 );
 create table Class_Behaviors (
   ClassName    string,
   BehaviorName string,
   BehaviorDesc lambda returning procedure
 );
    (or could do 'With_Behavior1(ClassName,BehaviorDesc)')
    (Attributes as individual predicates)
 create table Has_Attribute1 (
   ObjectID   oid_value unique,
   Attribute1 integer
 );
 create table Has_Attribute2 (
   ObjectID   oid_value unique,
   Attribute2 string
 );
     (OR All attributes as one predicate)
 create table Attribute (
   ObjectID       oid_value,
   AttributeName  string,
   AttributeValue dynamic,
   candidate key (ObjectID, AttributeName)
 );
  ... fully denormalized, possibly by the DBMS, upon review of the various constraints and actual data ...
  create table Objects (
   -- ObjectID oid_value - can be eliminated because it's a surrogate ID and there are no other relations using it
   -- ClassName - can be eliminated because all objects are of one class
   Attribute1  integer,
   Attribute2  string
  );
  create table Behaviors (
   -- ClassName - can be eliminated as implicit because all objects are of only one class
   Behavior1  lambda returning procedure  -- could be moved; has and will only ever have exactly one entry
  );
  ... and look at that! denormalized all the way back to the FirstGreatBlunder ...
The ability to derive facts DataLog style would still go a long way to making the above convenient to use (e.g. one can say that an object has behavior X if it has it directly (first choice), or if it is of a class that has behavior X; a class has behavior X if it has it directly (first choice) or if it has a superclass that has behavior X; etc. - rather than specifying it on a per-query basis). But I digress.
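
The derivation rule for classes can be sketched in C++ as follows. This is a sketch under assumptions, not a DataLog engine: the Schema struct is a hypothetical in-memory stand-in for the Class_Hierarchy and Class_Behaviors tables, single inheritance is assumed, and the function name is made up.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical in-memory stand-ins for two of the relations above.
struct Schema {
    // Class_Hierarchy: ClassName -> SuperClass (single inheritance assumed).
    std::map<std::string, std::string> superClass;
    // Class_Behaviors: (ClassName, BehaviorName) pairs.
    std::set<std::pair<std::string, std::string>> behaviors;
};

// Roughly the rules:
//   class_has(C, B) :- behaviors(C, B).
//   class_has(C, B) :- super(C, S), class_has(S, B).
bool classHasBehavior(const Schema& s, std::string cls,
                      const std::string& behavior) {
    std::set<std::string> seen;  // guards against cycles in the hierarchy
    while (seen.insert(cls).second) {
        // First choice: the class has the behavior directly.
        if (s.behaviors.count({cls, behavior}) != 0) return true;
        // Otherwise walk up to the superclass, if there is one.
        auto it = s.superClass.find(cls);
        if (it == s.superClass.end()) return false;
        cls = it->second;
    }
    return false;
}
```

With the rule stated once, each query can simply ask whether a class has a behavior instead of spelling out the walk up the hierarchy.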

The blunder Costin exposes is a bit of a conceptual error inconsistent with DatabaseIsRepresenterOfFacts (mostly because ObjectID was missing, and is necessary for facts about an object system), but it seems to me that the real 'great blunder' is creating a table-per-class when representing objects. That ends up with a mess of tables, no clean way to find all the different places IDs might be, no clean way of associating objects with their behaviors, etc. When performing an object-relational mapping, one should probably consider the broadest possibility of a dynamic, prototype-based language and style the schema for that (even 'Class' is unnecessary for fully prototype-based languages - just have prototype be another object).

Of course, D&D think that object identity (ObjectID) is the SecondGreatBlunder. When it comes to modeling an object system, they're wrong - object identity is fundamental to such systems. One cannot describe relations between objects or behaviors of objects without some mechanism to uniquely identify the objects. For example, it is provable that some sort of extrinsic identifier is absolutely necessary to describe any simple graph of more than one point. There is a higher, meta-question as to whether one should ever bother modeling object systems or graphs within a relational database, but I don't believe that D&D are the right guys to answer that question... I'll wait for an answer from the ArtificialIntelligence, AutonomousAgent, ExpertSystem and KnowledgeSystem people.

[Regarding your view on the SecondGreatBlunder, that "[w]hen it comes to modeling an object system, [DateAndDarwen are] wrong - object identity is fundamental to such systems": DateAndDarwen are referring specifically to relational systems, not object-oriented systems. Although their proscriptions against the FirstGreatBlunder (relvars are not domains, i.e., relation-valued variables are not equivalent to classes) and against the SecondGreatBlunder (there shall be no object IDs) appear in a chapter entitled "OO Proscriptions", the "OO" refers to "Other Orthogonal", which is intended to remind the reader of specifically those features of object-oriented systems that they deem valuable in a relational system, i.e., user-defined types and type inheritance. Other notions of object-orientation - instance, object, and class, for example - are not included in their model. Instead, their relational model is based on the core concepts of type, value, and variable (plus operator, which is not relevant here). There is no notion of stateful or mutable objects or instances. Values, which are as close to OO instances as their model gets, are immutable. Only variables are mutable via assignment.]

[The SecondGreatBlunder and its associated proscription (no object IDs) is intended to address what they believe to be a flawed approach in implementations of the relational model: the association of opaque identifiers with values, such that two values can be considered unequal or distinct even though their semantic content, i.e., what they represent, is equal. If this is permitted, we can have (among other problems) the perverse situation where value "3" of an Integer type in variable X can be considered unequal to the value "3" of an Integer type in variable Y, merely because the two "instances" (ugh) have differing object IDs. The same concept extends to tuples representing real-world entity instances. Would it be meaningful - in data management terms - to have two otherwise identical "Alice" employee records that differ only by an opaque, internally-generated "object ID"?]

[This does not, however, preclude using relational systems to house and manipulate information about external object-oriented systems, as one might do with an ObjectRelationalMapper. The Blunder lies in depending on internally-generated object IDs within the relational system. There is nothing in the model that prevents the user from defining table/relvar/relation attributes that contain object IDs generated externally. In this case, an "object ID" is merely another attribute of real-world entities (which may or may not be object-oriented system objects), and of no greater or lesser significance than any other attribute. Thus, the relational model can be used to represent graphs, etc., on behalf of an object-oriented (or other object-id dependent) system without causing the SecondGreatBlunder or violating its proscription.]

[A similar argument may well apply to the FirstGreatBlunder. It appears intended to justify a proscription for implementing relational systems, such that a sound relational system should not equate variables with types. It does not appear to be an injunction against using relational systems to house external object images. Whether to map an external client-side Employee class (for example) to a server-side table/relvar, or map it to a server-side attribute type, is merely a conventional ER modeling problem that should be decided on the basis of requirements rather than a general rule.]
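
[The difference between value equality and identity can be sketched in C++. This is an illustration under assumptions: the Employee struct and its salary member are hypothetical, and a memory address stands in for an opaque, internally-generated object ID.]

```cpp
#include <cassert>
#include <string>

// Hypothetical record type; the members are chosen only for illustration.
struct Employee {
    std::string name;
    int salary;
};

// Value semantics: two records are equal when their content is equal.
bool sameValue(const Employee& a, const Employee& b) {
    return a.name == b.name && a.salary == b.salary;
}

// Identity semantics: two records are "equal" only when they are the
// same object. The address plays the role of an opaque object ID.
bool sameObject(const Employee& a, const Employee& b) {
    return &a == &b;
}
```

[Two separately created "Alice" records with the same content satisfy sameValue but not sameObject; the proscription amounts to saying that a relational system should only ever offer the first kind of comparison.]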

-- DaveVoorhis


It occurs to me that this is looking at the problem back to front. Date isn't talking about the design of databases, but of database engines. What he's arguing for, first and foremost, is for RDBMS designers to implement the full CREATE DOMAIN spec of the SQL standard. It's equivalent to the arguments put forward in the 1960s and 1970s in favor of including structured data types in programming languages. -- JayOsako

The complaint that most RDBMSes lack the ability to create user-defined types is common and should be addressed. However, if columns can be tables and tables can be columns, then the complexity and traceability of such a system may be called into question. Standards are useful as much for what they don't allow as for what they do allow, and if they allow everything, then they are not really standards, but merely a canvas for chaos and MentalMasturbation. -t

{The only MentalMasturbation going on here is your idle, ill-considered, and unsubstantiated speculation. If you don't like user-defined types, don't use them.}

Sir, I did not say that I didn't like user-defined types.

{Then clearly you did not express yourself clearly. I don't know what you're on about with "columns can be tables" and "tables can be columns".}

You are right, I should have linked to reference info. My bad. I'll be back.


JanuaryZeroEight


Last edited March 18, 2013