Database Is Representer Of Entities

As has been colorfully pointed out in discussion of this page on other pages, I clearly misframed this topic, so I intend to refactor this page. (Comments aimed at assisting with the refactoring are of course welcome; there is no need to wait for completion.)

Meanwhile, the short version is: DatabaseIsRepresenterOfFacts is well grounded in theory, whereas DatabaseIsRepresenterOfEntities is not. It seems to be an "anything goes" pragmatic approach that includes DatabaseIsRepresenterOfFacts as a subset (since, after all, anything goes). It is mentioned, but not clearly explained, by JoeCelko. It also appears very briefly on DatabaseIsRepresenterOfFacts under the "Database is a Filing Cabinet" and "The database is where dead objects live" subtopics (humorously intended or not), and in the recent page DatabaseIsRepresenterOfEverything, which I think is perhaps arguing for an ad hoc approach. I'm unclear whether JoeCelko even advocates it, as opposed to simply recognizing it as an approach often taken in the real world, flawed or not.

As CC has pointed out, this "anything goes" point of view destroys important properties of the relational model; the thing about those who hold the "anything goes" point of view is that they apparently don't care.

I personally do not morally condemn ad hoc approaches, even though I believe they are inherently less sound than theoretically grounded approaches, on any topic, in part because it is very clear to me that theory is inherently always a subset of pragma. I.e. what we know with precision and accuracy is always a smaller set than what we know in a rough, fuzzy, pragma sense, as classically exemplified by the arts versus the sciences.

It is also IMHO a matter of historical fact that theoretical precision always grows out of fuzzy pragma, and that the latter is in fact always the starting point for the former, and that therefore fuzzy pragma should never be considered shameful...especially before it grows into precision (chemistry from alchemy roots), but to some extent, even afterward (medicines from folk botany).

Others may (well, clearly do) feel differently. (I'm still wrestling with how to phrase such things; the reasons I have for being interested in any given topic are apparently often different than how people usually view the topic, which seems to lead to a rich source of misinterpretation/misunderstandings.)

I meant this page to be interpreted as above from the beginning, but more than one person made it clear that my intention failed.

I see that my interest in ad hoc practices can be (and has been) sharply misunderstood, something I didn't understand quite as clearly before now.

Part of the following page is simply paraphrases/quotes; that part I don't see a need to change; quotes are (bear with me here) relatively objective ("Person A said X" is true even if X is false). What needs to change is the interstitials. -- DougMerritt


Two major points of view about databases are DatabaseIsRepresenterOfFacts (advocated by e.g. ChrisDate, originally by DrCodd, and locally by CostinCozianu) and DatabaseIsRepresenterOfEntities (more fully, entities, classes, and/or relationships between entities, advocated by e.g. JoeCelko and other SQL standards committee (ex-) members, and apparently also by DrCodd after 1989).

ChrisDate advocates the Fact approach because it is scientific, firmly grounded in theory, and disciplined, whereas the Entities approach is ad hoc ("based on intuition rather than principle") and typically suffers from lack of discipline.

On the other hand, ChrisDate describes (pg721) a "well-understood body of data: all relationships known, all vars known as to whether dependent or independent variables, where the independent variables form an addressing scheme and the dependent variables are values in cells." ...but goes on to say that "most bodies of data are not well understood", which sounds like the point of view of advocates of the "Entities" approach, which Date nonetheless deplores. (This is one of the few apparent inconsistencies in his argument, which is otherwise quite strong.)

JoeCelko: "The fundamental question is, what are you modeling in a table? Both DrCodd and ChrisDate take the position that a table is a collection of facts...once one states that a fact is true, it serves no useful purpose to state it again." From that point of view, databases should contain sets with no duplicate rows, not multisets/bags.

JoeCelko: "When the question came up in the SQL standards committee, we decided to leave duplicates in the standard. The example we used in committee discussions...was a cash register receipt with multiple occurrences of cans of cat food on it. This is how this came to be called 'the cat food problem' in the literature."
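To make the disagreement concrete, here is a minimal Python sketch of the cat food problem. The receipt rows, prices, and variable names are all invented for illustration; none of them come from JoeCelko or from the committee discussions. It only shows how the same receipt reads as a bag, as a naively deduplicated set, and as a set of facts with the count made explicit.

```python
from collections import Counter

# Hypothetical receipt lines as (item, price_in_cents) rows; invented data.
receipt_rows = [
    ("cat food", 89),
    ("cat food", 89),
    ("cat food", 89),
    ("milk", 249),
]

# Multiset (bag) reading: duplicate rows are kept and each one counts.
bag_total = sum(price for _, price in receipt_rows)

# Naive set reading: identical rows collapse, so two cans silently vanish.
set_rows = set(receipt_rows)
naive_set_total = sum(price for _, price in set_rows)

# Fact-style reading: one row per distinct fact, with the quantity stated
# explicitly, so no information is lost and no row is repeated.
fact_rows = {
    (item, price, qty)
    for (item, price), qty in Counter(receipt_rows).items()
}
fact_total = sum(price * qty for _, price, qty in fact_rows)

print(bag_total, naive_set_total, fact_total)
```

With this data the bag total and the fact-style total agree (516 cents), while the naive set total comes out as 338 cents. Roughly speaking, the "Facts" side would say the quantity column is the right fix, and the "Entities" side would say each can on the receipt is a separate thing that deserves its own row.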

I'm not aware of anyone who says that DatabaseIsRepresenterOfFacts is always the wrong approach, whereas advocates of DatabaseIsRepresenterOfFacts often do say that it's the only correct approach and things like DatabaseIsRepresenterOfEntities are just plain wrong, and that SqlLanguage support for multisets rather than sets is just wrong.

Rather, supporters of DatabaseIsRepresenterOfEntities tend to view the topic pragmatically, and may say that although it can be great to have only facts in a database, real-world needs are sometimes more complicated than that.

DrCodd himself eventually decided that there was a real-world need for duplication, and added a "degree of duplication" operator to his model. ("RV-6 VIEW Updating." The Relational Model for Database Management: Version 2, E.F. Codd) I'm not entirely clear if this meant he shifted from supporting "Of Facts" to "Of Entities" or not, but it seems to be implied.

JoeCelko explains that with the entities, classes, and/or relationships between entities approach, 'a duplicate row means more than merely one occurrence of an entity. This leads to a more object-oriented view of data where I have to deal with different fundamental relationships among "duplicates", such as:'

The above is quoted or paraphrased from pg 18 of SqlForSmarties, 2nd ed, JoeCelko.


A related issue is the ClosedWorldAssumption, which with DatabaseIsRepresenterOfFacts means that not only is each relation in the DB a true predicate, but furthermore that all truths are present in the DB, so if a predicate is not listed as true, then it is definitely false.

For many things that's obviously attractive, and is part of the argument against NULLs; there isn't even a need for their direct base relvar representation if everything is either true or false but never e.g. unknown. (For derived relvars more discussion is apropos but OffTopic here.)
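A small Python sketch may make the distinction easier to see. The supplier/part relation and both function names are hypothetical, chosen only for illustration. Under the closed-world reading a lookup always yields true or false; the contrasting open-world reading is included only to show where a third "unknown" answer would come from.

```python
# Hypothetical relation: each tuple asserts "supplier S supplies part P".
supplies = {("S1", "P1"), ("S1", "P2"), ("S2", "P1")}


def holds_closed_world(fact, relation):
    """Closed world: a fact absent from the relation is taken to be false."""
    return fact in relation


def holds_open_world(fact, relation):
    """Open world: absence proves nothing, so report None (unknown)."""
    return True if fact in relation else None


print(holds_closed_world(("S1", "P1"), supplies))  # listed fact
print(holds_closed_world(("S2", "P2"), supplies))  # unlisted fact
print(holds_open_world(("S2", "P2"), supplies))    # unlisted fact
```

The closed-world function never needs a third value, which is the sense in which the ClosedWorldAssumption supports the argument against NULLs in base relvars.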

Conversely, if someone believes that the ClosedWorldAssumption is false for their database, and that an unlisted predicate doesn't imply anything, then....[to be continued]


Since there was a request for non-anonymity: This page was created by DougMerritt. A careful reader will note that nowhere do I offer an opinion on the subject, I merely quote what JoeCelko said, figuring that was a fair balance to the ChrisDate side of things.

I also quite carefully said that I'm not aware of anyone who thinks that DatabaseIsRepresenterOfFacts is wrong.

JoeCelko is not a theoretical authority, but he's an ex-member of the SQL standards committee and has written well-received books and columns, so I believe he's a perfectly good representative of the pragmatic side of things (which may well be wrong; I never said otherwise).

A careful observer would note that today I made quite, quite extensive quotes from ChrisDate on other pages; it's not that I think JoeCelko is the last word.

As for my personal views, I have a strong interest in various forms of SemanticMapping and KnowledgeRepresentation and such for application in AI, and in AI the "just the facts, ma'am" approach has been explored for even longer than it has for business databases (since the 1950s, easily), and in that domain it is known to have some apparent problems (as reflected by the glaring lack of strong AI today), although variants on it are still heavily pursued, e.g. by the Cyc/OpenCyc project. So I'm interested in exploring alternatives, not in finding out what some (any!) authority thinks is the last word on the subject. AI is still a primitive field; there is no final answer in any aspect of it. And AI is not utterly different from the question of relational databases and business applications; relational DBs have in fact been used in AI projects, and conversely, principles of AI have been applied in the other direction.

Even more deeply, I apologize to no one for being interested in alternate approaches without even explaining myself.

For a contrary view on this topic and page, see page CostinCozianu.

P.S. I wanted to create this page back in 2004 but I feared displeasing CC. It appears my fear was not unfounded.


It seems to me there is a rather straightforward dual in the possible uses of any Database. It is one that I mentioned briefly under DataManipulation. Either a database is reflecting an external world, or simulating an internal one. Either way, the database is full of facts. Yet in reflection, each fact is an accepted truth about a world (and changes to the database are for the purpose of remaining consistent with changes in the world), while for simulation, each fact in the database actually creates a truth about a world (and changing the database changes that world).

A dual (in Category Theory) exists when all the arrows are reversed. For reflection, the cause-of-fact arrows are from world into database; for simulation, they are from database into world. For reflection, the purpose-of-manipulations are for consistency with a changing world; for simulation, they are to change the world. A few other sets of conceptual arrows also end up reversed.

Conceptually, this is similar to what is being said with DatabaseIsRepresenterOfFacts vs. DatabaseIsRepresenterOfEntities (though, strictly speaking, a Database only ever carries facts, even if the facts are of the sort "there exists an object with the following properties..."; see WhatIsData).

Agreed, and that's an interesting way to put things. -- DougMerritt


See DatabaseIsRepresenterOfFacts


CategoryDatabase


Last edited July 9, 2010