This is an easy refutation of the alleged FirstGreatBlunder as denounced by TheThirdManifesto. It is also a refutation of DomainsNotRecordsOrTablesAreObjects.
To recap briefly: ChrisDate considers that if you have an "object type" A, say implemented by ClassA, which has components int fieldA1 and string fieldA2, the typical way OO developers handle this is to reflect it in a table along the lines of:
create table TableA1 (
fieldA1 integer,
fieldA2 varchar2(100)
);
Now this would establish a mapping or correspondence between classes and tables in relational databases. According to DateAndDarwen this would be a blunder, even a great blunder at that: the FirstGreatBlunder.
The right solution, they say, would be to map ClassA to a "domain" (in D&D relational parlance), i.e. a type that can be used for columns in the database. Let's say we name it DomainA; then we'll have some kind of table:
create table TableA2 (
fieldA DomainA
);
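For concreteness, a domain like DomainA could be sketched as a user-defined type. Here it is in Oracle-flavoured syntax, purely as an illustration (TutorialDee, D&D's own language, would spell it differently):
create type DomainA as object (
fieldA1 integer,
fieldA2 varchar2(100)
);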
Or we might not have a TableA at all, but instead have columns in other tables whose domain is DomainA. The discussion goes the same way for this other case, so we'll focus on the first case for the sake of simplicity.
Now D&D would have it that the first design, which is currently practiced on a large scale by OO developers and RelationalWeenies alike, is the FirstGreatBlunder, while the second would be OK. The refutation is trivial on two grounds (actually there are two distinct refutations).
- The refutation is not trivial, because the discussion above and below misquotes and misinterprets Date, in my opinion. I have little interest in a long argument based on misquotation, but see an exact ChrisDate quote that I added to FirstGreatBlunder that should clarify his position (in short, that values and types should be differentiated rather than confused, which is obviously reasonable to a first order approximation). Date himself gives more specific examples in the pages following what I quoted on FirstGreatBlunder, and those lengthy examples/discussions are not at all obviously wrong.
- And what would the misquotation be? That relvars are not domains is trivially true, but not very helpful. Nobody ever claimed otherwise, so that would be a StrawMan. On the other hand, it is well known that OO people follow a practice that maps every significant object class that needs to be persisted onto a table holding the information in those objects, e.g. from a User class to a Users table.
- The problem is that OOP terminology is not well defined. Class variables are like relvars, but class definitions (types, or class interface specifications) are not like relvars, since a typedef is not a variable. The distinction needs to be made between a class variable and a class definition (specification). Is an object an instance of the class, in other words is an object a variable, and a class the variable's definition/specification? Or is an object the same as a class? We need to define OOP terminology precisely right now, otherwise we are screwed and will continue to be screwed (FUBAR). This needs to be clarified rigorously. Possibly a scientific website needs to be opened clearly defining OOP, or a paper needs to be written, similar to the "structured programming" essays that were written long ago.
I double checked again and there's no misquotation. Date prescribes that classes=domains is the right equation; therefore TableA1 above, or a Users table with the same attributes as there are fields in a User class, is an embodiment of the FirstGreatBlunder. Furthermore, in my current systems I have three subtypes of users, so I have a table for each subtype: UsersTypeA, UsersTypeB, UsersTypeC. According to the theory of D&D that would be precisely the first great blunder, especially as they are adamant about forbidding subtables and supertables, a useful feature that is available in PostgreSql and can be emulated in other databases. -- CostinCozianu
One thing wrong with TableA1 is exactly what's wrong with the following C++ code.
class ClassA1
{
public:
int memberA1;        // the internal representation is fully exposed
char memberA2[100];  // clients depend directly on the fixed-size representation
};
If nothing else, the internals of the class are exposed. But it's the three user tables that clearly show why it's a blunder. Chances are, the three user tables do not have the same shape, and the list of all users is no longer available.
There are other refutations:
The above discussions with Maurice and Darwen are probably related to the fact that Maurice may see classes as "class variables", not class definitions. Darwen may see classes as the definition (type) specification of an object.
Until there is a rigorous definition of OOP, battles like this will continue to be fought. People need to clearly define what they mean by "class", because many people confuse classes with objects (instances). People can also be confused about what "instances" are, since those aren't clearly defined either. Classes, objects, and types need to be clarified as to how they are distinguishable, if they even are.
If objects are the same as classes and objects are not instances of classes, then object/class is a redundant pair and one of the words needs to be removed from all writing about OOP (which is hard to do, since people all over the internet use the words "object" and "class" already). If an object is an instance of a class, that helps distinguish matters, but it still raises the question: why didn't they choose better terminology than this confusing rubbish? This is why some consider OOP to be snake oil, since it isn't clear and precise (but make no mistake, OOP is useful from an engineering perspective, even though it isn't clearly defined). Just as in engineering we reuse other people's designs, OOP is useful for reusing (inheriting) existing designs.
The first ground for dismissing the theory of the FirstGreatBlunder is an empirical one. We do this every day and we do not suffer the evils we are supposed to suffer from. Most of the business systems out there have a User class in Java/C#/C++/Python/Perl and a Users table. There's nothing wrong with that. DateAndDarwen fail to actually show something immediately wrong that results from this great blunder. If it were such a great blunder, the proof should have been obvious.
The second ground is very formal and very immediate. If TableA1 is a great blunder, then the proposed solution can be no better than the disease; in other words, TableA2 is perfectly equivalent to TableA1 (in the imaginary perfect relational database that D&D would have). It has to do with TypeTheory: once you have constructors and destructors (or accessors), say new DomainA(fieldA1, fieldA2) to construct a new object from its components and accessors obj.fieldA1() and obj.fieldA2() to access the information inside a DomainA object (corresponding to the operators THE_fieldA1(obj) and THE_fieldA2(obj) that are mandated by TheThirdManifesto), then:
The following view based on TableA1 is perfectly equivalent to TableA2:
create view ViewA1 as select (new DomainA(fieldA1,fieldA2)) from TableA1
And the following view based on TableA2 is perfectly equivalent to TableA1:
create view ViewA2 as select (fieldA.fieldA1(), fieldA.fieldA2()) from TableA2
Both views have a one to one mapping of all behaviour to their originating table in terms of updates, inserts, deletes, selects. That is, TableA1 and TableA2 are duals of each other with respect to constructor/destructor and perfectly equivalent in terms of behaviour. Whatever unacceptable behaviour is discovered about TableA1 will also characterize TableA2.
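To make the claimed equivalence concrete, here is a minimal sketch in Oracle-flavoured SQL, reusing the DomainA type sketched above (the dot-notation accessors stand in for TutorialDee's THE_ operators; all names and syntax are illustrative, not D&D's own):
create view ViewA1 as
select DomainA(fieldA1, fieldA2) as fieldA
from TableA1;
-- ViewA1 has the same shape as TableA2
create view ViewA2 as
select t.fieldA.fieldA1 as fieldA1,
t.fieldA.fieldA2 as fieldA2
from TableA2 t;
-- ViewA2 has the same shape as TableA1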
So the FirstGreatBlunder is much ado about nothing. -- CostinCozianu
- Small, but important, point. The THE_ operators are not mandated. TTM prescriptions RM1 and RM5 dictate that values must be exposed, but the form of these operators is not specified. THE_ operators are part of Tutorial D. -- GedByrne?
The first mistake here is that D&D do not disagree with the notion of having a User class in an application, as in the example given above. They disagree with the notion of equating relvars with classes; in an OO system, if it's convenient for the application program to package a particular type of tuple as an object, then by all means do so. They don't proscribe this.
- And what does "equating" mean? They are very fuzzy about it, but it is pretty clear from their writing that if you have a User object type, following their prescription, those objects should only live in columns and not in tuples. -- CostinCozianu
- "Equating" as in limiting access to the database to this OO-centric model. It's a possible model, but an inadequate one. See below. It's pretty clear from their writing that if an object is to be stored in the database, it should be represented as an attribute value and not as a tuple. To infer that they disallow uses for classes (or any other construct) outside the database is incorrect. That was my point - if a User class is useful at the application level, by all means use one. You'll have to map it into and out of the database system somehow, and this is exactly what many OODB libraries do. I admit that I'm inferring this, but I pretty confident that their objections address attempts to fit the entire relational database interface into the OO models, denying access to the database through anything but the relation-as-class view. -- DanMuller
- So if a User object is to be stored in the database, following D&D it should be stored as an attribute, not as a tuple. You just contradicted yourself, because you said that D&D do not proscribe the Users table corresponding to the User class. The User class can be useful at the application level and can be useful at the database level as well. In any case, if I wanted to also declare User as a type to the database (Oracle's CREATE TYPE ... AS OBJECT, see the sketch below), I'd like users to be stored as tuples inside the Users table that are implicitly of type User, and I just gave a positive proof that there's nothing wrong with that, i.e. that it makes no difference in the idealized database of D&D whether you store it in an attribute or in a tuple. I also have an empirical proof that nothing bad happens, as such designs are prevalent in business information systems and suffer no obvious flaws that would justify calling this the FirstGreatBlunder.
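A minimal sketch of what such a declaration might look like in Oracle (the type and column names are invented for illustration; Oracle calls the result an "object table", in which each row is implicitly of the declared type):
create type User_t as object (
name varchar2(100),
email varchar2(100)
);
-- an object table: every tuple is implicitly a value of type User_t
create table Users of User_t;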
- I guess I'm not making myself clear. In the database, it makes a difference, as noted below with respect to ad hoc queries. What you do with the data after you've retrieved it from the database is not within the scope of D&D's discussion - you can collect it into a class, put it in a map of field name strings to values, paint the values different colors and string them from wires to make a mobile. See previous responses below for more relevant discussions as to why it makes a difference. OODBMS' do have obvious flaws with respect to the ease or difficulty of defining queries - at least they have been obvious to me, which is why I latched onto D&D's approach in the first place. -- DanMuller
- Then if you think that in the database it makes a difference, give me one query that is available in the second case and cannot be formulated trivially for the first case, and you'll close your case. Should be easy, right? -- Costin
- Why don't you answer the questions posed below? Should be easy, right? -- Dan
- Because I take the questions to be rhetorical, without much bearing on what we are discussing. A query example would not be rhetorical. If by questions you mean something like "In a relations-are-classes system, what operators apply to a class inferred from a projection or join?", first of all I don't think much of the question, and second, I don't see how it relates to FirstGreatBlunderRefuted. In order to maintain that there's a great blunder, asking questions is not the way; showing with an example is: in the above, TableA1 is bad design because such and such, while TableA2 is good design because ... Other than that, you know what I think about your emotional attachment to D&D :) -- Costin
- It's a great pity that you choose to answer so tritely. I was actually hoping that you might explain how ad hoc queries involving projections and joins are treated in an OODBMS. At the time that I studied them, they seemed to be entirely ignored. Do such systems implicitly define new classes in such cases? If that is the case, I grant you your structural equivalence (as I already did below, which is why your request for an example puzzles me), but then ask you whether these implicitly defined classes are truly classes in the systems that you have in mind, or merely glorified structs. Or must they be defined explicitly in advance? In which case I'd argue that, although workable, this is tedious and not helpful to database programming. Or are ad hoc queries disallowed? The questions are not rhetorical. There is a plethora of OO languages and OODBMS libraries. You seem familiar with them. Share your knowledge, rather than your acerbity, if you would be so kind. -- DanMuller
- Actually there's no plethora of OODBMSes, which makes it more the pity that the FirstGreatBlunder fights an imaginary adversary. But the FirstGreatBlunder applies also to O/R mapping; that's why there's the confusion of DomainsNotRecordsOrTablesAreObjects, which is completely false advice. A User object properly belongs to a User tuple in the Users table. Actually, in the dream Tutorial D database it would make no difference, as I proved. To answer your question: with every relation value or variable there's an associated tuple type and an associated relation type. If for some of these the users choose to name them and associate operators with them (like User type -> Users table), there's nothing wrong with it. My knowledge that I want to share (if you do not mind) is that most of the argumentation in TheThirdManifesto is full of holes, plus there's no real reference to TypeTheory, nor any lessons learned. But the FirstGreatBlunder is easy to refute, and I undertook to do that given not only the implications for the imaginary Tutorial D or other kinds of OODBMS, but the real practical implications for how people design their schemas in current SQL DBMSes. They simply should not pay much attention to the alleged FirstGreatBlunder.
- Two comments:
- It should go almost without saying that people concerned only with writing applications that directly access current SQL DBMSes should not concern themselves with D&D's model, since current SQL DBMSes offer almost no support for user-defined field types.
- Except for Oracle, which offers almost full support for UDTs. So will the next version of Microsoft SQL Server; DB2's support for UDTs is very good, and I have been disconnected from the progress of Sybase for a while. PostgreSql offers full support for both UDTs and super-tables/sub-tables.
- This makes no sense to me at all. TheThirdManifesto and the FirstGreatBlunder recommend that RDBMSes support domains (abstract data types including user data types). You say that, many years after publication of TheThirdManifesto, a few vendors belatedly started following that part of their advice, so therefore their advice was wrong? At the time it was written, I believe it is true that no major vendor had good (or any) user defined abstract data types at all, and the strongest statement you make is that Oracle recently offers "almost full support". So we're *still* not there. -- DougMerritt
- If you actually read a bit on the FirstGreatBlunder and FirstGreatBlunderRefuted, D&D recommend much more than just supporting UDTs, and that's what I refuted. But again, I'm not going to repeat everything if you don't care to read. Oracle has had support only recently, meaning since Oracle 8, while the current version is Oracle 10g. It is not perfect support, but again, the D&D proposal may turn out to be just not workable, because they haven't done their job of coming up with a sound formal system for their proposals.
- "If you actually read a bit on the FirstGreatBlunder" -- Huh? 85% of that page is text that I myself just added a day or two ago (quoting Date). How much sense does it make to accuse me of not reading what I myself wrote?
- Anyway, I also wrote all the stuff you wrote, too, several times over, and there are just too many issues up in the air all at the same time, so here, I singled out the single issue of support for "domains"/UDTs, and commented on that all by itself, which seems fair to me, whereas I cannot pin down 20 issues simultaneously. An important part of the topic is UDTs. You claim that D&D set up a Strawman nonexistent in the real world. The UDT part of it is not a strawman. Even Oracle 8 post-dates TheThirdManifesto. That's the single issue I was trying to disentangle enough to make a clear point about.
- Let me point out that the primary features of spatial/geo and temporal databases are to support WHERE features that CANNOT be otherwise implemented by a user query in an RDBMS that lacks reasonably complete user defined abstract data type support. These "specialized" database features are pretty important, but wouldn't always need vendor support if the D&D recommendation of full user defined domains were in fact available. This seems to me to be important and irrefutable, regardless of other D&D issues. -- DougMerritt
- And what would that complete support for UDT look like? It's nowhere to be seen in TheThirdManifesto.
- I didn't intend to make a big issue about what constitutes "complete", it's just that you said "almost full support", indicating that you yourself didn't think it was complete, that's all. -- dm
- I think that the notion of "class" that you use for judging D&D's work is a minimal one, different from the one that D&D criticize, in that it is only an aggregation of properties, similar to a tuple. Within the limitations of such a "struct-like" definition of a class, I find no fault with your criticisms of the model. But in many popular OO languages, encapsulation and data hiding are key concepts, implying that the structural composition of objects is hidden from users of the class.
- Users of the class, or users of the object? Which is it? This is not meant to offend; it is meant to clarify terminology precisely.
The second mistake is here: "Both views have a one to one mapping of all behaviour to their originating table in terms of updates, inserts, deletes, selects." But they do not have such a mapping with respect to joins, nor with respect to projections. (If by "select" you intend to also include "projection", then your error is one of commission and not just omission. One cannot, even in D&D's system "project away" attributes of a class!)
- Don't start the pedantry of D&D on wiki please. I use SQL for ease of use by the readership of wiki. So how exactly do those views differ from their equivalent table? -- CostinCozianu
- I wasn't trying to be pedantic; I was honestly uncertain whether you intended to include projection. I described below how they differ. Using SQL isn't particularly helpful, since some of it doesn't map well to either D&D's model or the relations-are-classes model. You can't project away the attributes of a class in D&D's system, and it's unclear whether it should be allowed, or what it would mean to do so, in a relations-are-classes model. -- DanMuller
By equating classes to relvars, you have a problem with interpreting the results of any relational operation that combines relations with different headings, or even different domains for its attributes, or that produces a relation whose heading differs from that of its input relations. These operations would implicitly define new classes. Since OO classes are typically more than just their structure, it's not usually adequate to simply infer a new structure. Constructors, destructors, and other operators of the input classes may or may not apply to the output classes, and nobody has, to my knowledge, tried to come up with a systematic description of how to infer these. In fact, in many host languages, this would be a terribly difficult problem to solve.
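For illustration, consider an ad hoc join in plain SQL (the Orders table and its columns are invented purely for this example). The result has a heading that belongs to no declared class, so it is unclear which constructors or other operators, if any, apply to its rows:
-- which class do these result rows belong to, and what operators apply to them?
select a.fieldA2, o.orderTotal
from TableA1 a
join Orders o on o.fieldA1 = a.fieldA1;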
- Again, "equating" is loosey-goosey. Let me remind you that D&D do not use the term class at all, they use the term type. Following their terminology it is trivial to see that there's an obvious mapping between table names and type names that for every table A let typeA be the type of the tuples stored in that table. That typeA can be automatically inferred or can be explicitly named by the user, as in the User example which makes a perfect tuple type for Users table. -- CostinCozianu
- D&D discuss classes and objects, but use the term "type" for their own constructs. In a relations-are-classes system, what operators apply to a class inferred from a projection or join? What is the class's name? What operators are implicitly defined for it, and how would you explicitly define new ones? I submit that such an inferred class would, under most systems you could imagine, be more like a simple C struct - a mere agglomeration of fields - than what is usually called a class. The tuple plays this struct-like role in D&D's model. In fact, although tuple types are unnamed, it is possible in a D to define operators for a specific tuple type. So in fact, tuples can play the same role that such inferred classes would. Perhaps there is less difference between D&D's model and the relation-as-class model than would appear at first glance. -- DanMuller
- Dan, maybe you aren't using this wiki lately and this response will go unnoticed, but I'll give it a go. The problem with the "relations-are-classes" idea, first of all, is that we don't have a clear definition of what a class is! Some people visualize classes as the definition (type specification) of an object (variable). Other people see classes as variables themselves, i.e. they use the objects in programs and access the class like a variable. I think the important distinction needs to be made between CLASS VARIABLES versus CLASS TYPEDEFS (specifications, interfaces). In CeeLanguage you can declare the struct layout, but the struct variable is not the same as the struct layout (definition). One of the main causes of BrainDamage with regard to OOP is that OOP was never clearly defined, and existing OOP languages of today still do not make proper distinctions. If OOP had started off with more rigor and precision, OOP would be in a much better state today. Unfortunately OOP did not start out with rigor. The solution to the problem is to add rigor to the fire: we, or someone, need to precisely and rigorously define the hell out of OOP. Another BrainDamage problem is that many languages muddy the waters quite a bit and don't even really have "typedefs" at all, so a lot of programmers aren't even aware that their dynamically/weakly typed language is doing type conversion for them under the hood. Dynamically/weakly typed languages are not good teaching tools at all, because they fail miserably at making important distinctions; yet typically they are the languages that kids/newbies start off with, like PHP.
It is for exactly this reason that OODB systems or APIs impede access to the database; they generally make it very difficult to access the inherent flexibility of a relational database by requiring you to do a lot of additional work to "wrap" a query, or by requiring you to completely sidestep the OO interface for ad hoc queries.
-- DanMuller
- Now you seem to be talking about something else. You seem to be talking about a proscription against mandating in the model that all relations have an associated user-defined type, explicitly written down by the user. If that's the FirstGreatBlunder, well, first of all it doesn't read that way, but in any case that would be a straw man. No OODB model that I am aware of (for example TheoryOfDatabases?, FundamentalsOfObjectOrientedDatabases, the EntityRelationshipModeling book by Thalheim) comes up with this restriction, and this restriction is also not characteristic of O/R mapping tools or any other OO practice that I am aware of. -- CostinCozianu
I'm not sure I understand what you intended here. Are you saying that in OODB systems, it is typically mandatory that a user explicitly define a class to represent the output of every query? This sounds quite onerous. In fact, the in-house OODB system I inherited at work was quite like this. This greatly inhibits the creation of ad hoc queries, and either slows down development tremendously or encourages programmers to avoid customized queries, thus slowing the application down. -- DanMuller
- Forget it, I was trying to suggest how the FirstGreatBlunder would make sense, if at all. That is, it would make sense only if directed against a StrawMan that never existed in OODB theory.
However, in the spirit of LawOfDemeterRevisited, just as a grain of truth lies behind the (IMHO) flawed LawOfDemeter, such is also the case with the FirstGreatBlunder. What may lead to less-than-good relational designs is following the mapping of objects to tables on auto-pilot rather than on a case-by-case basis.
While it is OK in 99.9999% of the cases to map the User class to a Users table, the same cannot be said about blindly following an algorithm that says: for every class in the OO system that needs to be persisted, create a corresponding table.
D&D claim that OO is poorly defined (OoLacksConsistencyDiscussion). If that is the case, then there is no single "right" mapping approach.
- Even if this argument is correct, this does not say that there is not a large number of "wrong" mapping approaches. ;-)
To DougMerritt: quoting Date is not enough; thinking critically about what Date wrote is different. I just proved that it matters not at all whether you store object information as tuples or inside attributes, and now you're criticizing the support for UDTs in the databases of 5 years ago, which has nothing to do with the FirstGreatBlunder. If you actually read TheThirdManifesto you'll also discover a huge invalid critique of the super-table/sub-table issue, which they criticized as a proposal in some paper, but which is actually an (arguably useful) feature that PostgreSql has had for a long time now. How good Oracle's support for UDTs was 5 years ago has absolutely nothing to do with whether the FirstGreatBlunder is actually a blunder. The FirstGreatBlunder is not a critique of UDTs in commercial systems; the first great blunder posits that objects should only be stored in attributes, period. And that's just plain wrong. -- CostinCozianu
On the FirstGreatBlunder page, the blunder is very clearly defined to be the claim that "relvar = object class", while Date believes the non-blunder should be "domain = object class". I believe that that sentence is not only true, but irrefutable. Because of that, I then wished to point out that "domains" must exist before Date's advice can be followed. For them to exist means at minimum that users must be able to define new types with associated operations, and that those operations must be usable in queries. I believe this is also true and irrefutable. I furthermore believe (this belief is not irrefutable, but seems true to me) that many widely used RDBMSes today still do not allow users to do so. This sentence might potentially be refuted, but I believe it to be true, and that your statements about Oracle are the exception, not the rule, and thus that many RDBMS systems still do not support "domains" in the Date sense, and therefore one cannot follow Date's advice when using them...but that potentially one could follow this part of Date's advice when using Oracle, since you seem to claim that version 8 and above supports "domains", if I understood you correctly.
These true things that I am saying may well be different than what you view as the central issue of your argument, but so what? Am I not allowed to make a true point on the topic? -- DougMerritt
Can you explain in English what "relvar = object class" means? Because mathematics, it ain't.
- Sure it is. Date is using '=' in the classic sense of meaning the left side is identical to the right side (or rather, since he's claiming this identity is a blunder, he's saying they are not identical in this case). He defined the term 'relvar' as a "relational variable" (a variable which has a relation as its value). "Object class" is obviously some kind of OO class; for the moment it doesn't matter whose definition you buy on the subject. So he's saying "relational variables are not identical to OO classes", which I believe is, as Date said, self-evidently true without needing to construct examples. Variables are self-evidently not classes.
- Variables are not classes ONLY if you think that classes are the specification/type. When you actually use the classes in your program, they are like variables. You could call them "class variables". Class variables might be "objects" (instances of the class). This terminology needs to be clearly and rigorously defined, otherwise these battles will never end, and we will get nowhere in ComputerScience. Think about a STRUCT in the CeeLanguage. It has a type specification defining the struct layout, but it also has a variable instance (using the struct in your program and assigning values to the struct members). So there is a struct VARIABLE, and a struct DEFINITION (type). Objects have methods, and structs don't... Or should I say "classes have methods"? Do classes have methods, or do objects have methods? Which is it? This confusion needs to stop right now; it's FuBar.
I just refuted what you consider irrefutable...
- No, because you misunderstood, by not taking me literally enough. I almost never say something as strong as "irrefutable". In this I was irritated that you were making it hard for me to say pretty basic, obviously true things, so I carefully constructed sentences that were literally irrefutable. When I stated a mere belief, I said that, too - that my belief about vendor offerings was not irrefutable, merely my belief.
- I said, "On the FirstGreatBlunder page, the blunder is very clearly defined to be..." and "...while Date believes the non-blunder should be ...", and then I said "that sentence is not only true, but irrefutable", which is true, because, NOTE I didn't say Date was correct, I just said that's what he claims. The FirstGreatBlunder page does say that, this is irrefutable. Date does claim this, that too is irrefutable. Literally. This is not a question about whether he is right, but about what he said. He said that. It's irrefutable that he said that.
- Then I said that it was irrefutable that "domains must exist before Date's advice can be followed", which is trivially true, since Date's advice requires the use of domains. This is the next closest thing to a tautology.
- I don't say "irrefutably" lightly, and I cannot imagine how you could refute things like the above: tautologies, quotations and very light paraphrases of what Date says, etc.
- You seem to think that, if there's some risk I may shortly agree with Date about something you disbelieve, that if along the way I say "2 + 2 = 4" then you'd better contradict me right away before things get out of hand. This is illogical. And for the record, I haven't made up my mind about some of Date's other statements; I'm not a Date fanatic.
- At this point I am tired of trying to force you to believe that the equivalent of "2 + 2 = 4" is irrefutable. I don't argue for the sport of it, I am looking for insights, so that I can continue to learn, and what I'm getting is misunderstandings, at best.
- So although I thank you for inviting me back to c2, this sort of thing is not my cup of tea. I have other things to work on, if I can't be spending my time learning.
- Doug, I am sorry that you get so easily on the wrong track, but I think you are missing some background here. Of course, if anyone says "1=2 is a great blunder", well, that's irrefutable. So is saying relvars are not types, but you are taking it too literally. See, "1=2 is a great blunder" is not interesting and is not publishable. Neither is "tables and types are different". Plus, that would be a trivial StrawMan, because nobody from the object world claimed such nonsense; if that was a blunder, well, nobody made that blunder. What Date is trying to say, figuratively, with the "equations" is that there should be no correspondence between tables (relvars) and object types, but that each object type should be mapped to a domain and nothing else. And that's what I refuted, and that's in earnest what Date is trying to proclaim in TheThirdManifesto.
- Indeed, many in the OO world advocated that "object types" should be mapped to tables (nobody in the OO world said tables=object classes, nor did anyone say "equating" or any such word, but many talked about mapping). Indeed, if you cling to a literal reading of Date you come up with both a trivial truth and an uninteresting one, and one that has very little to do with what Date is saying. And you spoil the conversation as well. Cheers, Costin.
- Spoiling the conversation certainly was not my intent, I just meant to address little issues rather than your overall thesis.
- Thanks for your care, but I'm afraid these are non-issues in the context. D&D make it pretty clear in the context (the whole chapter dedicated to it) what the gist of the argument is. If you have objects, such as a User object, you should not store them as tuples; you should store them as attributes somewhere. Considering a tuple in the Users table as an object is an instance of the FirstGreatBlunder, according to them, and according to the exposition that I made it makes no difference, one way or the other.
- The similarity between relvars and objects is that we use objects in our programs, and they are variables. Consider an object in a program that changes state. That object is a variable, is it not? Variables change state. People map objects to tables because objects have data in them, as do tables! The problem is that "class" and "object" are not clearly defined, which causes a giant shit-fest of semantic wars and muddy water. Tuples are like objects and structs, because objects and structs have data in them. There are two ways to put objects into databases: put the object into a cell, or split the object up into its pieces and map it to a table (see the sketch below). If the object is stored in a cell, then that object is locked in that cell, so it can't be accessed relationally. Putting the object into a cell is kind of like a blob: lots of stuff inside the cell, but not accessible relationally. Once you map the object to a table and split it up into its pieces, you can access that object relationally, with the data of the object now in multiple cells. This issue really needs to be addressed, and more articles need to be written to and from Darwen/Date about this subject. It's not as simple as saying "domains are classes, and that is that". There is some overlap that needs to be addressed, and the issue is not finalized. Starting a page called ObjectOrientationRigor.
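A minimal sketch of the two storage styles just described, reusing the hypothetical User_t type from the Oracle sketch above (all names are illustrative):
-- style 1: the whole object lives in a single cell, opaque to relational operators
create table UsersOpaque (
user_id integer primary key,
usr User_t
);
-- style 2: the object is split into its pieces, each accessible relationally
create table UsersSplit (
user_id integer primary key,
name varchar2(100),
email varchar2(100)
);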
- P.S. relvars=types is still not mathematics, and certainly not an equation (at least because it is ill-typed). It's pseudo-mathematics and pseudo-English; it is meant to be taken figuratively.
...by constructing two tables, one of which follows Date's prescription while the other violates Date's proscription. And they are obviously equivalent; there's no deficiency that you can show in one that is not also in the other. According to Jan Hidders from comp.databases.theory, it's so simple it's boring. That's your refutation right there.
In an imaginary perfect database system, there can be a useful partial function from relvars to object types (and that, unlike "relvars=classes" is a mathematical statement) without any adverse software engineering effect on the system. The database system can support this mapping (make it easy for the database designer to realize this mapping) with appropriate operators and typing rules.
Yes, Oracle has some workable form of "domains", as Oracle is under no obligation to follow the letter of TheThirdManifesto. TheThirdManifesto in contrast has none as it is not a working formal system, not even a formal system at all.
I just reread this page, and I'm still left with the strong impression that Costin misunderstands the target of TheThirdManifesto. At the time it was written, there were efforts underway to create object-oriented database systems. These were not mappings between relational databases and classes; they were attempts to replace relational databases with something that was thoroughly object-oriented, only borrowing what they could from relational theory. (Or perhaps more generously, they were trying to merge the two paradigms.) In those systems, the attempt was made to replace relations with classes; that is the equivalence which is rejected by D&D. (I have a copy of a book describing a proposed standard for such systems, although I can't lay my hands on it right now; it's referenced at least a couple of times in the appendices of TTM, if someone has a copy of that book handy.)
Support for this interpretation of D&D's thesis can be seen in the SecondGreatBlunder, which is described as the introduction of object identity to databases. Indeed, the OODBMS designs of this time reintroduced pointers to databases. As I recall, they made no attempt to map between keys and object identity; keys became irrelevant.
Costin seems to misinterpret application-specific mappings between (apparently only some) relations and classes as the target of DateAndDarwen's criticisms. I thoroughly agree that such mappings can be useful and pragmatic. But they are a far cry from a merging of the concepts of relations and classes, which would have to include ways of merging concepts such as polymorphism, object identity, and object containment with relational theory in order not to lose the benefits of the latter.
Here is some fairly direct support for my assertion regarding DateAndDarwen's intentions: http://www.dbdebunk.com/page/page/1706754.htm
-- DanMuller
An important point to be made here (again) is that the supposed refutation is based on the equivalence of objects and relvars as data structures, which is what D&D are really arguing against. The 'refutation' completely ignores an important property of both objects and attributes: opacity. Neither attributes nor objects are data structures. They are represented by data structures, internally, but those details should not be visible, unlike the details of a relation variable, which must be visible. It should be possible to substitute the representation of a given class or domain with any other representation of equivalent behavior (to the extent that this is possible, given the Law Of LeakyAbstraction) without altering the way in which any attribute or object of that type is used.
In other words, what people are really confusing are the concepts of data type and data structure. They are not at all the same thing, even though all data types in a given program (even so-called 'primitive' types) are represented by data structures. (Remember, all types are contextual to how the variable is used; underneath it all, the actual 'type' of all variables is 'signals' in the InformationTheory sense, and 'primitive' types are no more fundamental than any other. (That may be true if you limit yourself to digital types that possess a character basis... in the InformationTheory sense, 'primitive' types (that we know of) are: digital, analog, quantum, and void. But I digress. -- db) Types exist for our benefit, because we as humans need to chunk information in manners which we can use to form larger mental models; all the typing support in our programming languages are just tools to simplify that process, and if a given tool does not do so, one should use another tool. But this is a digression...)
Getting back on track: the actual representation of a data type as actual variables shouldn't be important. If I have a type 'Personal-Name', with a corresponding operation which gives the name as a zero-delimited array of ASCII characters, it shouldn't matter to the ClientProgrammer if the type is actually represented as an ASCII array, a list of separate ASCII arrays representing the individual parts of the name, a length-encoded UNICODE string, a linked list of EBCDIC characters, or a two-dimensional array of booleans representing the Morse Code for the name, so long as the operation gives the required ASCII array.
For that matter, it is important to remember that relations themselves are not data structures - the familiar tabular form is only a representation of the relation, which abstractly is a correlation between two or more interdependent but still separate data. It is a mistake to see relations as data structures, for the opposite reason that it is a mistake to see types as data structures: whereas a type represents a single, coherent concept, a relation represents a connection between two or more concepts but not those concepts themselves. The attributes of a relvar are not themselves part of the relvar, but are referred to by the relvar; they can and in many cases should be represented separately from the relvar. If this were not the case, normalization would be impossible; conceptually, a sub-key is a reference, not a datum in and of itself, even if it also carries semantically meaningful information.
By way of analogy, a Lisp atom which is referred to by a list is not a part of the list itself; the list only refers to the atom, but if the list is garbage collected, it does not necessarily mean that the atom is as well, since other lists may refer to it. That analogy has some weaknesses, admittedly, as it involves a certain amount of meta-level confusion, but that's been rife throughout this argument, on both sides.
In some ways, the term 'relational database' is unfortunate, since relations are not about the data themselves, but about relations between data (and other relations). The storage of a given datum is less important than the context in which that datum exists. -- JayOsako
- My understanding is that "relation" is the mathematical term for "table", and is NOT about "links". But, I'll leave the final word on this to a math guru. --top
- Speaking as a discrete mathematics guru: a "relation" is better described as a set of tuples, and is about associations between values. The important differences between "relation" and "table": relations, being mathematical sets, are unordered (whereas tables often have an ordering), don't have duplication (tables might or might not have duplication), and may be infinite (whereas 'tables' are finite). So saying a relation is the mathematical term for "table" is incorrect, even though relations in databases are often implemented as tables. But it is true that 'relations' are not about 'links' between relations or about the context of relations.
I think you use the word 'data' confusingly towards the end, there. I believe relations are about 'data' in the 'WhatIsData' sense (DatabaseIsRepresenterOfFacts). A relation is essentially a predicate; a system like DataLog, MercuryLanguage, or PrologLanguage is fairly ideal for describing relations to their greatest extent, while SQL and such does the job for more primitive finite relations (where speed and efficient storage are of greater concern than flexibility). However, relations are not about data structures or data types.
DataStructures and DataTypes are not data; they are simply representations for a more abstract concept traditionally called 'values', which also are not data; e.g. you can represent a string or a number in a DataStructure, but the string or number says nothing at all about the world... not until you put it into some sort of relational context. Similarly, Objects are also not 'data' (in the WhatIsData sense). An object is not a fact. An object isn't even a collection of facts. Rather, by representing an object by use of a DataStructure and some sort of identifier (reference, pointer, primary key, etc.), you represent facts about the object and its behavior. And in an ObjectOriented system, this set of facts is projected into reality as an actual object by the runtime environment. But the object is not the facts about the object; it is only represented by them, which are represented in a DataStructure. Objects themselves truly are little pieces of reality, not reflections or models of it. See ObjectVsModel.
But one can model objects in a relational database in a manner that is conceptually consistent with the modeling approach (no blunders involved) by exposing a relation for every exposed attribute and behavior of that object; an exposed attribute for an object is essentially a 'fact' about the object that is meaningful. (And make no mistake, a 'class' is simply an identifier for a shared (and often invariant) set of behaviors and additional facts about an object. It should not be the case that each 'class' gets its own table.) E.g. if some objects have a 'name' attribute, you could create a NAME(ObjectID, Name-Value (string?)) relation that represents the fact that some objects have names. One can further denormalize lots of different binary-predicate tables (with one of the components being ObjectID) into one big-arse table for efficiency reasons, so that you don't need to touch ten tables to get ten facts that you often need all at once, and you get what looks very much like what D&D call "The FirstGreatBlunder". (Of course, a good, optimizing DBMS would denormalize for you when it sees it can do so and gain efficiency. There is no good reason to do it by hand.) In addition, one might deal with 'private' data via some sort of amalgamation of hidden Markov-style variables collected into a 'private_state' (which might itself be a relation or micro-database), which is mostly only read or modified by the behavior specs for the object. This would essentially correspond to D&D's proposal that objects be given 'domains'.
If you can get that close to the FirstGreatBlunder without actually blundering, one does wonder how much a blunder the FirstGreatBlunder really is. It seems the only thing missing from Costin's original spec is the 'ObjectID', and the fact that there was a table for just that class rather than a table for objects with an identifier for the class.
class ClassA1 {
public:
int attribute1;
string attribute2;
void behavior1() { code_for_behavior1 }
};
... normalized model (expecting many more classes) ...
create table Objects (
ObjectID oid_value unique
);
create table Class (
ObjectID oid_value unique,
ClassName string
);
create table Class_Hierarchy (
ClassName string,
SuperClass string
);
create table Class_Behaviors (
ClassName string,
BehaviorName string,
BehaviorDesc lambda returning procedure
);
(or could do 'With_Behavior1(ClassName,BehaviorDesc)')
(Attributes as individual predicates)
create table Has_Attribute1 (
ObjectID oid_value unique,
Attribute1 integer
);
create table Has_Attribute2 (
ObjectID oid_value unique,
Attribute2 string
);
(OR All attributes as one predicate)
create table Attribute (
ObjectID oid_value,
AttributeName string,
AttributeValue dynamic,
candidate key (ObjectID, AttributeName)
);
... fully denormalized, possibly by the DBMS, upon review of the various constraints and actual data ...
create table Objects (
-- ObjectID oid_value - can be eliminated because it's a surrogate ID and there are no other relations using it
-- ClassName - can be eliminated because all objects are of one class
Attribute1 integer,
Attribute2 string
);
create table Behaviors (
-- ClassName - can be eliminated as implicit because all objects are of only one class
Behavior1 lambda returning procedure -- could be moved; has and will only ever have exactly one entry
);
... and look at that! denormalized all the way back to the FirstGreatBlunder ...
The ability to derive facts DataLog style would still go a long way to making the above convenient to use (e.g. one can say that an object has behavior X if it has it directly (first choice), or if it is of a class that has behavior X; a class has behavior X if it has it directly (first choice) or if it has a superclass that has behavior X; etc. - rather than specifying it on a per-query basis). But I digress.
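Lacking DataLog, that derivation could be approximated in SQL with a recursive view over the sketch tables above (a rough sketch: it assumes a dialect with recursive CTEs, e.g. PostgreSql, and it ignores the "first choice" preference ordering):
create view Object_Behaviors as
with recursive Ancestors (ClassName, AncestorName) as (
-- every class is trivially its own "ancestor"
select distinct ClassName, ClassName from Class
union
-- plus everything reachable through the superclass chain
select a.ClassName, h.SuperClass
from Ancestors a
join Class_Hierarchy h on h.ClassName = a.AncestorName
)
select c.ObjectID, b.BehaviorName, b.BehaviorDesc
from Class c
join Ancestors a on a.ClassName = c.ClassName
join Class_Behaviors b on b.ClassName = a.AncestorName;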
The blunder Costin exposes is a bit of a conceptual error inconsistent with DatabaseIsRepresenterOfFacts (mostly because ObjectID was missing, and is necessary for facts about an object system), but it seems to me that the real 'great blunder' is creating a table-per-class when representing objects. That ends up with a mess of tables, no clean way to find all the different places IDs might be, no clean way of associating objects with their behaviors, etc. When performing an object-relational mapping, one should probably consider the broadest possibility of a dynamic, prototype-based language and style the schema for that (even 'Class' is unnecessary for fully prototype-based languages - just have prototype be another object).
Of course, D&D think that object identity (ObjectID) is the SecondGreatBlunder. When it comes to modeling an object system, they're wrong - object identity is fundamental to such systems. One cannot describe relations between objects or behaviors of objects without some mechanism to uniquely identify the objects. For example, it is provable that some sort of extrinsic identifier is absolutely necessary to describe any simple graph of more than one point. There is a higher, meta-question as to whether one should ever bother modeling object systems or graphs within a relational database, but I don't believe that D&D are the right guys to answer that question... I'll wait for an answer from the ArtificialIntelligence, AutonomousAgent, ExpertSystem? and KnowledgeSystem? people.
[Regarding your view on the SecondGreatBlunder, that "[w]hen it comes to modeling an object system, [DateAndDarwen are] wrong - object identity is fundamental to such systems": DateAndDarwen are referring specifically to relational systems, not object-oriented systems. Although their proscriptions against the FirstGreatBlunder (relvars are not domains, i.e., relation-valued variables are not equivalent to classes) and against the SecondGreatBlunder (there shall be no object IDs) appear in a chapter entitled "OO Proscriptions", the "OO" refers to "Other Orthogonal", which is intended to remind the reader of specifically those features of object-oriented systems that they deem valuable in a relational system, i.e., user-defined types and type inheritance. Other notions of object-orientation - instance, object, and class, for example - are not included in their model. Instead, their relational model is based on the core concepts of type, value, and variable (plus operator, which is not relevant here). There is no notion of stateful or mutable objects or instances. Values, which are as close to OO instances as their model gets, are immutable. Only variables are mutable via assignment.]
- RE: [There is no notion of stateful or mutable objects or instances. Values, which are as close to OO instances as their model gets, are immutable. Only variables are mutable via assignment.] -- Mutability is a matter of perspective. When objects 'mutate' in an object-oriented system, is it really the individual objects that are changing? or are the objects 'immutable' and, rather, it is the entire system and runtime environment that has 'mutated' from one state to another, and has an entirely new set of objects? In truth, both perspectives are correct and equivalent. A mutable relational variable logically implies mutable (and destructible) 'objects' as identified within the variable by some candidate key. (Of course, the RelationalModel itself doesn't have a notion of side-effects, but it also doesn't truly have a notion of relational 'variables' - the relational model focuses upon the calculus and algebra of combining relations, not on their mutation.) Anyhow, I've never much appreciated the emphasis relational promoters put on 'immutability' they pretend exists (because that's what they learned at school and have never given much thought to afterwards). Even if the entire database (the collection of named relations and constraints) went from one state to another as a single, big-arse 'variable', you've lost 'immutability' for the database AND everything you can identify within it. Of much greater concern is consistency and isolation associated with transactions.
- [Encouraging the perception of immutability is essentially a pragmatic issue, rather than a fundamental truth. Making values/instances (appear) immutable simplifies programming, as state change becomes confined only to variables and, in object-oriented systems, mutable containers. Of course, achieving optimal performance may require that internal objects be mutated, but the user effectively doesn't see this, and so conceptually (for simplicity's sake) it doesn't occur.]
- What makes you believe this 'perception of immutability' is pragmatic in a way that wouldn't better be explained in terms of the action semantics of the data manipulation language? And if you think your perception of immutability is 'pragmatic' because it reduces the number of change-points to just the relational variables, why is it not 'even more pragmatic' to perceive relvars as immutable and argue instead that "state change is confined only to" the whole database-state? Choosing 'relvar' as the 'ideal' place for perception of mutation seems to me to be arbitrary, artificial, impractical, inelegant, and inconsistent.
- [The notion of confining state change to the whole database state is perfectly reasonable. DateAndDarwen explore this in Appendix D of TheThirdManifesto, 3rd edition, in which a database is defined as a tuple-valued variable or "dbvar", with one relation-valued attribute for each "relvar". As such, any database update is conceptually an assignment to a dbvar. However, because a practical database language typically works with one database at a time, we can comfortably consider the database to be a collection of relation-valued variables and effectively examine it purely at that level. Why not take a step further and treat relation values as mutable? Because that would introduce a distinction between mutable and immutable values, and therefore an inconsistency between scalar values and relation values, in a model where values of any type should be treated equivalently. I'm not convinced that explaining database language behaviour in terms of action semantics would be better, and I doubt it would be as simple and readily understandable as simply defining all values as immutable and all variables as mutable. Treating all values as immutable reduces the model to simple algebraic transformations plus variable assignment, i.e., a system conceptually analogous to elementary algebra, and otherwise unencumbered with value mutation and/or object state.]
- Oh, yes, it IS reasonable. And the fact that it is reasonable makes it unreasonable (arbitrary, artificial, impractical, inelegant, and inconsistent) that you decided to stop at 'relvar' originally. And it is also 'unreasonable' to stop at 'dbvar'. Go all the way back to 'the universe is immutable' and be wrong every step of the way. Then go the other direction: the database-state can be changed, and thus the database is mutable. The 'database:relvar'-state can be changed, and thus the relvar is mutable. The properties of the object identified by 'database:relvar:candidate-key' can be changed, and thus the object so identified (which just happens to be a tuple) is mutable. Database-states (i.e. state-snapshots) and relation-values aren't mutable, but databases and relations in databases are mutable nonetheless. The object identified by 'databasename:tablename' is a relation, in particular a relation-object with an object-identity ('databasename:tablename') and is mutable, and is not a relation-value (though it has a relation-value as its 'state'). A relation-object is mutable because its properties (its state; in particular, the relation-value it carries) can be changed (for different relation-values) over time. And the same is true for tuple-objects (identified, for example, by candidate key) and their tuple-value state - the values identified as part of the tuple that weren't part of the candidate key. So my claim is NOT that 'values are mutable'. It is that 'objects in a variable system as distinguished by an identifier held by an observing agent are themselves (often) variable and mutable', and that you're kidding yourself if you think a live database, its relations, and the tuples in said relations do not constitute such a system.
- [You wrote, "... relation-values aren't mutable ..." and "... my claim is NOT that 'values are mutable'". Then we are in agreement. Within the relational model, state change is defined purely in terms of assignment of relation-values to relation-variables (in the programming language sense). If the result of this is that the observing agent perceives mutability of the database as a whole, or perceives mutability of particular relations or tuples (e.g., I retrieve a tuple with the 'Name' attribute value of 'Alice' from the Employees relvar, and note that her 'LifetimeInSeconds' attribute has increased since the last time I retrieved such a tuple, and so I say Alice is older by virtue of having retrieved a tuple that says she's older), then the relational model is working as it should. It then doesn't matter to an external observer that within the model we treat the Employees relvar as having been assigned a new relation-value to indicate that 'Alice' increased in age. I suspect at least some of this debate may result from a differing point of view. I'm looking at it from within the relational model, in which tuple-values and relation-values do not change; the only change is what is contained in a given variable. You, I suspect, are looking at it from the point of view of an external observer, where a given relation or tuple referenced by a given variable can be said to change.]
- We've reached agreement on at least one major point. But I'll argue that the 'external observer' is the only POV that matters when it comes to mutability; in a very fundamental sense, there are no 'objects' within this universe unless an observing agent bestows upon a collection of properties an 'object identity'. Thus, without such an observer, no 'thing' ever changes by virtue of there being no 'thing's. Ultimately that makes the relational model's approach ([Within the relational model, state change is defined purely in terms of assignment of relation-values to relation-variables (in the programming language sense).]) an at-most convenient fiction for explaining the model to people and shattering the fixed mindsets of those more comfortable with reading and writing and creating and deleting individual cells. (The claim that state change is purely assignment of relation-values to relation variables is even more fictional when considering that few RDBMSs truly operate in this fashion - non-interfering actions can read from and write to different parts of the same table when using indices and 'update where' clauses; this ultimately means that, even absent an explicit transaction, one isn't really reading one relation-value, calculating another, and writing it back). Even if it were true, a claim that the data manipulation language semantics only 'directly' support manipulations at the coarse-grained level of whole relvars simply complicates the protocols for 'changing the database as a whole' or 'changing a single tuple-object', much as actions are complicated for setting multiple machine words or one bit within one word if memory can only be read or written one machine-word at a time. Given macro-support and such, any relational language could be casually extended to support these actions with direct phrases. The language does affect how you go about expressing your intent to change things (and what sort of intent you can express directly and indirectly), but what really matters is how these different actions interact with one another, and their failure semantics - which are ultimately the purview of transactions. Ultimately, since the external observer's perspective on mutability is the one that matters, it may easily be that an observer chooses to view mutation as the relational model says he should - but doing so because that's what the creator of the relational model chooses is pretty much treating the relational model as a religion: you are accepting its 'ideals' as your own.
[The
SecondGreatBlunder and its associated proscription (no object IDs) are intended to address what they believe to be a flawed approach in
implementations of the relational model: the association of opaque identifiers with values, such that two values can be considered unequal or distinct even though their semantic content, i.e., what they represent, is equal. If this is permitted, we can have (among other problems) the perverse situation where value "3" of an Integer type in variable X can be considered unequal to the value "3" of an Integer type in variable Y, merely because the two "instances" (ugh) have differing object IDs. The same concept extends to tuples representing real-world entity instances. Would it be meaningful - in data management terms - to have two otherwise identical "Alice" employee records that differ only by an opaque, internally-generated "object ID"?]
- Well, when worded that way, the objection to that use of identifiers certainly makes sense. Identifiers should be explicit in the model, even if they are explicit but opaque surrogate identities that never exit the context of the database. Technically, one would have 'identity1' and 'identity2' referencing a table that holds the two pairs (identity1, 3) and (identity2, 3). One could collapse that in a view, of course... and if you're allowed to update views, a modification could even be routed back to the appropriate slot.
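As a purely illustrative sketch of the explicit-but-opaque identity arrangement just described - table, column, and view names here are invented for the example:
create table IntegerSlots (
slot_id integer primary key, -- explicit, but opaque, surrogate identity
val integer not null
);
insert into IntegerSlots values (1, 3);
insert into IntegerSlots values (2, 3); -- the same value '3' held under two distinct identities
create view SlotValues as
select distinct val from IntegerSlots; -- collapses the identities; only the value '3' remains
Whether an update through such a view can be routed back to one particular slot depends on the DBMS's updatable-view rules, so the 'reversible' part is not guaranteed.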
- I believe that 'object' is just a way of looking at things, and what constitutes the 'object' is described by its 'object identity': that which you (as a thinking agent) choose to identify as being part of the object. You can describe the changes in a particular tuple when you uniquely identify that tuple (which ultimately means that the tuple is 'mutable' if the relation you find it in is 'mutable' as described by your tuple identifier... which is probably 'databasename:tablename:candidate-key'). You can describe changes and behaviors (patterns of changes) in a whole set of tuples if you can identify that set of tuples. You can describe the changes in a lake or a cloud or a car or a block of memory when you can identify these, too. Thus, to my understanding, Object IDs are in the relational model (or, more accurately, in the minds of those thinking about the relational system) whether you want them or not - nothing you ever do (short of controlling who is observing the system) can get rid of them. But Object IDs need to be modeled explicitly (e.g. via an ObjectID column) when modeling object systems because identity is fundamental to object systems. However, I do agree that hidden identifiers for rows or cells that affect the semantics of your language are bad things. (Hidden identifiers are okay if used internally for optimizations and such.) Forced exposure to such identifiers is also no good for an entirely different set of reasons (see my rant on 'leaking' surrogate keys, below). -- db
- [I agree that notions of identity - or at least of unambiguous identification - are fundamental to both the relational and object-oriented models. Indeed, the internal machinery of a relational system needs to maintain the same degree of object identity as any object-oriented system. This is necessary in order to identify and select just that particular disk page, tuple, container object, data structure, type definition, or whatever. Such objects would not be identifiable otherwise, and are only accessible by a memory address or equivalent object reference. However, a relational system abstracts this away from the user's view, so that "object" identity in a relational system differs from an object-oriented system's notion of identity. In a relational system, the notion of that particular tuple or the desired tuple or the tuple of interest (for example) is defined in terms of tuple attribute values and predicates, such that any tuple matching a given predicate is the desired tuple. However, from a strong object identity point of view, it may not be the "same" tuple - i.e., precisely that tuple occupying a given storage/memory address, or holding the same object ID - as it was when it was created. In effect, we are more interested in "soft" predicate-based notions of identity than "hard" notions of identity based on some absolute internal reference. By way of example, in the algebraic expression "3 + 3 = 6" we are not concerned with which 3 is on the left side of the '+' operator, and which 3 is on the right. Indeed, the notion of "which '3'" is meaningless. While the first '3' and the second '3' may occupy different addresses in a computerised algebra implementation and therefore have distinct object IDs, from an algebraic point of view we don't care - all '3's are the same '3'. Similarly, in a relational system we are not concerned with distinguishing TUPLE {x 5, y 'zot', z 2.5} from another TUPLE {y 'zot', z 2.5, x 5}, because they are the same tuple. If they must be distinct (or unequal) tuples, then we will intentionally introduce an attribute to make them distinct, which will presumably have some meaning in the external world being modeled by those tuples.]
- RE: [However, from a strong object identity point of view, it may not be the "same" tuple] - and this is where I'll have to disagree with you. There is nothing in object oriented systems or design that has ever required "same" to mean "same storage/memory address", and identification based on spatial properties really doesn't make for a 'harder' predicate than anything else you might choose to utilize. Strictly, from a "strong object identity point of view", the tuple IS the "same" because it was identified with the "same" object identity. However, some of the tuple's properties and attributes may have changed since you last observed it. E.g. the tuple may have different contents in the cells that weren't part of the candidate key (which is obvious and expected), and it may have moved to a different memory/storage location (which is a less obvious property because it is not exposed to us), or it might even be gone entirely (object-identity always allows for the possibility of non-existence). Even if the candidate-key is the whole tuple-value, it still has the relevant property of 'to be or not to be' each time you attempt to observe it. You are accustomed to thinking of spatial locations as object identifiers because that is what is common in many ObjectOrientedProgrammingLanguages, but you would do well to note that this is an accidental property that is common only due to the structure of memory and processes in modern computers; it is not a necessary property of object oriented programming systems (and you wouldn't expect such a property for, say, Corba objects). Further, consider that even "objects in memory" might not be in the same physical place in memory from one moment to another... not when you start dealing with virtual memory, caches, etc.
- [From a user's point of view in a relational system, the notion of (for example) tuple identity is defined purely in terms of the tuple's attributes. Therefore, to say "some of the tuple's properties and attributes may have changed since you last observed it" suggests that you believe there is some enduring "it"-ness about tuples that is preserved from one set of attribute values to another. This sounds like a philosophical issue, described in FredsAxe, rather than a real concern. In the absence of an explicit tuple ID, which most relational systems do not have, I'm curious what a tuple's "it"-ness - i.e., an identity apart from its attribute values - would be, from a user's point of view. I'm also curious what relevance that would have to a user. What useful value do we derive from the knowledge (presuming it is obtainable) that this is the "same" tuple we retrieved ten minutes ago, but all its attribute values are different? Furthermore, what do we lose by regarding it as a different tuple?]
- Values have intrinsic identity. Objects have extrinsic identity. One can observe that the properties of the tuple-object identified as "databasename:tablename:candidate-key" have changed, not that a tuple-value has changed. The tuple-object is the 'it' in the context of the above discussion. You have the "same" tuple-object because it was identified with the "same" tuple-object-identifier. If this was unclear in the above context, I apologize. And the FredsAxe issues you denigrate as 'philosophical' are actually quite fundamental and of very real, very practical concern. We understand time in terms of what we can identify and observe as having 'moved' or 'changed' between one moment and another despite the fact that we can't even prove time is continuous or that there is any real association at all between a prior observation and a later one. Because we understand the world in this manner, the value of identifying temporal variations of an 'object' one has identified is obvious when we must communicate changes to other entities who understand things in the same manner. E.g. a soldier, or a robot, needs to know that you've made a change to its current mission, not that you've created an entirely new mission with slightly different parameters that it must execute from the beginning. You might not think too often about the communications side of things, but that doesn't make it irrelevant.
- [What if the candidate key has changed?]
- The intended meaning of 'candidate key has changed' is unclear, but there is only a finite set of possibilities:
- (1) If the candidate-key in the tuple-object identifier (database:relvar:candidate-key) was changed, then you don't have the same object-identifier, and therefore you are no longer talking about the same object.
- (2) If you removed a tuple identified by (database:relvar:candidate-key) and put in its place a tuple that is not identified by (database:relvar:candidate-key), then it is technically incorrect to say that the 'object' identified by (database:relvar:candidate-key) has 'changed' its candidate-key. It simply no longer exists.
- (3) If you change the database by removing the constraint that requires the given 'candidate-key' to be unique (such that two tuples may eventually exist within the relvar with said ex-'key'), but don't otherwise manipulate the database, then you still identify the same tuple-object ("the tuple identified by (database:relvar:ex-candidate-key-spec)" is still unique). Potential ambiguity is okay in object identifiers, and is not generally considered an issue until it is actualized. E.g. "the cat in my bedroom" is an identifier for the one and only cat in my bedroom (should a cat be there), and isn't a problem until such a time as there is more than one cat (at which point the potential ambiguity becomes real ambiguity). If there is no cat in my bedroom, then "the cat in my bedroom" references an object that does not exist. Question: If "the cat in my bedroom" crosses the threshold of my bedroom door and exits, where is "the cat in my bedroom"? Answer: It doesn't exist. There is no cat in my bedroom. There is a cat that was recently in my bedroom, which is a legal identifier... but not a usable one. Of course, humans don't typically identify cats by their spatial locations; spatial locations are notoriously awful object identifiers for things that easily move around.
- [Sorry, I meant what if the candidate key value of a given tuple has changed. Is it the same tuple, or a different tuple? More importantly, does it matter whether it's the same, for any reasonable definition of "same"? I would argue that it does not. What matters is whether or not we have a tuple matching a given key value or other attribute, in the same manner that (for example) a given numeric variable either contains a '3', or it does not. We don't care whether or not it's the "same" '3' (or the same tuple) we retrieved ten minutes ago, because it is the value's (whether scalar, relation, tuple, or something else) intrinsic identity that matters, not its (inaccessible) extrinsic identity.]
- What matters in any system is the value (in the $$$ sense) others attach to it. ToDefineIsPerhapsToValue?. What matters here is that a change of state has occurred to an object that some agent considered worth defining with the identity 'database:relvar:candidate-key'. You keep arguing that it doesn't matter whether it's the "same '3'" retrieved minutes ago, but it does matter that the '3' was retrieved from the same 'tuple'-object that it was retrieved from minutes ago... saying otherwise is equivalent to claiming that "it doesn't matter from which variable you drew the '3', it's still '3'!" You are correct that it's still '3', but you are incorrect that the source doesn't matter. It matters to someone more than you think. Think about the communications context in which databases are found and given value, not just about their internal structure.
- [What if all attributes of a tuple have changed? Is it the same tuple, or a different tuple?]
- Different tuple. You would not identify the new tuple by the 'same' object-identity (database:relvar:candidate-key) which involves the candidate-key value of the tuple you originally referenced. Technically, it is incorrect to say the original tuple 'changed' except insofar as it 'ceased to exist'.
- [If it's the "same" tuple, how do you know? The soldier/robot analogy is interesting, but irrelevant. A tuple is not a mission that must be re-executed from the beginning (??? *boggle*), but a representation of a fact.]
- A mission is also a representation of facts and goals. Each statement in a mission is either a statement of the condition one should expect, or a statement of a condition one should effect. But, really, the soldier/robot analogy is relevant primarily to explain the issues involving communication.
- [Yes, of course a mission is a representation of facts and goals, "buddy". Please don't be rude - it's unnecessary. If you are unable to commit to maintaining a civil exchange, I will withdraw from this discussion.]
- I apologize. I should have had the strength of character to patiently explain why something is relevant to a person who in ignorance has dismissed a point as irrelevant without even asking for clarification.
- [It is rare that a system underpinned by a relational database needs to know that what used to represent fact A is now (in fact) representing fact B. In the vast majority of cases to which the relational model is applied, it is sufficient to know whether or not fact A (or fact B) holds true. However, where needed, such changes are trivially modeled as a log of historical facts (the candidate key would probably be a timestamp) or by introducing an auto-generated key to permit identifying just that tuple, regardless of what other attribute values may change.]
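For concreteness, a minimal sketch of the historical-log approach just mentioned, using invented table and column names (the timestamp forms part of the candidate key):
create table EmployeeAgeHistory (
Name varchar(100) not null,
RecordedAt timestamp not null,
LifetimeInSeconds integer not null,
primary key (Name, RecordedAt)
);
-- each change is recorded as a new fact; nothing is ever updated in place
insert into EmployeeAgeHistory values ('Alice', current_timestamp, 1000000001);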
- It is not at all rare that a system backed by a database needs to know of 'changes' to certain facts, at least not in domains that need to support communications between distributed agents. However, it is true that these are rarely supported by 'relational' databases... largely because the RDBMS utilities are currently piss-poor at providing the necessary communications support. Instead we end up with half-baked databases supporting half-designed publish-subscribe models, in which facts are published... and subscribed by what is essentially 'object identity'. If you're interested in RDBMS technology sweeping a new field, make it so it can efficiently allow subscriptions to arbitrary queries (so that just the updates to said query-results are communicated). You could probably make a lot of money, but it does involve tracking communications contexts and all sorts of other nasty stuff, and to be of real use it needs to both make HardRealTime and stability guarantees (e.g. for avionics, vehicles, etc.), have frequency, bandwidth, and timeslot options, and probably needs to package up messages for different subscription-updates that are going to the same destination into unit packets (because packets cost a lot more than just the bandwidth and time required to send them). I.e. there's a massive load of properties people would pay for.
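Nothing in mainstream SQL offers true subscriptions to arbitrary queries, but as a rough sketch of the flavour of thing being described - PostgreSQL-specific (version 11 or later syntax), with all names invented - a trigger plus LISTEN/NOTIFY can at least push per-row change notifications to interested clients:
create table SectorStatus (
sector_id integer primary key,
status varchar(100) not null
);
create function notify_sector_change() returns trigger as $$
begin
  if TG_OP = 'DELETE' then
    perform pg_notify('sector_changes', TG_OP || ' ' || OLD.sector_id::text);
  else
    perform pg_notify('sector_changes', TG_OP || ' ' || NEW.sector_id::text);
  end if;
  return null; -- AFTER trigger, so the return value is ignored
end;
$$ language plpgsql;
create trigger sector_change_notify
after insert or update or delete on SectorStatus
for each row execute function notify_sector_change();
-- a subscriber issues LISTEN sector_changes; and re-runs its query of interest when notified
A real solution of the sort described above would compute and ship query-result deltas, batch them per destination, and make timing guarantees - none of which this sketch attempts.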
- And keeping a historical log is useful for many things (simulation replay, graphs of historical status, and lots of stuff), but not really for communications support of the sort being described here. I also note that timestamps make for nigh-useless candidate keys, but (technically) they don't need to be the only such candidate-key. Since 'object' is just a way of viewing things, any identifiable set of properties can be part of an arbitrary object described by said object-identity.
- [Actually, I'm not particularly interested in RDBMS technology sweeping a new field, at least not yet. I've got plenty to do, as it stands, in my own small and humble corner of the relational world. I'd be quite happy if truly-relational DBMS technology (as opposed to SQL) would sweep - or even make significant inroads into - the fields where SQL DBMSes currently hold sway.]
- I'd also like to see whether 'truly-relational' DBMS technology (with infinite relations, set-based rather than table-based, possibly akin to DataLog and MercuryProgrammingLanguage? but with mutable relations) could do as you suggest. Still, I'm working in that other corner where I'd really, REALLY love to have some DBMS technology that doesn't suck lemons when it comes to rapid micro-sector updates being packaged, queried, recalculated, and distributed to multiple different agents. It just so happens that I'm the one responsible for maintaining the 'half-baked database supporting half-designed publish-subscribe models' for my current project.
- [In other words, where certain FredsAxe issues are relevant, they are dealt with at a modeling level, and need not be dealt with at some intrinsic level within the relational model itself, beyond that already provided. The current model is not just sufficient to model the real-world scenarios for which relational DBMSes are typically used, it is effective as well.]
- Oh, I agree that these issues do not need to be dealt with 'within the relational model itself'. But that doesn't justify pretending the issues don't exist and choosing instead to idealize some fictionary 'immutable' relations. Embracing the idea that mutability and 'mutable object' are just matters of perspective is the first step in recognizing the needs of a whole field in which relational DBMSs are NOT "typically used" and NOT "effective", but could be.
- [I'm not sure why you call immutable relations "fictionary", because under the relational model they are immutable. An update to a relational database consists of replacing the current relation-value in a relation-valued-variable with another relation-value, in the same manner that an update to a simple numeric variable consists of replacing the current numeric value with another numeric value.]
- One might argue that the numbers on a digital clock never change because numbers are values and values are immutable. I've never known someone to typically use English in that manner, but it is a reasonable argument. One might also argue that the numbers on a clock do change, noting that when you look at it now and later, the numbers on the clock are different. In the former sense, relations (relation-values) do not change. In the latter sense, relations (relation-objects within a database) do change. When I say that 'immutable' relations are 'fictionary' in a relational model or relational database, it is in the latter sense. I also happen to believe the latter sense is more consistent with modern use of the English language.
- [Execute the following: x = 3 + 4. Now execute x = 2 + 2. Has the 7 assigned to x in the first statement been replaced with a 4, or has the 7 been mutated into a 4?]
- Has the system containing the 'x' mutated? Yes. Assuming traditional binary representation, has the representation underlying the value of 'x' mutated in the 2^0 and 2^1 bit-placements? Yes. And, most relevantly, has "the Number identified by 'x'" mutated? Definitely yes. It is the latter which allows a reasonable claim that a number has mutated. None of this requires that '7' (the value) now possess identical semantics to '4' (the value). Did you present a FalseDichotomy intentionally? It would, of course, also be correct to say that the value assigned to x has been replaced by another value assigned to x, which is also to say that the value assigned to x has changed, which is also to say that the value (context implicit) has changed.
- [Thank you. I asked the question in order to gain a better grasp on your definition of "mutated" in the context of assignment to a variable. I'm not sure I entirely agree with your position, but I recognize that the disagreement owes more to our working definitions than some fundamental difference in viewpoint. I had suspected there might be a significant LaynesLaw element to this, and there was.]
- [As a result of this, the notion of mutating a particular tuple, such that it has the same object ID but different attribute values, is effectively meaningless. It is as meaningless as mutating a given '3' so it becomes '5'. Since a tuple is identified only in terms of its attribute values, mutating an attribute value in a given free tuple is precisely equivalent (from a user's point of view) to discarding an old immutable tuple and preserving a new immutable tuple. Similarly, mutating an attribute value in a given tuple in a relation is precisely equivalent (from a user's point of view) to creating a new relation consisting of all tuples in the old relation, MINUS one old immutable tuple, UNION one new immutable tuple. Of course, an actual relational implementation will optimise this by freely mutating tuples, relations, and various user-invisible data structures as needed, but the user is thankfully spared all the gory details and benefits from the conceptual simplicity of being able to treat tuples and relations as immutable values - just like we do with numbers in numerical algebra.]
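To make the MINUS/UNION equivalence above concrete, here is a small, purely illustrative SQL sketch. It reuses the Employees/Alice example from earlier in this discussion, assumes Name is the sole candidate key, and omits all other attributes for brevity (the specific values are invented):
-- the familiar in-place phrasing:
update Employees set LifetimeInSeconds = 1000000001 where Name = 'Alice';
-- under the relational model, this is shorthand for assigning the relvar a new relation-value:
--   Employees := (Employees MINUS { the old Alice tuple }) UNION { the new Alice tuple }
-- which SQL can only approximate as a delete-and-insert inside one transaction:
begin;
delete from Employees where Name = 'Alice';
insert into Employees (Name, LifetimeInSeconds) values ('Alice', 1000000001);
commit;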
- RE: [the notion of mutating a particular tuple, such that it has the same object ID but different attribute values, is effectively meaningless] - That is only true if the notion of 'mutating' is 'effectively meaningless', period. Is the notion of mutating a relvar "meaningless" because you can view (and even logically work with) the whole of the database-state as one massive value? Is the notion of mutating a database meaningless because you can conceptualize the whole of the universe as one unique and constant 'state' that is either held at a particular moment in time, or a constant four-dimensional state varying in space and time? I say it is not meaningless. A user can legitimately view a tuple as changing in time, and can even create tools that carry out the necessary protocol to execute the change. And 'MINUS/UNION' aside, there is always a protocol to change stuff in a computer system - to flip a bit, you grab a unit of the smallest directly addressable size ('byte'), bit-twiddle the correct bit, and send it back to memory. To change a byte, you might need to grab a whole 128-byte block of memory to a secondary cache, grab a 32-byte block to a primary cache, grab an 8-byte word into a register, modify the value of said byte, write it back out... ultimately replacing a whole 128-byte block with another one. Was it 'meaningless' to say you were changing the 'byte', which is just part of that block? As far as your claim of 'simplicity' goes, I don't believe it is provided by the 'view' of immutability; it is provided by transactions. Absent transactions, one would still be concerned about how his or her manipulation of a database interacts with that of other users, whether it breaks consistency with other tables, etc. And once you DO have transactions, THEN you have simplicity... but all things are simpler; even making multiple updates to 'mutable tuples' in many different tables is 'simpler' with transactions because you're spared all the gory details of how to make these changes without breaking consistency, without interfering with other users... and if it is Atomic as well (which is only relevant in the event of partial failures... which aren't part of the relational logic) so much the better.
- [I'm not clear what you gain by regarding relations as mutable, or what you lose by regarding them as immutable. I agree that fully-isolated transactions do simplify code, but that is a separate issue.]
- The only thing you lose by considering database relations (relation-objects identified by database:relvar) and database tuples (tuple-objects identified by database:relvar:candidate-key) to be 'immutable' is consistency with logical thought. Not a big deal, eh? It is not rational to simultaneously hold a belief that something you've identified as 'database' is mutable and that a relation you've identified as a 'database:relvar' or a tuple you've identified as 'database:relvar:candidate-key' is not mutable - not unless these are constrained by some invariant (e.g. being read-only).
- [The 'database' isn't mutable per se. Only variables are mutable, though it is accurate to consider a database to either be a collection of variables or a single variable containing a tuple - but then it's the database variable that is mutable, not the database itself.]
- A database, per se, is both 'variable' and 'mutable' (unless, of course, it is read-only and, therefore, invariant). A dataset is an immutable value. A database-state is an immutable value (and differs from a dataset by additionally including logs, indexes, constraints, triggers, stored procedures, etc.). Perhaps we're running into definition issues, here? I consider a 'database' a living entity that is being updated, backed up, mirrored, queried, etc. via support of a 'database management system'. You could send someone a copy or clone of the database - coming with a 'dataset', which is a partial snapshot of the database - but you would now have two databases that would (if one is actively being updated) invariably diverge. Or perhaps I'm misunderstanding, and you mean to claim that systems with variables aren't automatically variable themselves. If that is the case, I feel the need to disagree - you don't need to create a named 'dbvar' for a database to be variable; if "only variables are mutable", it is quite reasonable to claim that any system containing a mutable property is necessarily a mutable system, and therefore variable.
- [I'm using "variable" in the programming language sense (as do DateAndDarwen in TheThirdManifesto), i.e., a named, typed, mutable container of a single value. I consider a 'relational database' to either be a collection of relation-valued variables, or a single tuple-valued variable with relation-valued attributes. Other database models have their own definitions, but a suitable general definition of 'database' is a "collection of computerised records" or in a more general sense, "a collection of records". Does this clear up the definition issues?]
- 'Variable' in the programming language sense has never had a broadly accepted definition, and has often meant something other than a 'container'. In various languages, variables can be anonymous, associated with other variables, associated with analog inputs, and many things other than the narrow definition you gave (though they all agree that variables, when dereferenced at different times, have at least the potential to return different values). Anyhow, since your claim is that variables are 'named, typed, mutable containers of a single value', I must point out that your accompanying claim that "only variables are mutable" seems to be in error. And for this context, I'm happy sticking with the relational definition of a database, though I feel that your implicit exclusion of the constraints and meta-stuff is a mistake. My own general definition of a database would be more akin to "a representer of facts", which makes the semantics of the records quite relevant, too (since values aren't ever facts if you divide them from their semantics).
[This does not, however, preclude using relational systems to house and manipulate information about
external object-oriented systems, as one might do with an
ObjectRelationalMapper. The Blunder lies in
depending on
internally-generated object IDs
within the relational system. There is nothing in the model that prevents the user from defining table/relvar/relation attributes that contain object IDs generated
externally. In this case, an "object ID" is merely another attribute of real-world entities (which may or may not be object-oriented system objects), and of no greater or lesser significance than any other attribute. Thus, the relational model can be used to represent graphs, etc., on behalf of an object-oriented (or other object-id dependent) system without causing the
SecondGreatBlunder or violating its proscription.]
- I dunno... I think there is a very good argument for surrogate keys, which are exactly what you describe: "internally-generated object IDs within the relational system" that are intended to never leave the relational system. (AutoKeysVersusDomainKeys) Using arbitrarily generated keys external to the system may easily be more harmful. Consider representing a simple graph - one that doesn't have names or numbers for the vertices. If you are forced to name the vertices outside the database in which you are representing it, you don't have the same graph anymore; you instead have a graph with named vertices. If all you have data on is the structure of the graph, I'd think that is all your database should represent to you.
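As an illustrative sketch of that graph example (table and column names invented), the structure can be represented with surrogate vertex identifiers that need never be exposed outside the database:
create table Vertices (
vertex_id integer primary key -- internally-generated surrogate key
);
create table Edges (
from_vertex integer not null references Vertices,
to_vertex integer not null references Vertices,
primary key (from_vertex, to_vertex)
);
-- the database then represents only the graph's structure; the vertex numbers carry no external meaning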
- [I take your point, but DateAndDarwen objected to what at the time was an increasing reliance on object IDs as the sole way (or at least the primary way) of retrieving specific values, rows, instances, etc. This harks back to hierarchical and network databases, i.e., navigational systems, which were soundly rejected in the 70s and 80s with good reason. The emergence of object-oriented databases intended as a replacement for relational systems was, in DateAndDarwen's view, a return to navigational systems, and therefore bad. However, surrogate keys are a different kettle of fish. Surrogate keys do tend to "escape" into the outside world, in the form of VINs (to uniquely identify vehicles), SIN/SSN/NI numbers (to uniquely identify humans for pension, payroll, tax, etc., purposes), SKUs (to uniquely identify inventory items), serial numbers (to uniquely identify individual products), and so on. It would perhaps be more accurate to describe these as computer-generated or auto-generated keys than "surrogate" keys, because they tend to become natural keys. Deliberately introducing an auto-generated attribute as a primary key on an as-needed basis, where no natural key exists, is perfectly reasonable. It is not the same as automatically creating an object ID - a unique, user-visible, value for every tuple/row in every relation/table - and voluntarily using that as the primary key in every case. Or, worse, requiring that such a key be used. That effectively turns a relational database system into a network database system.]
- As a distributed systems guy, I tend to view the world as being broken into communicating systems that each have their own idea as to what the world is like - each possessing their own miniature database. While surrogate keys can be made into artificial keys like VINs and such, it's important to recognize that these artificial keys have very different properties than do the surrogate keys when it comes to communications and security issues: authority to assign a number, namespace collisions, resolving cases where one object ends up with multiple names, distribution of names, control, etc. all become huge issues. I tell you this only because I want you to understand how much I cringe at this sort of problem being introduced accidentally... by 'leaks'. A database system that supports surrogate keys really ought to do so without loopholes through which they can 'leak'. Keeping object-identifiers secret is the beginning and the end of encapsulation - that which you can hide, you can change or virtualize. The moment you start releasing identifiers to the public, you need to keep them around because they might be involved in future communications. I consider leaking surrogate keys just as bad as you consider 'automatically creating an object ID for every tuple/row' then exposing this to users of the system simply because the implementation uses it for optimization and indexing purposes. But I really digress at this point; perhaps we ought to steer this back on topic.
[A similar argument may well apply to the
FirstGreatBlunder. It appears intended to justify a proscription for implementing relational systems, such that a sound relational system should not equate variables with types. It does not appear to be an injunction against using relational systems to house
external object images. Whether to map an external client-side Employee class (for example) to a server-side table/relvar, or map it to a server-side attribute type, is merely a conventional ER modeling problem that should be decided on the basis of requirements rather than a general rule.]
-- DaveVoorhis
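To illustrate the two mappings mentioned above, here is a hedged sketch using PostgreSQL's composite-type syntax (names and attributes are invented; a truly-relational D language would express the second option differently):
-- option 1: map the external Employee class to a relvar/table
create table Employees (
EmpName varchar(100) primary key,
Salary integer not null
);
-- option 2: map it to an attribute type (a 'domain'), usable wherever a column type is expected
create type EmployeeType as (
EmpName varchar(100),
Salary integer
);
create table Departments (
DeptName varchar(100) primary key,
Manager EmployeeType
);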
It occurs to me that this is looking at the problem back to front. Date isn't talking about the design of databases, but of database engines. What he's arguing for, first and foremost, is for RDBMS designers to implement the full CREATE DOMAIN spec of the SQL standard. It's equivalent to the arguments put forward in the 1960s and 1970s in favor of including structured data types in programming languages. -- JayOsako
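For readers unfamiliar with it, a minimal sketch of what CREATE DOMAIN provides (standard SQL, with invented names):
create domain PositiveMoney as numeric(10,2)
check (value >= 0);
create table Salaries (
EmpName varchar(100) primary key,
Amount PositiveMoney not null
);
-- 'Amount' now carries the domain's constraint wherever the domain is used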
The complaint that most RDBMSs lack the ability to create user-defined types is common and should be addressed. However, if columns can be tables and tables can be columns, then the complexity and traceability of such a system may be called into question. Standards are useful as much for what they don't allow as for what they do allow, and if they allow everything, then they are not really standards, but merely a canvas for chaos and MentalMasturbation. -t
{The only MentalMasturbation going on here is your idle, ill-considered, and unsubstantiated speculation. If you don't like user-defined types, don't use them.}
Sir, I did not say that I didn't like user-defined types.
{Then clearly you did not express yourself clearly. I don't know what you're on about with "columns can be tables" and "tables can be columns".}
You are right, I should have linked to reference info. My bad. I'll be back.
JanuaryZeroEight