Semantic Binary Model

A database model proposed by Dr Naphtali David Rishe.

See:

http://hpdrc.cs.fiu.edu/documents/semantic/ (ISBN 0-07-052955-8 )
http://citeseer.ist.psu.edu/rishe93methodology.html (this reference needs to be explained!)
http://portal.acm.org/citation.cfm?id=636814 (is this the same model?)

(Some chapters of the book are much more readable in the Postscript files than in HTML.)

From the second reference above:

The semantic binary database model represents information of an application's world as a collection of elementary facts of two types: unary facts categorizing objects of the real world and binary facts establishing relationships of various kinds between pairs of objects.

The distinction needs to be made between representation model and database model. Because not every represenation is automatically a database. Most of the discussion below argues about the virtues and relevancy of Semantic Binary Model for database. Was it only targeting database technology, or was database model just one example of more general representation priciple? Answer to this question will make many of these arguments irrelevant. For example, Relational Algebra mostly applies to database technology as a tool, not as representation theory. On the other hand, binary relations are in a very foundation of some mathematical represenations like Allegories. Comparing the two makes no sense in this broader context.

One of the major advantages of the relational database model, as compared to the network and hierarchical models, was the independence of the logical data from the physical aspects of data storage. The semantic models went one step forward towards the independence of the actual information from its logical data representation. Among the semantic advantages of the semantic binary model relative to the RelationalModel are the following:

All the information is composed of the elementary facts describing the real world, so no normalization of a semantic binary schema is needed;
No category (type) of objects needs to have a key. A key is collection of attributes which are never null and which universally identify the objects of the category. (Instead of having a fixed inflexible key, in the semantic model different objects of the category may be identifiable by different attributes or by different relationships with objects of perhaps other categories. In the real world, keys exist very rarely. Another practical requirement of the keys in most systems is that their values do not change in time. (Even if the system does not guard keys against change, arbitrary changes may cause inconsistency of the database.) On the other hand, all natural attributes should be allowed change in time: if not because of the changes in the real world then because of correction of erroneously recorded data.)
Objects are not logically replaced by their keys, when these exist. So a value of a key is changeable with no influence on the other information about this object in the database.
- Actually the above two properties are a weakness of semantic binary model. See RelationalHasNoObjectIdentity. The absence of a feature may be a design virtue as in languages not supporting GOTO.
  - The position stated at RelationalHasNoObjectIdentity is essentially that it's useful to allow identifying objects by values (no argument; the SemanticBinaryModel does that easily), and that it's not useful to allow -- in fact, that it is beneficial to disallow -- identifying objects by reference semantics. The latter is simply asserted, not argued.
An object may belong to several categories simultaneously.
Properties which are common to several categories, can be specified just once.
It is conceptually simple and schemas can be easily explained to owners of the information to be stored in the database, who may have no computer knowledge but must approve the conceptual schema.

It miraculously solves every problem in an Oo programmer's mind.

Oh, don't worry; I can create more problems. ;->

This looks like a restatement of the entity-relationship modelling scheme. These mappings are well known, but it is nice to see them in one place.

You're pretty quick at minimizing significant work of others who didn't think along what you already know :) It is not a restatement of entity relationship. This is a complete database model, thought out in every little detail.

As Rishe already says in the introduction, several semantic models do exist that have a high similarity. CostinCozianu

Yes, but it doesn't say that OO == relational, just that they are similar.

This is very interesting indeed. Unless, Rishe's model is somehow flawed, which I don't think is the case, it can be deducted from there that ObjectRelationalImpedanceMismatch does not exist. Indeed Rishe's SemanticBinaryModel is everything OO people have ever dreamed of in a database. And Rishe's semantic binary model and the relational model are equivalent, up to an isomorphism. -- CostinCozianu

That is not true, at least not in the normal sense. Rishe's model includes all the elements from every model. As such it is a superset of the relational and object-oriented (among other) models. This does not make relational and OO models equivalent, it just allows for a mapping between the two. Most automatic persistence layers use a similar intermediate representation which allows interconversion. Without exception, however, additional metadata/metafunction needs to be included to guide the transformation i.e. the intermediate representation will include more information than any more specific representation.

Well, in case you really don't get it why they are isomorphic, it is because a binary database schema is a relational database schema by the force of nature. A binary relation is a relation , it's just as simple as that. 'I disagree, a subset by definition does not include all of a superset otherwise they are not distinct -- RIH'

A relational binary schema is necessarily a relational schema. However, a SemanticBinaryModel schema is not necessarily a relational schema. So technically, the SemanticBinaryModel is incommensurable with the full RelationalModel. But in practice it is easy to translate between relational binary and relational n-ary schemas, so the SemanticBinaryModel can effectively be viewed as a superset of the RelationalModel. -- DavidSarahHopwood

On the reverse side, every relational schema can be expressed as a binary schema without any loss of information. 'I agree, a superset by definition includes all of a subset. Do you see where I'm coming from? -- RIH'

A more extensive discussion needs to be made about domain (types) and constraints. It's doing injustice to the subject to treat it the form of this html page, but if you read TheThirdManifesto, and use a little bit of imagination and mathematical thinking, you'll see they are equivalent up to an isomorphism. -- CostinCozianu

Perhaps we are stuck on the definition of impedance mismatch. It is usually a measure of the effort needed to convert between two representations. An infinite mismatch is possible, but is a degenerate case. -- RichardHenderson

Well, about the conversion effort as a measure of ObjectRelationalImpedanceMismatch, I'll move the discussion in a page that will specifically deal with the subject. It does not belong to SemanticBinaryModel.

As you wish but it might clear up the confusion. No-one is claiming an isomorphism is impossible yet your arguments seem to depend on this.

Okay. Here's a quick check. See the chapter: http://hpdrc.cs.fiu.edu/docs/books/datades-book/R-DD.92.chapter10.ps

This is where Rishi levers in OO database coverage. It isn't mentioned before this, and I suspect it is in addition to the original SB model. I think it may be fixable but the presentation is incomplete so it is hard to say.

Specifically, the display() procedure seems to be associated with some global context. I would expect the method to be associated with the objects (person and student-object respectively). This is not indicated and therefore this is not OO, or am I missing something? It wouldn't be hard to fix, but the whole treatment of OO seems tentative. That is not to say that the mapping can't be done, just that there seem to be holes in the explanation. Using Pascal doesn't help either. -- RichardHenderson.

I agree that Rishe's treatment of OO issues is in a tentative stage. However, this does not diminish the value of the SemanticBinaryModel he proposes. As a matter of fact, I believe the model is absolutely complete and useful without addressing the OO dynamic method (function) dispatching issues, as they do not belong to the database domain. Still, there are valid execution models along the lines Rishe is proposing. See ParametricPolymorphism. -- CostinCozianu

About the generality of binary relations vs n-ary relations: see http://www.dbdebunk.com/page/page/1147347.htm (note that this is talking about the binary relational model rather than the SemanticBinaryModel, but the comments about binary and n-ary relations still apply). Also see PrincipleOfLeastPower.

The "semantic" parts of the SemanticBinaryModel are indeed more general than the RelationalModel, and in useful ways. The approach to object-oriented features needs work (and there has been more work on this since 1992). In any case, the book is well worth reading. -- DavidSarahHopwood

OK, I'm browsing through Rishe's online book [http://hpdrc.cs.fiu.edu/library/books/datades-book/]. Near the beginning of Chapter 3 ("From the Semantic to the Relational Model") he talks about "Time Invariant Attributes and Keys" (section 3.1), and says this:

[there are no time-invariant attributes in the real world] ... Thus, time-invariance is defined only in implementational restrictions. Such restrictions are unavoidable in the relational database design. The methodology of relational schema design that is presented below has among its goals the minimization of the negative effect of such implementational restrictions.

I also saw something similar about time-invariant keys in his early paper, referenced at the top of this wiki page. This is somewhat of a misrepresentation of the relational model, isn't it? Perhaps there's an implied but missing discussion about "relational schema restrictions required in order to simulate object identity"? -- DanMuller

The issue here is whether use of (at least some) time-invariant keys is in practice unavoidable in relational database design. Certainly the use of time-invariant arbitrary IDs as keys (exactly as in Example 3.4) seems to be very common. --dh

First, I didn't see a qualification of "in practice" anywhere, but rather a bald statement. He makes somewhere (perhaps in the original paper) the point that real-world "keys" are rarely invariant, then claims that time-invariant keys are necessary [in the RelationalModel].

His argument seems to be overstated and not well-supported in some respects. However, IIUC he would argue that (using the terms from AutoKeysVersusDomainKeys):

time-invariant DomainKey?s are rare in the real world
- Not true.
  - What of the point about needing to correct errors?
time-variant DomainKeys?s cause many problems in enforcing data integrity
- Widely exaggerated both in theory and in practice. But the essential thing is: if you think DomainKey?s cause many problems in enforcing data integrity, AutoKey?s can only make matter worse in this regard.
  - The point is that AutoKey?s aren't needed in a semantic database. The main difference between AutoKey?s and references, apart from references being opaque, is that references cannot dangle.
the need for any AutoKey?s is by definition an "implementation restriction".

The claim that time-variant DomainKey?s cause problems seems to be based on the idea that if we change a DomainKey? of some "abstract object", we generally want existing references to the object to stay pointing to it, and that this is automatically the case in a SemanticBinaryModel database but difficult in a relational one. (We would have to change all instances of the DomainKey? in all tables as a transaction.) --dh

I do not think such a feat is difficult in a relational database. I currently have one and all it takes for me is to declare ON UPDATE CASCADE. The reason why ObjectIdentity makes matter worse with regards to data integrity is the following (thanks to an anonymous poster's contribution in ObjectIdentity page).
- That was me actually :-) --dh
You have the following option : to say object A is related by relation R to "the object that has the identifying property P" that can currently be object B, or you can say that object A is related by relation R to the object B that at the time of this declaration had content (content B) -- but in the future its content may change. The second option involves the considerable risk that through programming bugs, undocumented business rules, etc, object B may point altogether to a different entity in the domain of discourse, therefore the proposition R(A,B) may record an incorrect fact. In relational model you have both options. In SemanticBinaryModel or in many OO data models proposed so far, the second choice is already made for you: and it is the unsafe choice by default.
- No, absolutely not. In the SemanticBinaryModel (let's leave aside other OO models on this page), you also have the choice of both options. The only difference really is that the RelationalModel makes you jump through hoops (introducing AutoKey?s) to simulate a direct reference. --dh
- It seems to me that SemanticBinaryModel as proposed by Rishe has no provision for such.
  - Just add a relation (probably functional, i.e. 1:1 or m:1) between P and B.
- I wasn't able to find any systematic addressing of database integrity constraints http://hpdrc.cs.fiu.edu/library/books/datades-book/.
  - The book is clear on the fact that you can enforce 1:1, 1:m, or m:1 properties of relations. I haven't read the whole book yet, but there's nothing about the model that would preclude enforcing at least the same constraints as in relational databases. (There are additional constraints concerning categories that you might want to enforce.)
  - I need to clarify my statement: the book does discusss integrity constraints at some length. I was very enthusiastic about the book (I actually created this page on wiki), but then I noticed that it doesn't discuss integrity constraints at the same level they were already available in relational model theory (the higher normal forms), and to begin with, it doesn't address the issue of making the expression of integrity constraints more complicated than they need be by arbitrarily restricting the relation degree to 2. Also since integrity constraints are exemplified what seems to me as a full fledged predicate calculus (as opposed to relational model which is more restricted) I do not see how the problem of enforcing these integrity constraints in practice can be solved. I need to get some time to brush up my reading on the subject.
    - The paper claims that "no normalization of a semantic binary schema is needed". It effectively replaces that with other schema quality criteria that are less formal, but less prescriptive and arguably more relevant to good schema design. See in particular point 4 in section 4.1 of the paper.
    - This claim seems trivially false. At least it is unsupported by any argument that I can see. Proof exercise: take a relational schema in BCNF but not in 4NF. Decompose it into binary relations. Add all the integrity constraints needed to ensure that the binary schema can be recomposed into the relational one (that would be a highly non-trivial exercise -- an indication that Rishe did not consider all the practical implications of his proposal). What you have now is: a binary schema with a lot more constraints, because some of the constraints were enforced by the structure of the n-ary relation rather than having to be expressed explicitly. The binary schema from the same update anomalies because it is absolutely equivalent with the relational one. You could easily reach the same schema with the bottom up design methodology starting from scratch rather than from a relational example.
    - That's not how you would approach the problem at all in Rishe's methodology. You're supposed to use the additional expressiveness of the SBM to start with a non-redundant domain model, where as many as possible of the integrity constraints are expressed by the schema rather than separately from it. So your "proof", starting from a relational schema in BCNF, doesn't have anything to do with what Rishe is suggesting. Section 7 of the paper rather overstates the case, but is an interesting perspective :-)
    - There's no extra expressiveness in the SBM related to join dependencies. That's why I chose a schema in BCNF but not in 4NF. It can be argued (although not very formally, mind you) that following Rishe's methodology one can achieve a schema in BCNF. Regardless of methodology, the schema that you achieve following my "method" is a legal SBM schema. It suffers significant problems related to update anomalies.''
    - At no point does Riche claim that all legal SBM schemas avoid update anomalies or other problems. Your argument completely misses the point: you can't develop a schema using one methodology and then use problems in the result to criticise an entirely different methodology.
    - Wait a second, doc. Since I can easily claim that the resulting schema could have resulted from following Rishe's method (indeed there's nothing to prevent me from reaching that schema) then I can at least claim that Rishe's method and SBM (insofar as they are presented in the references above) does not address important things already addressed in the relational theory, which should have been addressed rather than a mathematically dubious claim being made that such problems do not exist if you follow a careful methodology.
    - Now you have to prove that following the design methodology you are carefully guided away from the pitfalls associated with denormalized schemas. I.e. you'd need a constructive mathematical proof with regards to the properties of SBM schemas resulting from Rishe's method. After all, you can't make the problem disappear with a slight of hand: every relational schema has an equivalent binary schema and every binary schema is a relational schema.
    - Every SBM schema is not a relational schema. Don't confuse SBM schemas with binary relational schemas (as much of the rest of this page appears to do :-( ). SBM schemas are strictly more expressive than binary relational schemas; the latter don't have any concept of categories or category constraints.
    - Unless you want to say that binary relations are not relations, I think we can agree that a SBM database engine can be trivially implemented over a relational model engine.
    - It's not trivial, but of course it can be done (and Riche has done so).
    - Just put all the categories in one relation, all the object IDs in another relation, and all the binary relations in yet another relation. Voila, you have your faithful encoding of a binary database into a relational one.
    - No, you don't, because the SBM schema is not enforced in this encoding.
    - Section 7 of the paper is rather misguided in assuming that the theory of normal forms is proposed as a design methodology -- well, the confusion persisted for some time in literature. The theory behind normal forms is independent of the design methodology (though it could be used as a design methodology for lack of anything better), it is primarily a validation tool to ensure that schemas have certain desirable properties.
    - We need some examples for a SemanticBinaryModelRelationalComparison?. Maybe something from http://www.cs.sfu.ca/CC/354/zaiane/material/notes/contents.html ?
    - On n-ary relations, see section 1.2.3 of the book. n-ary base relations are sufficiently uncommon that it doesn't matter that they need to be expressed in terms of binary relations: it's more useful to simplify the model than to support n-ary directly. --dh
    - What about joins? If you join two binary relations, you get a ternary relation last I checked... even if you limit RelationalVariables to binary relations only, you still have to deal with higher arity.
      - Hmm, apparently the book does rely on converting n-ary relations to binary even in that case (see section 4.3). Doesn't seem to be a problem, though (especially since there is syntactic sugar to deal with it).
- As to what regards allowing or disallowing reference semantics, the challenge is up for grab in ObjectIdentityExamples to prove that they are useful in data modeling. Otherwise we can assert that it is a dangerous feature and it is a non-essential feature, therefore it should not be supported.
  - This seems quite an odd position to me. AutoKey?s simulate reference semantics. Are you suggesting that AutoKey?s not be allowed in the RelationalModel? --dh
  - AutoKey?s are optional, they are not imposed on you.
    - If you argue that reference semantics are dangerous and should be disallowed, then it would be consistent to argue that AutoKey?s are dangerous and should be disallowed, since, to labour the point, AutoKey?s simulate reference semantics.
To address further concerns about ON UPDATE CASCADE: yes, in current implementations it incurs a negative performance impact, but on the other hand such an impact is negligable when considered with regard to the bulk of database update activity and the frequency of key changes. More refined DBMS implementation can provide an option that domain keys and foreign keys referencing them should not be stored in place with the rest of the tuple, but rather in a separate B-Tree and have the table tuples point to the lopcation in the B-Tree, therefore letting the physical implementation mechanism in the DBMS perform automagically the same logic available now through use of objectIDs. I do not think such an optimization (if it is an optimization at all) is really the crucial missing feature of major DBMSes.
- ON UPDATE CASCADE is an extension to the RelationalModel. --dh
- Which supports my point, obliquely. There is nothing about the relational model that makes it easy or hard to update keys (natural or otherwise), so there is nothing about the relational model that makes the introduction of time-invariant surrogate keys "unavoidable". The chapter on relational model mapping starts out by embracing the all too common practice of improving upon a caricature of the RM. I'm thus less inclined to read the rest of Rishe's work. -- DanM

Furthermore, time-invariance of surrogate keys is not necessary unless you're storing references to the data outside the database -- which imposes a requirement from outside of the relational model. Even with surrogate keys, it's perfectly possible to update them along with all foreign keys, and in fact it's not uncommon to "compact" automatically generated numeric keys occasionally.

Updating natural keys is similarly straightforward. If the "key" of a real-world object and the key of the datatabase "object" which represents it are both changed, then the database object is still correctly associated with its real-world counterpart. So time-invariance of keys in the relational model is avoidable, unless he's applying some additional, non-relational requirement that I missed.

But what is a "key" in the real world changes over time. For example, crime or spy investigators may look at several different aspects of a person's life to determine if they are the same person or not. Names and SSNs can change, so they may look at where they lived, where they worked, etc. But a computer cannot run those kinds of studies for every matching exercise. An assigned ID simply makes a more UsefulLie much of the time. (AutoKeysVersusDomainKeys)

I'm not sure how this comment is relevant? If your application needs to deal with such issues, then a surrogate key makes perfect sense -- I wasn't arguing against surrogate keys. But that still doesn't address time-invariance. There's no particular reason within the relational model that surrogate keys have to be time-invariant. What is Rishe thinking of that leads him to assert so clearly that adding time-invariance restrictions is unavoidable in implementations of relational databases?

Anybody want to give an EssExpression based example?

Hmmm, this page is pretty thready. Having reread the book plus more recent info, and noting that noone references Rishe's work except Rishe and his best friends, I think we can say that Rishe's work is strangely underconstrained in some areas (why Binary? oh yes, we have 'offerings' to cover higher order relationships), and he overconstrains others to damn them (whats wrong with autokeys in rdB's?). Furthermore he shoehorns in extra things as they appear (OO makes a mess of categories, but he puts them in anyway).

So, I judge the SBM to be pretty weak. Sorry, but if it was truly distinct, and added power, it would be out there leaving Larry in the dust. Or there's a conspiracy. Or we're all very very dumb.

Of course if we remove the binary constraint (possibly a hangover from ER modelling), and express 'semantic' as 'named relation', then we have the named relational model which isn't nearly as zingy.

If I get round to it I'll edit this page down to remove the topics covered better elsewhere. Fell free, anyone, to jump in and correct my obvious dumbitude. --RichardHenderson

It's true that hardly anyone references Rishe's work, but I don't think that's particularly relevant. There doesn't need to be a conspiracy for any model to get ignored. (Look at the ActorsModel, for instance; that's an example of an excellent model that would solve many real-world problems being mostly ignored for 30 years.)

Anyway, there are other semantic database models that are more often referenced, and IMHO Rishe's work is a reasonable introduction to semantic databases in general.

Why do you say being binary makes it underconstrained? That's a constraint -- but not a particularly important one. It's done just as a simplification, AFAICS.

N-ary semantic models don't look much like the named relational model at all. Semantic models have categories, category constraints, and (essentially) ObjectIdentity. If you mean that semantic models can be simulated in the relational model (named or otherwise), so what? That's a TuringTrap; any reasonable data model can simulate any other. -- DavidSarahHopwood

CategoryOnlineBook