Object Relational Psychological Mismatch

Is there a fundamental psychological or philosophical difference between relational thinking and object thinking?

Did you know that the philosophical difference between relational thinking and object thinking dates from long before computers were invented? Back then the discussion wasn't about the ObjectRelationalMismatch but about bundle theory and substance theory... it was the BundleSubstanceMismatch.

This is one of the main components and sources of the ObjectRelationalImpedanceMismatch.

Examples

Let's take the all too famous RationalCompany, which is very fashionable these days with all its contribution to UML, RUP and lots of expensive products. Amongst its many white papers and guidelines there is a certain one called "Integrating Object and Relational Technologies" http://www.rational.com/products/whitepapers/296.jsp. It sure looks interesting. Under the paragraph Relational Database Designing one can read:

The relational model is composed of entities and relations. An entity may be a physical table or a logical projection of several tables also known as a view.

This is absolutely wrong! The EntityRelationshipModel is ConsideredHarmful, and has nothing to do with the relational model. EntityRelationshipDiagrams can be used as drafts of a conceptual data model, but that's about it.

Actually the RelationalModel is composed of data types that appear as attributes in relations, and this is just one type of DataIntegrityConstraint. Another type is ReferentialIntegrityConstraints, and there are yet others.

What RationalCompany calls relation is actually a relationship, and it does not figure at all in the RelationalModel!
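As a rough illustration of the two kinds of constraint mentioned above, here is a minimal sketch using Python's standard sqlite3 module. The table and column names loosely follow the PRODUCT/LINEITEM example from the white paper. The CHECK clause is only a stand-in for a real domain (type) constraint, since SQLite's typing is loose, and the helper try_insert is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE product (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE lineitem (
        id         INTEGER PRIMARY KEY,
        -- referential integrity constraint: must name an existing product
        product_id INTEGER NOT NULL REFERENCES product(id),
        -- stand-in for a domain constraint: quantities are positive integers
        quantity   INTEGER NOT NULL CHECK (quantity > 0)
    );
    INSERT INTO product VALUES (1, 'bolt');
""")

def try_insert(conn, product_id, quantity):
    """Hypothetical helper: report whether the DBMS accepts the row."""
    try:
        conn.execute(
            "INSERT INTO lineitem (product_id, quantity) VALUES (?, ?)",
            (product_id, quantity),
        )
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

outcomes = [
    try_insert(conn, 1, 5),   # valid row
    try_insert(conn, 99, 5),  # no product 99: violates the foreign key
    try_insert(conn, 1, 0),   # quantity outside the allowed values
]
```

Both constraints are declared once in the schema and enforced by the DBMS for every application that touches the tables.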

Continuing the ignorance show:

A relational model has the following elements:

An entity has columns. Each column is identified by a name and a type. In the example, the LINEITEM table has Description, Id, Number, Order_Id, and Quantity columns.
An entity has records or rows. Each row represents a unique tuple of information which typically represents an object's persistent data.
Each entity has one or more primary keys. The primary keys uniquely identifies each record (e.g. Id is the primary key for LINEITEM table).
Support for relations is vendor specific. The example illustrates the logical model and the relation between the PRODUCT and LINEITEM tables. In the physical model relations are typically implemented using foreign key / primary key references. If one entity relates to another, it will contain columns which are foreign keys. Foreign key columns contain data which can relate specific records in the entity to the related entity.
Relations have multiplicity(also known as cardinality). Common cardinalities are one to one (1:1), one to many (1:m), many to one (m:1), and many to many (m:n). In the example, LINEITEM has a 1:1 relationship with PRODUCT and PRODUCT has a 0:m relationship with LINEITEM.

That's the quintessence of the relational model according to much of the industry... but it is dead wrong. While this one is a caricature, many OO books or papers present the relational model along similar lines. Well, whoever wrote this paper (it's not signed) should go back to school and complete his education. Unfortunately, a great many people are likely to read and follow blindly whatever comes from RationalCompany.

-- Sandog

Among many other things, they argue (so conveniently of course) that UML is a superior tool for modeling databases, which is entirely nonsensical; see the paper referenced in ModernDinosaur.

Also from the same school of thought one can find the book Object-Oriented Modeling and Design for Database Applications by Blaha and Premerlani (ISBN 0131238299), which among other things (although not nearly as horrible as the white paper cited above) affirms that normal forms are "anachronic" and consequently no longer of interest. What they are probably trying to say is that they believe that by following their design methodology and mapping the object model to a database schema along the guidelines in the book, one should directly achieve a superior, fully normalized database design, so one shouldn't worry about normalization anymore. Well, in reality it doesn't work like that, and in theory they have yet to show even the slightest shadow of proof.

Sun Microsystems, with its flagship J2EE, is among the chief promoters of the psychological mismatch. Before EJB 2.0 the slowness and the complex problems resulting from EntityBean-driven architectures with relational databases were proverbial, and the EntityBmpFinders are among the main exponents of the psychological mismatch. Instead of doing operations on sets, as the relational model allows us (and it is even a good thing to do), we are loading and saving objects one by one.
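To make the one-by-one versus set-at-a-time contrast concrete, here is a small sketch in Python with the standard sqlite3 module. It is not EJB code; the lineitem table and both function names are invented for illustration. The first function mimics the load-object/save-object pattern, the second lets the DBMS operate on the whole set in one statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineitem (id INTEGER PRIMARY KEY, order_id INTEGER, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO lineitem (id, order_id, quantity) VALUES (?, ?, ?)",
    [(1, 10, 2), (2, 10, 5), (3, 11, 1)],
)

def double_quantities_one_by_one(conn, order_id):
    """Object-at-a-time style: one SELECT, then one UPDATE per matching row."""
    rows = conn.execute(
        "SELECT id, quantity FROM lineitem WHERE order_id = ?", (order_id,)
    ).fetchall()
    for item_id, quantity in rows:
        conn.execute(
            "UPDATE lineitem SET quantity = ? WHERE id = ?", (quantity * 2, item_id)
        )
    return len(rows)

def double_quantities_as_set(conn, order_id):
    """Set-at-a-time style: a single statement describes the whole change."""
    cur = conn.execute(
        "UPDATE lineitem SET quantity = quantity * 2 WHERE order_id = ?", (order_id,)
    )
    return cur.rowcount

changed_one_by_one = double_quantities_one_by_one(conn, 10)
changed_as_set = double_quantities_as_set(conn, 10)
quantities = [q for (q,) in conn.execute("SELECT quantity FROM lineitem ORDER BY id")]
```

Both functions change the same rows; the difference is the number of round trips and who does the looping.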

Among many other niceties the EJB Architecture Blue Prints argued (it was for EJB 1.1) that "relationships between entity beans should be avoided as much as possible", of course, because they were slow. The nice thing about this recommendation is that relationships between entities do exist in reality, and if we are to follow the well-thought-out guidelines, we should give up on modeling reality. Of course the right solution is to drop entity beans (at least the 1.1 version entities) and not the connection with reality.

To fix the broken things after 3 years of architectural mishaps, the brand new EJB 2.0 comes to the market. And here we have the fully revamped entities (only the CMP part of course) with the brand new EjbQueryLanguage; see that page for the full details. Who's to say that ObjectRelationalImpedanceMismatchDoesNotExist, when Sun, IBM and all the companies in the committee are clearly stating and promoting the contrary?

It is not that they failed, but it is how they failed. And even if this doesn't prove the absence of O/R impedance mismatch -- on the contrary, it tends to support it -- it is clear evidence for the existence of the psychological mismatch, which is the purpose of this page.

More examples to follow.

SQL databases are the norm in the enterprise. Why not take advantage of their features instead of bludgeoning them into an OO implementation mind set?

It is exactly this problem, an OO implementation mindset, that triggers the psychological mismatch most often. Once you go to a more abstract conceptual perspective the problems are no longer there.

A typical example of "implementation mindset" is when dealing with relationships. Conceptually relations have no directions, [...]

Not so. In mathematics and in the RelationalModel, relations are sets of ordered (or at least labelled) tuples.

[Not so, in the RelationalModel (in mathematics, you're absolutely correct). A tuple is a set of ordered triples <name, type, value>. A relation is a set of tuples. The tuples themselves are not ordered; the individual triples are labeled with name and type, which eliminates the need. See AnIntroductionToDatabaseSystems (there is an isomorphism between mathematical records and ordered tuples, and the choice to backtrack a bit and use mathematical records instead of mathematical tuples for relational (though the name 'tuple' was kept) is visible in DrCodd's own work. A relational tuple is very much not an arbitrary set of ordered triples. The (name,type) is paired and all names are unique. These features are quite necessary to support the relational algebras and calculi. And, neatly enough, that constraint results in a relational tuple that remains completely isomorphic to a mathematical tuple... making squabbles over which one relational is using rather trivial.)]
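The point about labeled versus ordered tuples can be sketched in a few lines of Python. This is only a toy model under stated assumptions: make_tuple and to_ordered are invented helper names, Python type names stand in for relational types, and keyword arguments are used so that attribute names are unique by construction.

```python
def make_tuple(**attrs):
    """Model a relational tuple as a set of (name, type, value) triples."""
    return frozenset(
        (name, type(value).__name__, value) for name, value in attrs.items()
    )

# The same attributes given in a different order yield the same relational tuple.
t1 = make_tuple(id=1, description="widget", quantity=3)
t2 = make_tuple(quantity=3, id=1, description="widget")
same_labeled_tuple = (t1 == t2)

# Positional (mathematical) tuples with the same values in another order differ.
positional_equal = ((1, "widget", 3) == (3, 1, "widget"))

def to_ordered(labeled):
    """Fix an attribute order (alphabetical by name) to get an ordered tuple.

    Because names are unique, sorting the triples is decided by the name alone.
    """
    return tuple(value for _name, _type, value in sorted(labeled))

# A relation is a set of such tuples, so the duplicate collapses.
relation = {t1, t2, make_tuple(id=2, description="gadget", quantity=1)}
```

Once an attribute order is fixed, each labeled tuple maps to exactly one ordered tuple and back, which is the isomorphism mentioned above.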

[...] and may have a degree greater than 2 (like ternary relationships, which are very common). But when approached with an OO implementation mindset, relationships (in fact confusingly split between aggregation, composition and association types) suddenly have directionality because we're talking about pointers, and also because we're talking about pointers and navigation (dereferencing of pointers - see NavigationalDatabase) the n-ary relationships just vanish into thin air. Examples: EJB 2.0 and the EjbQueryLanguage; almost all OO books on object design just fail to take into consideration ternary and n-ary relationships - and even those that have a tiny chapter deal with them inadequately. Dealing with binary relationships should be an OO design 101 assignment; the real problems come when dealing with n-ary relationships, and while binary relationships are the most common, I have yet to see a non-trivial application domain that doesn't have at least one ternary relationship. Nevertheless, the OO implementation mindset sees the relationship issue as a matter of where pointers should be put: in classA, in classB, or in both.
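For what it's worth, here is how a ternary relationship might look when kept as a relation rather than as pointers. The supplier/part/project domain and all names below are invented for this sketch, which uses Python's standard sqlite3 module. Note that no direction is chosen anywhere: the same table answers questions starting from any pair of participants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE supplies (
        supplier TEXT NOT NULL,
        part     TEXT NOT NULL,
        project  TEXT NOT NULL,
        PRIMARY KEY (supplier, part, project)
    )
""")
conn.executemany(
    "INSERT INTO supplies VALUES (?, ?, ?)",
    [
        ("acme", "bolt", "bridge"),
        ("acme", "nut", "bridge"),
        ("acme", "bolt", "tower"),
        ("zenith", "bolt", "bridge"),
    ],
)

def parts_for(conn, supplier, project):
    """Which parts does this supplier deliver to this project?"""
    rows = conn.execute(
        "SELECT part FROM supplies WHERE supplier = ? AND project = ? ORDER BY part",
        (supplier, project),
    )
    return [part for (part,) in rows]

def suppliers_for(conn, part, project):
    """Who delivers this part to this project?"""
    rows = conn.execute(
        "SELECT supplier FROM supplies WHERE part = ? AND project = ? ORDER BY supplier",
        (part, project),
    )
    return [supplier for (supplier,) in rows]

def projects_for(conn, supplier, part):
    """Which projects get this part from this supplier?"""
    rows = conn.execute(
        "SELECT project FROM supplies WHERE supplier = ? AND part = ? ORDER BY project",
        (supplier, part),
    )
    return [project for (project,) in rows]
```

With pointers one would have to decide whether the collection lives in the supplier, the part, or the project class, or introduce a fourth class that is in effect the relation itself.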

{Is a list/bag/set of pointers in an entity class a sign that something is wrong? Is an entity class the proper place for such? Is application code even the proper place for such? Is this a signal to use a database (OO or relational) instead? Database thinking is that index maintenance should be farmed off to something else besides application code. For those who prefer it in code, why is it better there for you?}

To quote from an all too famous ObjectRelationalMapping product manual:

Complex relationships - Relational databases are very good at representing one-to-one and many-to-one unidirectional relationships. However, the more complicated bidirectional and many-to-many relationships, common in Java object models, are much more difficult to represent in relational databases.

Now here you have an implementation mindset in the manifestation of its utter ignorance. Again somebody who might need to retake introductory courses. It's not worth mentioning that the product (not only the manual) has no clue about n-ary relationships, and in real life creates very messy situations because of deficient handling of transactional caching in the middleware and many other goodies.
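To show why the manual's claim looks odd from the relational side, here is a minimal many-to-many sketch using Python's standard sqlite3 module. The author/book domain and the function names are invented for this example. One relation (authorship) holds the relationship, and it can be queried from either end without any extra machinery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE book   (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- The many-to-many relationship is itself just a relation.
    CREATE TABLE authorship (
        author_id INTEGER NOT NULL REFERENCES author(id),
        book_id   INTEGER NOT NULL REFERENCES book(id),
        PRIMARY KEY (author_id, book_id)
    );
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book   VALUES (10, 'Sets'), (11, 'Tuples');
    INSERT INTO authorship VALUES (1, 10), (1, 11), (2, 10);
""")

def books_by(conn, author_name):
    """Navigate 'from' the author side."""
    rows = conn.execute(
        """
        SELECT b.title
        FROM book AS b
        JOIN authorship AS ab ON ab.book_id = b.id
        JOIN author AS a ON a.id = ab.author_id
        WHERE a.name = ?
        ORDER BY b.title
        """,
        (author_name,),
    )
    return [title for (title,) in rows]

def authors_of(conn, book_title):
    """Navigate 'from' the book side, using the very same relation."""
    rows = conn.execute(
        """
        SELECT a.name
        FROM author AS a
        JOIN authorship AS ab ON ab.author_id = a.id
        JOIN book AS b ON b.id = ab.book_id
        WHERE b.title = ?
        ORDER BY a.name
        """,
        (book_title,),
    )
    return [name for (name,) in rows]
```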

The difference seems to be philosophical -- which doesn't come out as much when you're looking at various implementations.

At its heart, the relational approach emphasizes a kind of "late binding" of relationships. It is more declarative and implicit. The database engine is tasked with much of the grunt work of actually implementing the specific operations to navigate links. This can be very powerful and flexible.

The object approach tends to involve more explicit "early binding" of relationships. Navigational paths are much more explicit. This can provide very good performance with flexibility provided by mechanisms like inheritance.

In the end, both techniques at their best can model the world quite well (in the static sense, anyway). An interesting academic exercise is to use one paradigm to model the other (e.g. create objects to represent domains, values and relations).

But what I have found is that in practice it is the object users that routinely practice good design practices, like proper abstraction, indirection, cohesion, coupling, etc., and the relational users use only the simplest features of the paradigm. This can be seen in the corresponding tools and their features (e.g. limitations of using views properly in relational databases).

That is why when I get into a debate about objects vs. relational I always bring it to the reality of the tools available, not the theoretical world.

-- JeffMantei

In my experience, the mismatch exists from developers coming from a relational database model to an ObjectOriented one. It seems that a lot of developers want to think about the storage mechanism very early on, without regard to the objects and operations that need to be undertaken to produce a working system.

If your application only uses a database for storage of objects, then your criticism is valid - you should be using an abstract persistence layer, or object database.

In my experience, the mismatch is now increasingly the other way round, reflecting that junior programming staff have now grown up with OO as the dominant paradigm, and that many have experience only of Java rather than having come to Java from a previous OO language (and therefore with some appreciation of its strengths/weaknesses).

When I see a Java programmer bringing back 400 rows from the database, then filtering, discarding and ordering in Java, that's a classic example of AllProblemsLookLikeNails (as is the mismatch outlined above).
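The same habit is easy to reproduce in any language. Below is a sketch in Python (standard sqlite3 module) rather than Java, with an invented orders table of 400 rows. The first function pulls everything back and filters, discards and orders in application code; the second asks the database for exactly the rows wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, "cust%d" % (i % 4), float(i * 10)) for i in range(1, 401)],
)

def top_orders_in_app(conn, customer, limit):
    """Fetch all 400 rows, then filter, sort and truncate in application code."""
    rows = conn.execute("SELECT id, customer, total FROM orders").fetchall()
    mine = [row for row in rows if row[1] == customer]
    mine.sort(key=lambda row: row[2], reverse=True)
    return [row[0] for row in mine[:limit]]

def top_orders_in_db(conn, customer, limit):
    """Let the database filter, order and limit; only `limit` rows come back."""
    rows = conn.execute(
        "SELECT id FROM orders WHERE customer = ? ORDER BY total DESC LIMIT ?",
        (customer, limit),
    )
    return [order_id for (order_id,) in rows]
```

Both return the same answer; the second moves far less data and can use an index if one exists.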

Developers coming from a relational database model probably also have a healthy suspicion of OO solutions, based on their own experience of what happens when you leave thinking about storage till late on - and then someone decides they want a report on the Top 5 items bought by the top 15% customers.
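A report like the one mentioned can be written as a single declarative query. The sketch below uses Python's standard sqlite3 module with made-up data; the purchase table, its columns, and the reading of "top 15% customers" as "fewer than 15% of all customers spent more" are all assumptions made for this example, and items are ranked by the amount the top customers spent on them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (customer TEXT, item TEXT, amount INTEGER)")

# Made-up data: 20 customers each buy some paper, and a few buy extra items.
rows = [("c%02d" % i, "paper", i) for i in range(1, 21)]
rows += [
    ("c20", "ink", 50), ("c20", "tape", 7),
    ("c19", "ink", 30), ("c19", "pens", 40),
    ("c18", "staples", 5), ("c18", "clips", 3),
    ("c01", "glue", 15),
]
conn.executemany("INSERT INTO purchase VALUES (?, ?, ?)", rows)

REPORT = """
WITH spend AS (
    SELECT customer, SUM(amount) AS total
    FROM purchase
    GROUP BY customer
),
top_customers AS (
    -- In the top 15% when fewer than 15% of all customers spent more.
    -- Integer arithmetic avoids floating point surprises at the boundary.
    SELECT s.customer
    FROM spend AS s
    WHERE (SELECT COUNT(*) FROM spend AS s2 WHERE s2.total > s.total) * 100
          < 15 * (SELECT COUNT(*) FROM spend)
)
SELECT item, SUM(amount) AS bought
FROM purchase
WHERE customer IN (SELECT customer FROM top_customers)
GROUP BY item
ORDER BY bought DESC, item
LIMIT 5
"""

report = conn.execute(REPORT).fetchall()
```

With this data the top customers are c18, c19 and c20, so glue (bought only by c01) never appears and clips falls off the end of the top 5.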

I disagree that RDBMS are mostly about just "storage". I would probably use the relational paradigm even if it was RAM-only. (Perhaps we need an agreeable definition of "storage" and "persistence". It is a common point of contention.)

In my observation OO and R (relational) fight over a lot of similar territory. I prefer my "noun models" in the database rather than application code. I am personally able to navigate, grok, and manage the "noun descriptions" as R tables better than in code. I find it more "virtual" than code, allowing me to see things how I want for a particular need instead of a "global" class model. OO often tries to do too much with application code, IMO. There are things built into RDBMS and related techniques that can be a great time and code saver.

The RelationalModel is better than OO for data management, because of its foundation in set theory and predicate calculus, giving it a solid theoretical foundation that enables a simplicity and power unknown to other methods; actually it is an application of Occam's razor to data management, presupposing the minimum of entities -- entities here being not the EntityRelationshipModel type of entity, but the different kinds of components of the model. R-based noun-modeling maps to my thinking process and sense of "complexity management" better. For more on this, may I recommend:

http://geocities.com/tablizer/whypr.htm

So-called OODBMSes are a throwback to the rejected "network databases" of the 1960's.

Moved from OoVsRelational

A viewpoint that ObjectOrientedProgramming and RelationalDatabase philosophy are somewhat at odds with each other. Proponents of the relational side suggest that large OO software "reinvents" the database in too many ways. Relational also allegedly allows "calculated views" or "calculated joins" where OOP would have to hand-code such.
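As an illustration of what a "calculated view" might mean here, the following sketch (Python, standard sqlite3 module) defines a view that derives order totals from base tables. The product/lineitem schema and the view name order_total are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        unit_price INTEGER NOT NULL
    );
    CREATE TABLE lineitem (
        id         INTEGER PRIMARY KEY,
        order_id   INTEGER NOT NULL,
        product_id INTEGER NOT NULL REFERENCES product(id),
        quantity   INTEGER NOT NULL
    );
    -- The view is declared once; its rows are computed on demand from the join.
    CREATE VIEW order_total AS
        SELECT l.order_id, SUM(l.quantity * p.unit_price) AS total
        FROM lineitem AS l
        JOIN product AS p ON p.id = l.product_id
        GROUP BY l.order_id;
    INSERT INTO product  VALUES (1, 'bolt', 2), (2, 'nut', 1);
    INSERT INTO lineitem VALUES (1, 100, 1, 10), (2, 100, 2, 5), (3, 101, 2, 4);
""")

totals_before = dict(conn.execute("SELECT order_id, total FROM order_total"))

# Changing the base data changes what the view returns, with no extra code.
conn.execute("INSERT INTO lineitem VALUES (4, 101, 1, 3)")
totals_after = dict(conn.execute("SELECT order_id, total FROM order_total"))
```

In an OO design the equivalent would usually be a hand-written method that walks the line items and has to be kept in step with the class model.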

Further discussion moved to OoLacksMathArgument

Being "pro-relational" does not necessarily mean one is against OOP. The two can get along -- as long as OO is restricted to a programming technique and doesn't try to model data.

It seems there are three fundamental differences of viewpoint that cause most of the ObjectRelationalPsychologicalMismatch:

Relational requires records/nodes to have a "table shape", OO does not (TablesAndObjectsAreTooDifferent). However, DynamicRelational relaxes this requirement to some extent. Still, each "record" belongs to one and only one table (real or virtual).
Relational uses relational operators to manage collections, but OO wants each entity to handle/define its own collection operations (encapsulation), potentially going outside of or not supplying relational operations to entities/classes.
Databases assume data will be readily shared by many applications and tools, while OO accepts the idea of a specific application "owning" state or data. This is tied to the OO concepts of encapsulation and/or responsibility assignment. (SharingDataIsImportant)

This is real. To access any data in an OODBMS, one needs to do procedural coding instead of writing declarative relational statements.

I just finished reading WilliamKent's DataAndReality. It's a surprisingly good & quick read. It's a practitioner's book that tries hard to stay away from theory to keep it readable for most software engineers, but also talks of the relevant concepts of data modelling, irrespective of the conceptual model. (On the theory side, I'm still slogging through BernhardThalheim's book on higher-order entity relational models. I get distracted too easily reading it; hopefully I will find the time to focus on it seriously some day soon.)

Anyway, have a look at PseudoBinaryRelationships, it discusses something in this book that made me think about this ObjectRelationalPsychologicalMismatch. --StuCharlton

I think one way of thinking about the problem is to consider that object interfaces are built to define an application-level specialized interface to their programmed behavior, whereas database products are meant to define a generic interface to a low-level storage engine. Although their concerns are related, the difference is close to a frontend designer's concern for well-placed buttons and options versus the backend programmer's concern for well-factored code. They obviously both want to see something working and useful, but operate at different levels and have different ideas about what is ephemeral. The database designer operates at a different level from the two.

The OOP-Vs-Relational fight has been going on since at least the early 90's:

http://www.cbronline.com/news/are_the_relational_versus_object_database_wars_no_more_than_a_staged_bar_brawl

CategoryAntiPattern