Object Relational Impedance Mismatch Does Not Exist

See ObjectRelationalImpedanceMismatch.

I asserted the proposition in the name of the pattern is true, and I intend to prove it. However it would be foolish to do so until we can share a common understanding of what ObjectRelationalImpedanceMismatch is supposed to be. Please help me to do that. -- CostinCozianu

: "When you notice that you're experiencing considerable pain, and you recognize that a substantial amount of the pain occurs in and around the interface between your code and the relational database, then you've found the ObjectRelationalImpedanceMismatch."

What someone just posted as a definition of ObjectRelationalImpedanceMismatch is anecdotical and metaphoric description of the psychological impedance mismatch. OO people usually think they are smart, usually they don't read the books on database theory, usually they disconsider the DBAs, and usually refuse to acknowledge non-OO ideas and principles. And consequently they get into trouble.

ObjectRelationalImpedanceMismatch does not exist in the first place. -- CostinCozianu

I don't think that the object relational impedence is as large as many people think. It's really "inheritance/interfacte impedence mismatch", since any single object or collection of objects can be almost trivially mapped into a table. The impedence shows up because the OO programmers want to sacrifice nothing in the object model in order to fit the data into the database - however - if you are working with an RDBMS, then you are stuck with that data having a permanent home with a well defined structure. If you think of objects as a way of attaching behaviour to data - and let that data live primarily in the database, then the relational and object models have very little impedence mismatch. Use implementation and interface inheritance for solving other problems in your domain. -- JeffBay

By the way, I changed all references from Jeff to JeffPanici since another Jeff joined the discussion. -- JeffBay

If the mismatch didn't exist then we wouldn't need transformers to convert between the two 'sides'. We cannot simply store objects in an rdb untransformed, and vice versa. Unless I misunderstand of course. -- RichardHenderson

Since it is unlikely we'll ever share a sufficiently common understanding I guess you are safe from having to prove it.

What do you think the other author understands? Do you think they are basically intelligent and educated enough to understand what you understand? If so, why not explain it to them so that conversation doesn't have to drop dead at this point with "I'm smarter than you"? If not, on what evidence do you base your assertion that the other participants are either too stupid to grasp the concepts, or so poorly educated that you could not hope to bring them up to speed on your understanding?

It's a naturally squishy topic because it relates numerous abstract ideas, any one of which there are legitimately many different opinions. It has nothing really to do with someone being smarter. To require a common understanding before a solution is presented is by definition impossible. Besides, the author says he has a solution so he must already have a detailed "understanding."

Considering nobody can ever seem to define an object this may be hard, but I'll kick it off with this:

RDBMS do not support inheritance relationships.
RDBMS do not support user defined classes/data types as first class citizens
RDBMS do not support object versioning (50/50 about including this one).

-- PeterForeman

ThankYou. I am impressed that I'm already rebutted before I even try to state my arguments.

If nobody has any strong objection I'll refactor the input here along the lines I described in ObjectRelationalImpedanceMismatch. As a matter of fact, most these contributions belong in there. -- CostinCozianu

The more I read the comments on this matter, the more I believe the problem is an understanding of the term support not the term impedance. I believe impedance is used by some here to indicate that an isomorphic mapping cannot exist. Clearly it can. This isn't the issue. The issue is how well the RelationalModel supports my efforts as application programmer to support the lie that I have an infinite amount of transactional memory. RelationalDatabases make this ruse very hard to believe. To various degrees, ObjectOrientedDatabases make it easier. -- MarkAddleman

Well, if you think the function of a database should be to support your illusion that you have an infinite amount of memory, then you already suffer from psychological mismatch. Buy yourself a book on databases, and try to understand what databases are for, why the relational model has such a strong mathematical foundation, and what is the purpose of that foundation. And then you'll see that relational databases do support you a lot better in writing applications that access Large Shared Databanks, then any software would be able to do it by keeping you in a confortable lie. If you're not willing to do that effort, you can happily use a PersitenceEngine?, and you might even get lucky that it will be able to handle the requirements of your application. -- CostinCozianu

Finally, we're to the heart of the misunderstanding! Praise Be! None of us including RandyStafford or KyleBrown have EVER, repeat EVER asserted that you can't do an isomorphic mapping from a relational model to an OO model. In fact, as CrossingChasms documents, there IS ALWAYS a simple isomorphic mapping from any arbitrary OO model to a 3NF relational model.

That's NOT what we mean by ObjectRelationalImpedanceMismatch. The problem we're trying to get across is it requires a LOT of code and a LOT of work to store objects created in an OO language in a relational database, and likewise to take rows out of a relational database and represent them as objects in an OO language. -- KyleBrown

I would be quite scared if the relational model in general isomorphic with whatever OO model (would you care to point exactly what OO model? what book, what OO language , what theory). Because otherwise we might get into issues like RelationalHasNoObjectIdentity and it would be quite scary to find out that the relational model can represent information based on nothing else than object identity - the thing you complained about in CrossingChasms.

The psychological mismatch here is again that databases are not supposed to store objects created in an OO language, at least on two grounds: DatabaseApplicationIndependence, and DatabaseIsRepresenterOfFacts. For storing objects you can easily make use of persitence engines. Once you dumb a database down to store this object, give me the other one (like the famous EntityBean example) you really get into a big mismatch, but I'm affraid you can't blame it on the relational model. The solution is to depart from building applications as object centric even if we build them in an OO language, and instead use an information centric approach, focus on how we represent information and how we transform, update, summarize, report that information, and OO languages are quite capable in helping us in this case. -- CostinCozianu

Bingo! The problem is not one of philosophy. It is about practical implementation facts. -- PeterForeman

Actually, having read RandyStafford's TaxPayer? example, I may amend my statement to be relational models are a proper subset of object model models and therefore, the term support simply means that object models have to be dumbed down to support relational databases ;-> -- MarkAddleman

This isn't true. The relational model is more abstract and complete than current OO models, with respect to data operations. As such it has more power than pure OO approaches. Less convenient maybe. Less efficient when a pure pointer based approach is desired, but this approach is sub-relational. -- RichardHenderson

Can you provide an example of a RelationalModel that cannot be modeled by objects?

Mark, it is true that any relational model can be modeled by objects, and by that I suspect you primarily mean pointers - you model relations as aggregations, compositions, associations, using vectors, hashmaps, trees in a more advanced mode, and so on - like in almost all OO books on software engineering that I've seen, which ultimately means you use pointers or object references. The problem is that you model the relational model at a much lower level. Relational databases also do exactly what you do, use pointers, B-trees, hasmaps and so on, to implement relations. The thing is that all these are internals, and are very flexible, because they offer you a higher abstraction in the form of a logical model. This also allows them a great deal of flexibility in how to choose the best implementation of a logical model, how to optimize a query and so on. -- CostinCozianu

Frankly, Costin, I've seen this pointers argument about a million times between folks trying to say relational models are more pure or abstract than OO models and this is complete and total hogwash. In Smalltalk and Java, I never, ever see a single pointer. I don't even know what a pointer is in these languages. Just becasue the relationalships are implied between objects via references doesn't mean that it's more low-level and that relational systems are more high-level. In fact, I would posit that the opposite is true. The higest level of abstraction in any commercial relational system to date is a table, which can have a name. That's it. But I can't put any table that I define in another tuple as a high-level reference. I have to use a key, and map it to the other table using a foreign-key. Sorry, but this is already twice the work I have to do in any OOPL. More to the point, when you talk about query optimization and the sort - which almost all commercial OODBMS products provide via OQL, there is a distinct difference here. First, the relationships are already encoded in the object-oriented systems. Say I have an Order, and it has (aggregates) OrderLineItems?. That relationship is encoded in the object schema. This is already an improvement over what relational set theory offers, as the best one can do here is to join the two sets together. This is altogether inefficient and worthless in practice. Further, if - by chance - I would want to say have some other object aggregate references to OrderLineItems?, then it is simply a matter of refactoring my code to do so and a mere recompile away. Remember, all of the "Big Boy" OODBMS products like Versant, POET, Objectivity/DB, ObjectStore support automatic schema evolution. What you stated above is known as a RedHerring; it's a semantical distinction at best. -- JeffPanici

In Smalltalk and Java I never, ever' see a single pointer.. JeffPanici, get your facts right please, before you jump to conclusions. What the heck do you think NullPointerException is for? Java has no pointers only in Sun's marketing papers. -- Costin

I do have my facts straight, Costin. Amongst many of their various other foibles, Sun chose to name the "NullReferenceException?" a NullPointerException. It is a null pointer, to be sure, in the context of the VM. But, we can fogive Sun for their naming choices and realize that this doesn't really mean I can create a memory pointer and assign any atrbitrary memory address to it, in Java, because I cannot do such a thing. In languages like Java and Smalltalk I only have references. These are patently different from pointers. Sure, they can achieve the same goals, but not in the same way. -- JeffPanici

Just because you want to call them references doesn't make them less of a pointer. The fact that you cannot assign arbitrary addresses, means you have typed pointers, they can only be assigned the address of an object as validated at compile time, since the memory management is automatic ALMOST ALL valid object address that validates as a typed pointer will probably be a valid address, UNLESS you assign them null. Null is not a valid address JeffPanici, and is one more evidence that references are really pointers, and they were right to call it NullPointerException. In C++ you can't assign arbitrary addresses to typed pointers either, unless you force them through a cast. You might be more happy with Smalltalk and Java that they don't allow you to perform pointer arithmetic, but that's another deal. -- Costin

Just becasue the relationalships are implied between objects via references doesn't mean that it's more low-level and that relational systems are more high-level. -- JeffPanici

Yes, they are lower level.They are exactly at implementation level. A relatuion can be materialized, through vectors of pointers, sets of pointers (like java.util.TreeSet), indirect through a value that is looked up in a transaactional cache, and I can come up with ten more choices. That is an object model is lower level than a logical data model because you already have chosen how you implement the higher level concept, which by the way is relation, because that is what is called in algebra. -- Costin

from CostinCozianu on ObjectsAndDataAreSeparate: Well, if you care only about performance in development and don't care that much about transactional throughput maybe GemstoneJ is the right tool for you. Especially since you're confortable with it and unconfortable with Oracle. However this doesn't mean you can assert an impedance mismatch, rather a psychological mismatch

Ah ha! Now we're at the crux of it all! All along, we've been saying the same thing without realizing it. The DatabaseImpedanceMismatch does not exist as a RealThing. It cannot because I can map any relational model to an object model and vice versa. Clearly, the two are equivalent in power. Any difference must be one of implementation and I find that it is almost always more convenient to code an application using an ObjectDatabase rather than a RelationalDatabase.

What if later you or your current company (assuming that you program for a customer) won't be there? Somebody else may want to come and extend the system, or just fix a damn thing.

Generally, if they want to come in and fix some damn thing, it's more likely that the damn thing will be a logic bug (application code) rather than a schema problem. However, if they need to fix a schema related problem, the quantity of knowledge that they will have to have under an ObjectOrientedDatabase is the same as a RelationalDatabase. That is, they will need to understand how the schema is laid out and how to modify it. I'm not going to get into an argument about which one makes schema evolution easier. I don't have enough experience with either other than to know to avoid ObjectOrientedDatabases that don't provide explicitly for schema evolution.

And have no option BUT to use Java or Smalltalk. What if in next 3 years Scheme will be the language du jour? Relational Databases are not tied to any language, or any client. What if I want to use Excel or a more complex statistical package to analyze some data that you stored as well encapsulated Java objects? YouArentGonnaNeedIt applies here, I think.

If you're point is that more people know SQL than know GemStonej, then I'm not going to argue that either. But, the logical extension of that argument is to code everything in CiCs, Db2 and CobolLanguage.

No, my point was not about how many people know SQL, but I'll come back later on it

Folks, I have to say this: It's seems to me that the only thing that keeps getting said here is that the "relational" model supports what I'll call dynamic binding of sets. Costin (and others, I might add) are trying to propose that the loose coupling of relational sets, and the dynamic joining of these sets at usage-time is somehow better or cleaner than what OODBMS products do, which is to statically bind the relationships together based on the DomainModel being compiled. (Read back through almost all negative comments on OODBMS' and you will see this. They think that somehow we're developing to some low-level physical data model, which simply isn't true at all. ALL - and I have a lot of experience here - commercial OODBMS products support automatic schema evolution. One never, ever has to [? blank ?] the physical implementation of the persistent storage.)

Once one realizes that ObjectOriented systems define relational sets just like relational databases define relational sets, the battle is over. Did you get that Costin? They're the exact same! The only difference between the two is that one implies a "late" binding and the other an "early" binding. What is the difference? Performance. That is the only difference. Now, performance has many categories too, I might add. There's development performance: RDBMS products don't support refactoring all that well. This might change as time moves along. Today they do not. There's runtime performance: RDBMS products certainly are much slower than OODBMS products because one has to do multiple dynamic set manipulations to bring objects back to life.

Also, as a final sticking point, current RDBMS products do not support the notion of generialization or inheritance. This certainly adds to the complexity of implementing object-oriented systems with relational databases. I've written an article on this topic, which I've been heavily involved in as of late. Go to http://www.workerbee.com/articles/WhyOODBMS and let me know what you think. -- JeffPanici

JeffPanici, I'm glad that for you things are so simple. That must be a great relief for you.Now, aside from marketing materials on what exactly do you base your affirmation RDBMS products certainly are much slower than OODBMS products ? Common, maybe you can get some facts in here. -- Costin

Just because you made me curious would you care to answer to the thing below:

What's your favourite OODBMS
What is its object model, can you provide an URL to it?
What type of declarative constraints (the kind that work even if you have a bug in your application)?
Can you approximate the time it takes to upload 5GB worth of data from an application pipeline ? Can I write that application in C, for obvious reasons ?
How do you create an index, and what indexing schmes does it support ?
What isolation levels does it support, and what concurrency policy ?
How do you decide when an object model is of a good quality?
Last but not least, can I see your favorite database at www.tpc.org ? Since they are certain to outperform RDBMSes I figure they must have a strong economic incentive to prove it.

JeffPanici, I amused myself because on your site you already state that you're sure that your paper is going to make a flame war. Guess what ? I'm in no mood for flame wars, and even if we start one, the wiki community is free and willing to refactor by deletion. Why I put you the questions above is because I just wanted to know. From a theoretical perspective, a proper relational database should be objectual , and a proper objectual database should be relational, the two have to necessarily converge. Unfortunately, given the limitations of today's products we really have to choose the lesser evil.

But I am willing to learn, JeffPanici, I am utterly aware of my limitations, and know that as long as a man shall live he shall have to learn. Maybe there's some misterious product that I don't know about who really is the phylosophical stone we are looking for in database implementations. Maybe there's a new unifying theory out there, in which case I'd be grateful if you point it out to me. Please be so kind and point it out. And if that product does not give free access to its technical documentation (in which case I can also go figure out for myself), please be so kind and respond to the above questions, which are some of the traditional weak points of OODB. -- CostinCozianu

I think this all depends on how one defines object orientation. I define objects as DictionaryDataStructures of a sort. Under this view, TablesAndObjectsAreTooDifferent, and thus there is an impedance mismatch. And, I won't force anybody to read a book to understand my perspective :-)

If only database management systems really implement the full relational model-then OO and relational databases will be a lot closer. I just wonder when rdbms vendors would implement domains. If that happens, we will see if we will still be debating about object relational impedance mistmatch. But until then - mapping objects to the rdbms will still require some work. It's good that OR mapping frameworks are available.

Or what if one could put executable code (other than stored procs) in a database that with a command - like 'edit employee with this number' a form would show up on the client side showing all the related information for that employee e.g. education, training, job history and the like. Then with 'commit' command the whole object will be updated. I hope this is not off-topic -- x44

I think some light on the subject can come from noodling on the following: Why did no one agonize about the impedance mismatch between old-C and relational databases? What did C++ bring to the party that both made it OO and provoked the impedance mismatch discussion?

I'm going to propose that the relevant difference is that with OO, the data model comes along with the programming model and the most comfortable, most effective OO models are decidedly un-relational. I'll also propose that this creates not a mismatch but merely more work and some flexibility on the part of the DBAs.

The essential difference is that relational DB primitives can only model one-to-one and many-to-one associations - anything else takes cross-reference tables. On the other hand, OO aggregations, composites, and collection objects represent one-to-many associations. If you want to know about Alice's marbles, the relational approach is to ask every marble in the domain whether it belongs to Alice. The OO approach is to ask Alice for a list of her marbles.

One of the ways to produce a sub-optimal OO design is to worry about persistence in an RDB. J2EE entity beans are an excellent example of pathological design that comes from forcing the object design into a relational mold. It is normal for people obsessed by the purported mismatch to solve the problem by punting in this fashion. Marbles.findByPersonId() is an obscenity.

But what if you have a collection of marbles and need to group them by owner? With the "OO" approach you describe that would mean iterating over all the owners, asking them for their marbles and noting which ones are part of the collection - with horrible performance (or asking every potential owner whether they own each of the marbles we are interested in, with even worse performance). The relation approach is SELECT * FROM marbles WHERE marble_id IN (...) GROUP BY owner - which is far simpler to code, already better performance-wise, and can be tuned independently of the code with judicious use of indexes etc. Talk about not getting the message...If there's a use case that requires finding owners given marbles, then you provide each marble with an owner collection (or property, if you are prepared to rewrite later when it turns out that marble joint-ownership is possible). -- mt

The normal OO way to manage a bunch of objects is to put them into some kind of collection. The collections themselves are first class objects (not singletons). If persistence is required, the most obvious (and least used, apparently) approach would be to make the collection persistent by whatever means are available. That includes using an RDB to maintain lists, or just writing the whole thing out to a file. I've done both.

The harder problem comes with encapsulation and applying component philosophy to the design, expecting the database to be part of the package. This takes a little explanation, so bear with me.

Remember Alice's marbles? The standard relational way to handle this is to put an Alice tag on each of her marbles; the marble table includes a person foreign key.

In the object design, there is a Person class, a Marble class and also a Marbles collection class. Each instance of Person includes an instance of Marbles containing those instances of Marble the person owns.

Using a component engineering approach, we expect to build and debug each of these classes and not have to change them ever again. If there is a new requirement that we track the marbles in store inventories, we just add a Marbles collection to the properties of the Store class and it's done; the Marble and Marbles classes are not changed by having a new user (Black-Box Engineering 101).

Unfortunately, all is not so neat in the RDB (or J2EE). The RDB marble table has to have a different foreign key field for each type of thing that can collect marbles. So each time a new use is found for marbles, the marble table has to be modified, (in J2EE, a findByStoreId() method has to be added to Marbles) and everything that uses it has to be retested to make sure the change hasn't broken some other use.

The problem isn't intractable - just don't put any foreign keys in the marble table (and don't use J2EE persistence) - but I've never seen a DBA solve it without feeling that he's broken some kind of rule.

Just put the "owner-marble" relationship in its own table, and any new such relationship (e.g. manufacturer) in their own (new) tables. The marble table doesn't need to change at all, and no rules are broken. In fact, a couple of interesting things about this pattern: You get the teddy-bear!

1 Often you would put the new relationship in a new table in any case, because the relationship is 0/1-N. While this could be implemented as a nullable FK, a separate table is cleaner - it keeps independent propositions in separate places. I don't know about your world, but in my world DBAs would rather give up sex than fill a db with two-column tables! Perhaps their sex lives are insufficiently enjoyable? In what sense does the database "fill up" more by adding a two-column table rather than adding a column to an existing table? I don't know but I'll guess that maintenance effort is more a function of the number of tables than their size?

2 In a true RDBMS (not a SQL DBMS) you could in fact have a relation-valued "my_marbles" attribute in marble_owner, and in marble_manufacturer (just like the proposed "OO" model). However, even where RVAs are supported, this is probably not good design, because it makes queries like "all marbles grouped by colour then owner" harder to express. I find a lot of poor decisions being made on the basis of imagined ad-hoc queries (ignoring YAGNI). My approach is that if it's in a use case, design it; if not: who cares? I tend to think of ad-hoc queries as evidence of poor requirements gathering. But YAGNI doesn't say to make things unnecessarily _harder_, does it. What are the disadvantages of the RDM "separate linking table" approach, from a logical expressiveness p-o-v? (Remember that you can always define a view/virtual relvar to give the "FK in the marble" version.)

In any case, the OpenClosedPrinciple works differently with relations than with objects - adding a column doesn't really "open" the table from a change p-o-v and is closer to "inheriting from" than "changing". Good point.

But hang on; it gets more interesting. Object collections aren't restricted to collecting things of the same type. The restraint can be as weak as implementing a one-method interface (see Publish-Subscribe). The most interesting example of this is your advanced case management system, with a variety of object types, any of which will accept attachments which are in turn of many different types. The relationship is not only many-many in number but many-many in types. The typical straight-laced, anti-component solution has tables with lots of sparsely-populated key fields.

One of the simplest and relatively common applications involves implementations of the Composite pattern common in graphic packages and file systems. This design has instances of item, each of which can be atomic item or a list of item (e.g. File or Directory). When you enumerate a list, the thing you get() could be either one. The visual is a ragged tree with an atomic item at the end of each branch. The object design is

  Item <|-- Atomic
  Item <|-- Items
  Items <>-- Item

If you are lucky, Atomic represents only one type; in a graphic package it doesn't. The typical DBA solution to this is quite ugly. The elegant solution takes a minor holiday from referential integrity by implementing lists of lists in tables of gluons treated as data rather than keys. Referential integrity is a risk-management tool; nowhere is it written that a DB design has to reduce risks to zero.

In any case, making collections persistent in a relational database is not all that challenging, but I don't believe a really good OO design can be coupled to a really good relational design by any kind of automation or cook-book approach. There are too many trade-offs and opportunities for creativity. The effective approach is not to map classes to tables but to find the best RDB design for managing each collection in an unrestrained OO design. On the way, try to do some black-box engineering in the DB as well as in the code. -- MarcThibault

From HiddenDatabaseSyndrome:

The real impedance mismatch isn't OO vs relational, it's the DistributedSharedStateConcurrency problem - it's the model of "all persistent data goes in OnePlace?? (the database)" vs the model of "state distributed across the enterprise". Many procedural programmers buy into the database model - applications get data from the database and display it, or they modify it and do a commit - but data held by the application is only temporary. Many OO programmers are uncomfortable with that model - if they have an Employee object in memory on some machine somewhere, they want to manipulate it directly as if that particular object (and not an equivalent database record) is the canonical representation of the employee; the RDBMS tends to get viewed as backing-store. That seems to be the key issue in many of these discussions.

I think that the RelationalOoImpedanceMismatchIsCausedByClasses. On a language with dynamicly generated classes, the mismatch can be jumped over with a simple library (one very big step in this direction is Rails' ActiveRecord). -- AurelianoCalvo

A RelationalDatabase is all about having sets of tuples, joining them, then selecting tuples by means of boolean expressions, then projecting onto some members of the tuples. The resulting rich algebra has useful properties that result in a expressive, high-level query language, but requires the underlying data to be exposed. OOP is all about not exposing data, but only functions. Once you add arbitrary functions to the mix, all algebraic properties vanish.

Exposing to what? Any class has access to every other class. See DatabaseNotMoreGlobalThanClasses.

         :       To the query or programming language.  Access to other classes doesn't let you write "where" clauses, you need access to the fields of other objects.  Methods are no replacement, as they don't obey any algebraic laws.

I can't see anything between RDBMs and OOP that matches at all. They are separate, they stay separate, and the famous impedance mismatch is the serialization layer.

The second OO starts to deal with collections of any sort, it usually starts to fight over territory with RDBMS. Related: AreOoAndRelationalOrthogonalDiscussion