Relational OO Impedance Mismatch Is Caused By Classes

OO systems are good at managing transient state; relational systems are good at managing persistence, especially persistent facts and deriving new facts from these. There is no impedance mismatch, except OO environments that support dynamic class and/or object definitions may provide a convenient syntax for handling data retrieved from relational systems external to the OO environment, whereas this may be awkward in OO environments that only support static class definitions. -- DaveVoorhis


I've started the page and really liked the conclusion, that's why I've elevated it to the top. Thank you all for your effort. I've kept the entire discussion below (including the old start). -- AurelianoCalvo


Both tables and classes can be thought of as "containers" of objects if we think that ObjectsAreDictionaries. The problem is that they don't contain the same objects. That's why they mismatch. This mismatch is especially hard on strongly typed languages (contemplate both Rails' ActiveRecord and ObjectGraph (AKA: og) in RubyLanguage).

IMO, rows from the database (resulting from a query) are classless objects (or at least, they have a dynamic class with accessors for each column). Rails' ActiveRecord (for instance) uses the RubyLanguage dynamism to generate the classes for the tables on the fly, avoiding the extra and useless work of retyping the class definition for the tables. The .NET implementation of the ActiveRecord pattern (Castle ActiveRecord) can't do that, and the code is significantly more verbose in C# than in RubyLanguage because of this.
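To make the "classes generated on the fly" idea concrete, here is a minimal sketch in Python rather than Ruby. It is not how Rails is implemented; it only assumes an SQLite database (via the standard sqlite3 module) and a hypothetical helper, record_class_for, that reads the column names from the schema and builds a class with one attribute per column at run time.

```python
import sqlite3


def record_class_for(conn, table):
    """Build, at run time, a class whose attributes mirror the table's columns.

    Hypothetical helper for illustration; the table name is trusted input.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

    def __init__(self, **values):
        for column in columns:
            setattr(self, column, values.get(column))

    @classmethod
    def fetch_all(cls):
        cursor = conn.execute(f"SELECT * FROM {table}")
        return [cls(**dict(zip(columns, row))) for row in cursor]

    namespace = {"__init__": __init__, "fetch_all": fetch_all, "columns": columns}
    return type(table.capitalize(), (object,), namespace)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customer (name, city) VALUES ('Ada', 'London')")

# Nobody typed a Customer class definition; it is derived from the schema.
Customer = record_class_for(conn, "customer")
first = Customer.fetch_all()[0]
print(first.name, first.city)
```

The point of the sketch is only that the class definition is never retyped by hand: change the table and the generated class follows.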

Well, it's just a thought. What do you think? (AurelianoCalvo)

I think it's not quite accurate, though the essential idea is valid.

Classes should not be thought of as "containers" of objects. They are (typically) only syntactic mechanisms for generating objects. They do not contain objects. Container classes permit generation of container objects, but it is the generated objects and not their classes that contain other objects. I am not aware of any language that allows you to iterate all objects that belong to, or were generated by, a given class.

The essence of this category of impedance mismatch, as you have essentially stated above, is that certain languages only support static, compile-time class definitions. In such languages (e.g., C#, C++, Java), there is no mechanism to conveniently generate a new class at run-time, hence there is no mechanism to conveniently produce a class from a dynamic query's result set at run-time, and thus there is no way to generate (at run-time) row objects that conveniently use object syntax to reference member elements. Languages that permit run-time, dynamic class and/or object definitions do not suffer from this problem.
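As an illustration of producing a class from a dynamic query's result set at run time, the following sketch uses Python's sqlite3 and collections.namedtuple. The run_query helper and the sample table are assumptions made up for this example; the only thing it relies on is that the cursor reports its column names after the query has run.

```python
import sqlite3
from collections import namedtuple


def run_query(conn, sql, params=()):
    """Run any query and return rows as objects of a class built from the result set."""
    cursor = conn.execute(sql, params)
    # cursor.description holds one entry per result column; item 0 is its name.
    names = [column[0] for column in cursor.description]
    # rename=True replaces column names that are not valid identifiers.
    Row = namedtuple("Row", names, rename=True)
    return [Row(*values) for values in cursor]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Ada", "London"), ("Bob", "London"), ("Cy", "Paris")],
)

rows = run_query(
    conn,
    "SELECT city, COUNT(*) AS customers FROM customer GROUP BY city ORDER BY city",
)
for row in rows:
    # Object syntax for columns that exist only in this ad-hoc query.
    print(row.city, row.customers)
```

The Row class here did not exist until the query was executed, which is exactly what a static-only class system cannot conveniently offer.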

Furthermore, it is probably not accurate to say that this is especially hard on strongly typed languages, because it is possible to have a strongly typed language that permits run-time, dynamic class definitions. It just so happens that certain popular languages which happen to be strongly typed also only support static, compile-time-only class definitions.

In short, this type of OR impedance mismatch is only caused by static class definitions, and resolved by dynamic class and/or object definitions. -- DaveVoorhis

So, in the end you agree that the ObjectRelationalImpedanceMismatch is mainly caused by the static class definitions? Is there any ObjectRelationalMapping that generates classes for the tables at compile time for Java, C#, C++, etc. (using the database schema as a source)? That would be a nice advance over Hibernate, EJB, etc. Just emulating what Rails does at runtime. -- AurelianoCalvo

I agree that this type of OR impedance mismatch, i.e., awkwardness in conveniently manipulating result sets, is mainly caused by static-only class definitions. There are other potential impedance mismatches that this does not solve: Some (many?) of these are not an OR impedance mismatch per se -- they're an application language (whether OO or not) vs. database language, or even application environment vs. database environment, impedance mismatch.

My first bullet point, above, deserves some expansion. I believe the notion of constructing application-side classes, e.g., Customer, Employee, Invoice, Payment, InventoryItem etc. to represent real-world entities, which then use the database as a persistence engine, is a naive and flawed approach. The application side should implement mechanisms to facilitate presentation of the database -- perhaps starting with base classes like Form, Report, Menu, Query, DatabaseConnection etc. -- and let the database management system deal with entities, or, more accurately, facts. This approach treats the database management system as an equal partner in implementing application functionality, rather than relegating it to a mere storage device for client-side objects, and it reduces the degree of impedance mismatch because you're generally not having to maintain client-side classes in sync with server-side tables.

However, I have used in-house tools to generate Java and C++ classes from database metadata. Having not used any other ObjectRelationalMapping tools, I can't say whether they do or do not support such a thing. It seems obvious, so I would assume they do, and some of the descriptions on the ObjectRelationalMapping page suggest that they do. -- DaveVoorhis
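For readers who have not seen this kind of tool, a rough sketch of generating class source text from database metadata follows. It is not the in-house tool mentioned above; it assumes SQLite metadata (PRAGMA table_info), emits Python source instead of Java or C++, and uses a hypothetical generate_class_source function and a deliberately tiny type mapping.

```python
import sqlite3

# Assumed, deliberately small mapping from declared SQLite types to Python types.
SQLITE_TO_PYTHON = {"INTEGER": "int", "TEXT": "str", "REAL": "float"}


def generate_class_source(conn, table):
    """Return source code for a dataclass mirroring the table's columns."""
    lines = [
        "from dataclasses import dataclass",
        "",
        "@dataclass",
        f"class {table.capitalize()}:",
    ]
    # Each metadata row is (cid, name, declared type, notnull, default, pk).
    for _cid, name, declared_type, *_rest in conn.execute(f"PRAGMA table_info({table})"):
        python_type = SQLITE_TO_PYTHON.get(declared_type.upper(), "object")
        lines.append(f"    {name}: {python_type}")
    return "\n".join(lines) + "\n"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (id INTEGER, customer TEXT, total REAL)")

source = generate_class_source(conn, "invoice")
print(source)  # would normally be written to a file as a build step
```

In a statically compiled language the generated text would be written out and compiled with the rest of the application, so the schema remains the single source of the class definition.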

Having flexible/dynamic objects/classes may allow your objects to dynamically mirror table schemas, but that does not really solve the root of the problem: OO and relational are governed by different rules and philosophies. Sure, you can better fit relational by making objects mirror or imitate relational, but then it is not OO, but emulation of another paradigm. It is not so much a technical problem as a philosophical one. Perhaps you could argue that static classes add a technical barrier on top of a philosophical barrier. -- top

Well, if you want you can model tables as objects. What I see is that the tools currently in use to match rows in the database with objects force the developers to repeat information if they want to use objects to encapsulate them. The Rails implementation of the ActiveRecord pattern is a step toward not forcing that. I believe that ObjectRelationalMappingCostsTimeAndMoney and I would like to spend as little as possible solving this accidental complexity. Because databases are great tools for some things and it's possible to model them in objects (just model them, it's not necessary to reimplement them), I'm just looking for the best way to use them when they are appropriate. I really don't see a philosophical barrier between objects and the relational model. What I see is that you erect a psychological barrier between them for yourself. I'm really happy to use both techniques in my professional life every day. My practical concern is that I'm not able to apply the DryPrinciple (Don't repeat yourself ;) ). If the classes representing the database were autogenerated, I would not need to rewrite the database schema as classes. If the schema were autogenerated, I would not need to rewrite the classes as the database schema. The first approach is the one in Rails' ActiveRecord and the second one is the approach in ObjectGraph (og). No philosophical barrier, no fuss, no problems. Just a little metaprogramming. -- AurelianoCalvo.
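The second direction (schema generated from the classes) can be sketched just as briefly. This is not og's actual mechanism; it is a Python illustration under stated assumptions: a hypothetical create_table_sql function, a three-entry type mapping, and SQLite as the target database.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields

# Assumed mapping from Python field types to SQLite column types.
PYTHON_TO_SQLITE = {int: "INTEGER", str: "TEXT", float: "REAL"}


@dataclass
class Employee:
    id: int
    name: str
    salary: float


def create_table_sql(cls):
    """Derive a CREATE TABLE statement from a dataclass, so the schema is never retyped."""
    columns = ", ".join(
        f"{field.name} {PYTHON_TO_SQLITE[field.type]}" for field in fields(cls)
    )
    return f"CREATE TABLE {cls.__name__.lower()} ({columns})"


conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql(Employee))
conn.execute("INSERT INTO employee VALUES (?, ?, ?)", astuple(Employee(1, "Ada", 5200.0)))
stored = conn.execute("SELECT name, salary FROM employee").fetchone()
print(create_table_sql(Employee))
```

Either direction satisfies the DRY concern; what matters is that only one of the two definitions is written by hand.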

I'd like to suggest that there is a simpler mismatch that underlies all this: a tradeoff between row-major and column-major modeling. When we pull back to look more at the forests and less at the trees, I think that "object oriented" modeling begins with a fundamentally row-major viewpoint: when I add an element to a schematic in a CAD system, I mostly care about collecting the various things in that element, and my drawing is mostly a container of those elements. Conversely, when I seek those elements whose price changed by more than a certain amount in the last year, I care mostly about a specific property across all the objects. The first is row-major, the second is column-major. It seems to me that we can then make several observations:

-- TomStambaugh
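The row-major versus column-major distinction can be shown with plain Python containers. The part names and prices (in cents) below are invented for the example; the sketch only contrasts "everything about one element" with "one property across all elements".

```python
# Row-major: each element carries all of its own properties (the "object" view).
rows = [
    {"part": "resistor", "price_last_year": 10, "price_now": 25},
    {"part": "capacitor", "price_last_year": 40, "price_now": 45},
    {"part": "diode", "price_last_year": 30, "price_now": 60},
]

# Row-major access: collect the various things in one element.
diode = next(row for row in rows if row["part"] == "diode")

# Column-major: one sequence per property, aligned by position.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Column-major access: one property compared across all elements.
changed = [
    part
    for part, old, new in zip(
        columns["part"], columns["price_last_year"], columns["price_now"]
    )
    if abs(new - old) > 10
]

print(diode)
print(changed)
```

Both layouts hold the same information; they differ in which question is cheap to ask.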

I agree with this. It is in part why I am interested in languages like TutorialDee that specifically seek to address these issues. Intuitively, I think there is a fundamental region of conceptual commonality or reconciliation (or rapprochement, to use C J Date's preferred term) between object oriented, relational, functional, etc., systems that we have not yet discovered. The same holds for the divide between application functionality and environments vs. database functionality and environments. Alternatively, perhaps the paradigms are not reconcilable, but no single paradigm has been evolved to the extent that it can definitively trump the others. The fact that we can successfully build working systems using any or all of these types of machinery, yet simultaneously complain about impedance mismatches where they meet, suggests that either of these may be true. -- DaveVoorhis

A more fundamental distinction between the concepts of "table" and "class" is that a class provides both behavior and state, whereas a table provides only state. The relational model admits no concept of behavior, whereas an object is fundamentally defined as a package of behavior and state. The inclusion of behavior is problematic within the worlds of CeeSharp, CeePlusPlus, and JavaLanguage as well as within the relational world. Hence the mistaken assertion that ObjectsAreDictionaries. Yes, an instance of an object can be thought of as a dictionary, but only so long as that dictionary also contains a mapping to the methods that the object supports. -- TomStambaugh (moved from the top, because it didn't follow the page flow).

Well, tables have behaviour. But their behaviour is more "hardcoded". One can add records, delete records, update records and query them. The queries are special beasts because they can join multiple tables, but they have the same essence. I see clear behaviour there. My bet at modelling relational in an object-oriented way is to provide these primitives and, if necessary, add new ones. But repeating this for every class is a pain in the ass. That's why I say that RelationalOoImpedanceMismatchIsCausedByClasses. I mean that if one has to write all the code in the class, and can't generate it at runtime, there is a problem because usually the database is the place where the data is based. If I need more behaviour from the tables than what they provide, I would do the usual tricks: inheritance, façade, adapter and decorator patterns come to mind. Think of the database as a framework (and/or library) you use but can't change, and everything works like the rest of the frameworks. Having to retype the framework is a pain; using it and extending it when needed doesn't seem such a big deal. -- AurelianoCalvo.
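One way to read "provide these primitives and extend when needed" is a single generic wrapper that is written once and subclassed only where extra behaviour is wanted. The Table and Accounts classes below are hypothetical names for a Python sketch over sqlite3, not an existing library; table names and WHERE fragments are assumed to be trusted.

```python
import sqlite3


class Table:
    """Generic wrapper exposing the behaviour every table already has."""

    def __init__(self, conn, name):
        self.conn = conn
        self.name = name

    def add(self, **values):
        columns = ", ".join(values)
        marks = ", ".join("?" for _ in values)
        self.conn.execute(
            f"INSERT INTO {self.name} ({columns}) VALUES ({marks})",
            tuple(values.values()),
        )

    def update(self, where, params=(), **values):
        assignments = ", ".join(f"{column} = ?" for column in values)
        self.conn.execute(
            f"UPDATE {self.name} SET {assignments} WHERE {where}",
            tuple(values.values()) + tuple(params),
        )

    def delete(self, where, params=()):
        self.conn.execute(f"DELETE FROM {self.name} WHERE {where}", tuple(params))

    def query(self, where="1=1", params=()):
        sql = f"SELECT * FROM {self.name} WHERE {where}"
        return self.conn.execute(sql, tuple(params)).fetchall()


class Accounts(Table):
    """Extra behaviour added by ordinary inheritance, only where it is needed."""

    def overdrawn(self):
        return self.query("balance < ?", (0,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (owner TEXT, balance INTEGER)")
accounts = Accounts(conn, "account")
accounts.add(owner="Ada", balance=50)
accounts.add(owner="Bob", balance=-20)
accounts.update("owner = ?", ("Ada",), balance=75)
overdrawn = accounts.overdrawn()
accounts.delete("owner = ?", ("Bob",))
remaining = accounts.query()
print(overdrawn, remaining)
```

Nothing in Table mentions accounts, so the same four primitives serve every table without being retyped.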

Go one step beyond thinking of the database as a framework. Think of the database as an external device and the "impedance mismatch" disappears entirely. We don't worry about the printer-OO impedance mismatch because printers are missing some OO feature. -- EricHodges

Building on the above comment from EricHodges, I view the database as the second tier of a three-tier storage model: primary, secondary, and tertiary. Primary storage is dynamic, loads and stores are (relatively) fast, and volume is (somewhat) limited. Think "memory". Secondary storage has longer latency times but high bandwidth once the transfer begins, and volume is nearly unlimited. Think file system and database. Tertiary storage has very long latency times and sometimes very low bandwidth, but essentially unlimited volume. Think magnetic tapes, DVD, and similar bulk storage. ALL behavior is provided by a processor running methods, including the internals of a database implementation. The purpose of a database is persistence. Period. Persistence of everything -- including methods, metastructure, processes, stack frames -- EVERYTHING. Certain services are often best provided by a database running SQL. I fully support using a database running SQL as a provider of those services. On the other hand, when I have to provide behavior -- even simple behavior, like computing a current balance -- then the database is very nearly the worst place to attempt to provide that behavior, because it is so highly specialized and optimized for its primary function (which it should be). Now, having said all that, it is certainly true that there are shifts in abstraction level, and one could imagine all sorts of innovative architectures where the virtual computing engine sits in some sort of abstract hyperspace with some of the computing engine's behavior provided by stored procedures. That's all well and good, but that virtual computing engine is a virtual computing engine, NOT a database. A database is for providing secondary storage. Period. Persistence. -- TomStambaugh

I believe it is a narrow view to treat a (relational) database purely as a persistence engine. Equally limited would be, say, using an OO language but never taking advantage of inheritance and polymorphism. If you're using a SQL DBMS purely as a persistence engine - and by that I assume you implicitly mean you are not using it as a fact processor, i.e., a tool for deriving facts (e.g., summaries, joins, calculated columns, views, restrictions, etc.) from other facts (tables) - then why not use a persistence engine such as Prevayler or the Berkeley DB? Either of these would probably be more suited to preserving application state (which is what I assume you mean by persistence of methods, metastructure, processes, stack frames, and everything) than translating objects into SQL.

Again, I pose that persistence and data management are different things. As evidence of this, imagine that future machine architectures all have primary storage that is high volume, low latency, and non-volatile. This is certainly well within the realm of possibility. Would that make relational databases redundant, now that persistence is implicit? I say "no". A relational database still provides powerful means for expressing and deriving facts in a manner that is concise, composable, dynamically optimisable, and easily proven - and all this is independent of persistence.

For example, compare the conciseness of an ad-hoc natural join in a typical relational DBMS vs. the equivalent OO operations. In TutorialDee, this is simply A JOIN B - where A and B are tables with column names and domains in common - and it's not much more verbose in SQL. In a non-relational language, this requires that you write some mechanism involving iteration and maybe keyed lookups, or perhaps maintaining object associations at all times, and possibly incorporating mechanisms to dynamically choose a mix of these if you wish to implement the optimisation that relational DBMSes provide. Of course, there's nothing to stop you from building OO facilities that employ a terse syntax to use your home-made join, but you're simply reinventing something that a relational DBMS already provides and optimises for you.
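To show what "some mechanism involving iteration and maybe keyed lookups" looks like when written by hand, here is a small Python sketch of a natural join over lists of dictionaries. The natural_join function and the sample data are made up for illustration, and it performs none of the optimisation a DBMS would choose for you.

```python
def natural_join(a, b):
    """Join two lists of dicts on whatever column names they have in common."""
    if not a or not b:
        return []
    common = sorted(set(a[0]) & set(b[0]))

    # Keyed lookup: index the rows of b by their values in the common columns.
    index = {}
    for row in b:
        key = tuple(row[column] for column in common)
        index.setdefault(key, []).append(row)

    # Iterate a and merge each row with its matching rows from b.
    joined = []
    for row in a:
        key = tuple(row[column] for column in common)
        for match in index.get(key, []):
            joined.append({**row, **match})
    return joined


employees = [
    {"name": "Ada", "dept": "R&D"},
    {"name": "Bob", "dept": "Ops"},
    {"name": "Cy", "dept": "Legal"},
]
departments = [
    {"dept": "R&D", "floor": 3},
    {"dept": "Ops", "floor": 1},
]

result = natural_join(employees, departments)
print(result)
```

Roughly twenty lines stand in for the two-word expression A JOIN B, and this version still hard-codes a single join strategy.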

Please note, by the way, that I wrote "ad-hoc natural join" above. Predefined static relationships or associations between objects, in an OO environment, are of course trivially navigable. -- DaveVoorhis

I think there's some heat caused by different uses of the term "persistence". I think a relational DB should be used for persistent information. I don't think a relational DB is "purely a persistence engine". RDBMSs offer much more than persistence, but they shouldn't be used for transient information. There are other tools better suited for that. -- EricHodges

We seem to be slicing our terms pretty thin here. When I say "persistence", I include finding and retrieving information. A relational database is ideal for finding three specific items out of three million. I think further discussion of "facts" belongs on a different page. Meanwhile, "summaries, joins, calculated columns, views, restrictions, etc." are all useful in storing and retrieving data that changes infrequently in comparison to the timeframe of a specific process. I think this is another way of stating Eric's observation that they shouldn't be used for transient information. The sorts of systems that Dave describes are all very interesting, and I think have very little to do with relational databases. Yes, it's true that I can use an automobile to run a generator to run a pump to empty a pond. That doesn't change the fact that the primary service provided by an automobile is transportation. A relational database is for PERSISTENCE. -- TomStambaugh

I agree that a discussion of "facts" probably belongs on a different page - like DatabaseIsRepresenterOfFacts - which elegantly (though perhaps incompletely) supports the "fact based" view of relational databases. I'd be happy to move this thread there, if the other parties in this discussion agree to it.

However, I do not agree that the systems I describe have little to do with relational databases. Actually, they are precisely what relational databases do well, and precisely where they are often used. For example, I worked extensively on enterprise bookkeeping/accounting, payroll, inventory, and billing systems (i.e., what are now typically unified into ERP systems - see http://en.wikipedia.org/wiki/Enterprise_resource_planning), for which relational databases are ideal. In such systems, we often need to record the fact that a change occurred, not just the result of the change. For example, rather than represent an account in silico as a virtual real-world account - which might only maintain a current balance - we typically represent an account as a collection of business facts about sales, credit/debit memos, charges, etc., that result in a derived, rather than stored, account balance. In short, we are more concerned with recording information about business activities than we are with simulating the behaviour of business entities. Relational databases are good at the former. OO environments are good at the latter.
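A small sketch of "derived, rather than stored, account balance" follows, using Python's sqlite3. The ledger table, its columns, and the amounts (in cents) are assumptions invented for the example; the balance is never stored anywhere, only derived from the recorded facts by a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Record the facts: every sale, payment, and memo is kept, not just its effect.
conn.execute("CREATE TABLE ledger (account TEXT, kind TEXT, amount_cents INTEGER)")
conn.executemany(
    "INSERT INTO ledger VALUES (?, ?, ?)",
    [
        ("ACME", "sale", 12000),
        ("ACME", "payment", -5000),
        ("ACME", "credit_memo", -1500),
        ("Initech", "sale", 8000),
    ],
)

# Derive a new fact (the balance) from the existing ones.
conn.execute(
    "CREATE VIEW balance AS "
    "SELECT account, SUM(amount_cents) AS balance_cents "
    "FROM ledger GROUP BY account"
)

balances = dict(conn.execute("SELECT account, balance_cents FROM balance"))
print(balances)
```

Because the history is kept, other derived facts (for instance, total credit memos per account) need only another query, not a change to any application class.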

Furthermore, I agree that persistence is a foundation of databases (though, interestingly, it is categorically not the foundation of the relational model), but having established that foundation, we treat it as given. By applying the relational model to facts (persistent yes, but that's a convenience and a benefit, not a necessity), we derive all manner of interesting goodness. Date's seminal AnIntroductionToDatabaseSystems text briefly discusses persistence on page 11 (of the 8th edition), and indeed uses it in the definition of database - "A database is a collection of persistent data that is used by the application systems of some given enterprise." However, once stated, persistence is only mentioned on two other pages - both related to the notion of "persistence orthogonal to type" in a section on object oriented databases. So what are the other 980 pages about? Primarily, they are about the use of databases (relational in particular) to represent, process, and derive facts! So, using a relational database as a facts processor is hardly a marginal application of relational databases, it is the purpose of relational databases, with emphasis on "relational" over and above "databases." It is the relational model, independent of persistence, that gives a relational database its power. Otherwise, why bother with the relational model at all? We could simply use an object persistence mechanism.

Another way to look at this debate is to ignore the issue of persistence entirely, and simply view the relational model as a set of tools for manipulating data. If you're not familiar with RelationalAlgebra, try developing a set of classes to implement it. Then experiment with it, as you would any other class library. I did this, back when I too viewed a relational database as little more than a persistence engine, and it was an eye-opener. (My experiments eventually evolved into the RelProject.) Relational algebra and relations are powerful tools for manipulating data, period. I wish every OO language incorporated the relational model, persistence aside, because it simply allows you to manipulate data in an expressive, concise, composable, provable, optimisable way that is much more awkward without it.

OO proponents sometimes seem to have philosophical objections to the relational model and relational databases, and an equal number of relational proponents have objections to OO programming. Both views are limited. OO and the relational model are both powerful and complementary, and I would no more reject one above the other than I would reject the screwdrivers in my toolbox in favour of the wrenches, or vice versa. I'm also lazy - if there's something a relational database will give me for free, I'll take it, and save the effort of reinventing it in the application environment.

-- DaveVoorhis

I'm familiar with the relational model. If I have a gazillion objects, and I want three of them, and my criteria are based on the contents of those objects, then the relational model is the best choice I know of. Attempting to model it in OO is a rathole that at least one OO database vendor (Ontologic/Ontos) fell down -- the problem is that in the OO paradigm, you have to load the object from storage in order to do *anything* to it. Whoops. I still think we're in violent agreement here. Specifically, there is no "impedance mismatch" to be caused by anything. Instead, there are (at least) two perspectives on the data. -- TomStambaugh

Truth.

If I might take a stab at a summary, perhaps toward some ultimate refactoring of this page: OO systems are good at managing transient state; relational systems are good at managing persistence, especially persistent facts and deriving new facts from these. There is no impedance mismatch, except OO environments that support dynamic class and/or object definitions may provide a convenient syntax for handling data retrieved from relational systems external to the OO environment, whereas this may be awkward in OO environments that only support static class definitions. -- DV

I'd like to build on Dave's summary. As sketched above, the key factor in creating an OO environment that can "support dynamic class and/or object definitions" is being object-pure -- all objects, all the way down. I know of no OO environments that are pure and that also "only support static class definitions" -- the static class limitation is a direct consequence of the compromises made by those environments that attempted to maintain non-object entities. My suggestion for those who struggle with resolving this apparent impedance mismatch is to get first-hand experience with a pure OO environment. SqueakSmalltalk is an excellent and free starting point. The next step is to work within a pure-object environment built inside any of the more widely available technologies -- I have first-hand experience with doing this in LispLanguage, JavaScript, PerlLanguage, and PythonLanguage. Ruby proponents claim it is possible in Ruby. If strong typing is desired, investigate EiffelLanguage. The two alleged OO environments that are most commonly encountered -- CeePlusPlus and JavaLanguage -- are the two environments in which it is most difficult to even simulate a pure-object environment, CeePlusPlus being a particularly challenging case. I suggest that most of the claimed mismatch is an illusory fog created by attempting to perform OO development in one of the latter two environments with no prior experience to draw on. -- TomStambaugh


Relational and "Transient"

Re: "OO systems are good at managing transient state; relational systems are good at managing persistence"

When I used to use nimble table environments, I used tables for "transient" stuff also. For example, one might take a result set from an ODBC query (or the equiv. in those days), which could be a local or "transient" table, and further massage it for something local or task-specific. The problem is that most development environments don't support nimble local or virtual tables because they fell out of style. Your statement is only true from a current vendor/implementation standpoint, but not an absolute comment on the paradigms/techniques being compared. It is a BigIron-only DBMS viewpoint. Nimble tables entirely changed the way I viewed programming; and when I got hooked, I had to be dragged away from it kicking and screaming (sometimes called "trolling" here :-). -- top
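Something close to a local, transient table is still available in some environments. As a rough illustration (not a reconstruction of the tools described above), the Python sketch below loads a result set into an in-memory SQLite table and massages it further; the remote_rows data is a stand-in for rows fetched from a remote DBMS.

```python
import sqlite3

# Stand-in for a result set fetched from a remote DBMS (hypothetical data).
remote_rows = [("Ada", "London", 120), ("Bob", "Paris", 80), ("Cy", "London", 45)]

# An in-memory database is transient: it vanishes when the connection closes.
local = sqlite3.connect(":memory:")
local.execute("CREATE TEMP TABLE result (name TEXT, city TEXT, orders INTEGER)")
local.executemany("INSERT INTO result VALUES (?, ?, ?)", remote_rows)

# Further task-specific massaging happens locally, with ordinary queries.
per_city = local.execute(
    "SELECT city, SUM(orders) FROM result GROUP BY city ORDER BY city"
).fetchall()
print(per_city)
```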

In an OO environment, you can implement Relation and Tuple classes with associated relational algebra operator methods. A Relation is a container of Tuples, and a Tuple is a container of objects. If you like, you can go one step further and implement relational Expression and Operator classes, where an Expression instance is a container of Operator invocation instances that represent an expression in the relational algebra. This allows you to implement dynamic, automated optimisation of relational expressions, though it's really only of benefit if you're dealing with large volumes of tuples. A lightweight implementation of the relational model (call it "nimble tables," if you like) in an OO context leverages the power of both paradigms. Where it's warranted, I've found it to be of considerable value, and it makes a nice addition to standard or built-in collection/container libraries. If all you need is a transient container, however, standard container classes do the job just fine. But, I digress. My real point in all this: OO systems make it easy to create lightweight implementations of the relational model. Therefore, my claim that "OO systems are good at managing transient state" still holds, even if the transient state is maintained in a lightweight implementation of the relational model! I might add, however, that (in agreement with the above) in this case the relational model is good at managing transient state, too. -- DaveVoorhis
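A very small sketch of such a Relation class, in Python, might look like the following. It is not the RelProject or any existing library: the class, its method names, and the sample data are assumptions for illustration, tuples are represented as plain dictionaries with hashable values, and there is no Expression/Operator layer or optimisation.

```python
class Relation:
    """A set of tuples sharing one heading (a set of attribute names)."""

    def __init__(self, heading, rows=()):
        self.heading = frozenset(heading)
        self.body = set()
        for row in rows:
            if set(row) != self.heading:
                raise ValueError("tuple does not match heading")
            # Stored as a frozenset of pairs so that duplicate tuples collapse.
            self.body.add(frozenset(row.items()))

    def tuples(self):
        return [dict(pairs) for pairs in self.body]

    def restrict(self, predicate):
        return Relation(self.heading, [t for t in self.tuples() if predicate(t)])

    def project(self, *attributes):
        rows = [{a: t[a] for a in attributes} for t in self.tuples()]
        return Relation(attributes, rows)

    def join(self, other):
        # Natural join: tuples match when they agree on all common attributes.
        common = self.heading & other.heading
        rows = [
            {**a, **b}
            for a in self.tuples()
            for b in other.tuples()
            if all(a[c] == b[c] for c in common)
        ]
        return Relation(self.heading | other.heading, rows)


parts = Relation(
    ["part", "city"],
    [
        {"part": "nut", "city": "London"},
        {"part": "bolt", "city": "Paris"},
        {"part": "screw", "city": "London"},
    ],
)
suppliers = Relation(["supplier", "city"], [{"supplier": "Smith", "city": "London"}])

cities = parts.project("city")
local_nuts = parts.join(suppliers).restrict(lambda t: t["part"] != "screw")
print(len(cities.body), local_nuts.tuples())
```

Because every operator returns another Relation, expressions compose, which is the property the paragraph above relies on.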

       resultSet = query(theSql, diskable=true, compressable=false);

Or

       rs = new QueryResultSet();
       rs.diskable = true;
       rs.compressable = true;
       rs.sql = theSql;
       rs.execute();


Re: When I used to use nimble table environments, I used tables for "transient" stuff also.

See NotesOnaCeePlusPlusRdbmsApi for some information on a Tutorial D-inspired C++ library that is being used in a commercial product. If you look at the example code there, you'll see a TupleList object declared. In that library, which is built on top of the Jet DBMS, a TupleList is implemented as a temporary table in a local Jet database. That actually performed admirably compared to various alternatives. They're treated as read-only, and can't be used as expressions in further queries -- they're considered the end-product of a query, and they have a notion of order and no requirement that all tuples be unique. (Although in practice they almost always are.) Later, I added a LocalRelation class to represent temporary result tables that are relational, can be modified, and can be used in subsequent queries. The implementation was similar, a transient table in a local database. The UI developers used these to greatly simplify code in a number of their most complex views. -- DanMuller


See also: ObjectRelationalImpedanceMismatch, TypeSafeJdbcWrapper

AugustZeroSix


Last edited December 10, 2010.