Objects Are From Mars Tables Are From Venus

No, this isn't another rehash or discussion of the ObjectRelationalImpedanceMismatch - nor is it yet another shot in the object/relational HolyWar that occasionally erupts around here.

Rather, it's a bit of a plea for peace and greater understanding...along with an explanation of why, which I hope will appeal to some of the partisans in the debate.

This is based on a posting I made to LambdaTheUltimate a while back.

-- ScottJohnson

Objects are from Mars, and tables (relations) are from Venus.

Why? They're different. They are optimized for (and best used for) different things; they have different roles to play in programming.

Are they really that different though? Not really, people store data in objects all the time, and people store data in structs all the time... similar to storing data in databases... Objects have methods (procedures), whereas databases don't (except for triggers and stored procedures). Objects and databases actually do overlap quite a bit. Associative arrays, hashlists, and even plain arrays are miniature databases. Objects that talk to XmlFiles? or IniFiles? are similar to miniature databases too. It would be better to replace objects/structs and hashlists/associativearrays with relations in many cases.

(I should note up front that the assignment of Mars to objects and Venus to relations is entirely arbitrary; I could have said that tables are from Mars and objects from Venus and it wouldn't change anything).

Much like men and women. The title of this page obviously comes from the well-known pop-psych book Men are from Mars; Women are from Venus. There are some things that men generally do better than women, and vice versa - and by this I mean well beyond their respective reproductive roles. A society consisting of both genders is far more likely to thrive than a society consisting of only men or only women - even if we exclude the homogeneous society's lack of reproductive ability, this is still true.

With a few exceptions generally regarded as cranks; one doesn't see people advocating the idea that men (or women) are superfluous; and that we should build societies consisting of only one of the genders. The idea is patently absurd. One need not be an expert in psychology and other social sciences (let alone biology) to understand why.

So it is with tables and objects. (Or objects and tables). Both do different things, and do them well. As it so happens; tables can be used for the things that objects excel at; and vice versa - so software systems can be constructed with only one or the other. But it is argued here that (in the general case) that is folly.

What is it that objects do better than tables, and vice versa? Before the question is answered, we will consider a few bits of elementary software design/analysis advice that are (or should be) well-known to novice programmers (expert programmers, of course, likely have forgotten all of this. :)

When one is performing ObjectOrientedAnalysis? (OOA) using UML or something, one is often asked to consider the relationship between various objects/classes in the system. Novice programmers learn about the different relationships (is-a, has-a, uses-a, was-a, etc.) and how to model them in an OO programming language (use inheritance for is-a; use composition for has-a, use composition for uses-a, etc). Note that the "has-a" and "uses-a" relationships are both best modelled using object composition - having a data member pointing to an object of the appropriate type as part of the class definition. Yet the two cases are different: "has-a" means that one object is a component of another (such as a car has a steering wheel); "uses-a" means that one object makes use of another (but such usage might well be shared), much as I am blessed with a supervisor; but the guy in the next cube is blessed with the same supervisor. In both cases, the boss is not a component part of either of us. Despite the differences between the two cases, the same design technique - composition, via holding a pointer (explicit or implicit) to the component object - is used. (The above is true for Smalltalk and Java; C++ and Eiffel do allow "inline" composition, which can sometimes be used to handle the "has-a" case and demarcate from the "uses-a" case, which should be done via pointer).

Switching gears to database schema design. One of the stated advantages of the RelationalModel is that one concept - the relation - is sufficient for all purposes. Which it is.

For clarification purposes, note that "relation" in "relational" does not refer to inter-table links, but rather tables themselves. It comes from an old math term, it is my understanding.
- Yup. Tables are used to store entities (like Employees, Assets, and such) and relationships (like WorksFor? and WasOrderedBy?). Relationships map well into relations; entities...well, I think they map better into objects. :)

However, when young MIS students learn how to transfer business requirements to database schemata, it's commonplace for them to do entity-relationship modelling. Entities are the various actors and other goodies within the enterprise (employees, assets, money, inventories, customers, etc.) and relationships are the glue between them (who works for whom, etc.). In some cases, the lines are blurred (relationships may contain data beyond ForeignKeys, and entities may contain foreign keys in the 1-1 and N-1 cases as an optimization), but in general, every table in the system represents either an entity or a relationship. Yet the same device - the table - is used to model both. Now the RelationalModel itself doesn't place any limit on the domain types which may form table attributes - but SqlLanguage has a God-awful primitive type system such that most RDBMS systems must use relations to model domains.

So...what is the way out of this mess? DateAndDarwen, in TheThirdManifesto, propose an extension to the RDBMS that allows complex and arbitrary domains (essentially immutable objects) within tables as attributes. I believe their approach is correct; though they don't go far enough IMHO - the type system they propose is still rather restrictive compared to most OOPLs (no subtyping other than via restriction, etc.) Some of their restrictions they justify based on a conservative approach - they are not prepared to recommend anything that they haven't studied in sufficient detail; I will give them the benefit of the doubt (there is much changed between the two versions of TTM published so far). But DateAndDarwen are still firm advocates of the RelationalModel, and the introduction of objects seems to be done for the purpose of making OO-centric app programmers lives a bit easier (and eliminating the design mistakes that many OODB products have introduced; see FirstGreatBlunder and SecondGreatBlunder).

However, the reason to include (immutable) objects as attributes within tables is more fundamental than simply making the ObjectRelationalImpedanceMismatch less painful. The reason to do so is as follows:

Entities are best modelled as objects. Relationships are best modelled as relations.

If we had a good hybrid OO-relational system, then we could have the following changes (which I believe, assuming a well-designed system, would constitute improvements).

The "has-a" case (as the term is used in OOA/OOD) could continue to be modelled via object composition.
Likewise for entities (from ER analysis), which would be modelled by objects/classes, rather than by tables.
- "Ownership" of component parts is traceable.
- Encapsulation of component parts still is available
- Polymorphism, including subtype polymorphism, would be available; and wouldn't require any ill-conceived notions like TableInheritance.
- Excluding the issue below of N->1/1->1 pointer optimization, objects would no longer contain cycles; making their serialization/persistence a much easier problem to solve - complex relationships between entities would be handled at the relational layer; which Oracle and company already know how to do.
- Redefinition of entities/domains/classes might be possible without having to alter database schemata directly.
The "uses-a" case, rather than being done via object composition; could be done using a relation (table) instead. This gives the many advantages of the relational model in the instance where it is most needed:
Likewise for relationships (from ER analysis).
- The ability to perform joins and such.
- A natural representation of 1->N and N->N relationships. (It is a common practice, of course, to deal with 1->N relationships by inverting the sense of the relationship to an N->1, which can be easily implemented with a pointer. Often times, that introduces class dependencies that you otherwise would not want to have).
- The ability to add relationships to objects without redefining the object definition.
- Optimization of N->1 and 1->1 as pointers is still possible (though this ideally should be demarcated as a separate case from ordinary composition in the programming language itself, rather than in just the design docs).

While it is true that one can model relationships with pointers (producing a rather heavily denormalized dataset) and domains with tables (producing an overly-normalized dataset), it is my belief that for complex systems, it is a better approach to use both OO and relational techniques, in a system which is designed to accommodate this efficiently. Further, it is my belief that arguments for jettisoning one or the other (whether made for UniformityUberAlles or for more "practical" reasons) are barking up the wrong tree.

After all - while objects may be from Mars, and tables from Venus; we're all working here on Earth. A planet whose orbit lies between.

How are tables "too normalized"?

When used for entities; many of them would (IMHO) better benefit from encapsulation then from exposed schemata... leading to your next comment:

I think part of the problem is that current RDBMS are static.

Static in which way? Static in that schema evolution (changing the table definition) is hard? In that tables are generally monomorphic - all tuples inside 'em must have the same shape (the word "shape" is, of course, a polite way of saying "type" to someone in the ThereAreNoTypes crowd) :).

The issue of being highly encapsulated (as objects often are) vs highly exposed (as tuples are) is one key property that leads me to the position stated above. Joins and such are far more useful when done on relationships - to automagically link foreign and candidate keys together to synthesize complex properties of the system. Joins involving non-key attributes are rare (if done at all); the rationale for exposing component attributes as columns in a tuple is weak. Entities, on the other hand, benefit a bit more from encapsulation - entities are (I believe based on experience; I have no data to back up this claim so take it with a grain of salt) more likely to be subject to schema evolution, which encapsulation makes tractable.

Relationships, on the other hand, are where you want the full power of the RelationalAlgebra. The need to protect a relationship behind an encapsulation boundary is far less - if you need new types of relationships, just create new tables.

Dynamic relational could be possible and practical I believe (see MultiParadigmDatabase) to fill in some of the rough areas of relational implementations. The "must have same shape" issue disappears. Yeah, I know, IwantaPony. But ponies are technically possible. -- top

Much of that page seems to be repeating some of the (alleged) mistakes that DateAndDarwen counsel against. Some of the ideas are good (I'm all for polymorphic tables if done correctly, the OODBMS hack commonly known as TableInheritance doesn't suffice); others I haven't had time to think about or investigate.

See DynamicRelational for a description minus the OO-targeted features.

What exactly is "polymorphic tables" if it's not TableInheritance?

TableInheritance, if I understand it correctly, refers to the creation of supertables/subtables (along with structural subtyping of records) - see TableInheritance for more. "Polymorphic tables" is a bit simpler; just the ability to insert derived types into a table expecting a base type. Granted, the concept of a polymorphic table is decidedly non-relational. At any rate, it warrants further investigation. -- sj

I might point out that I'm approaching this topic from a different direction than you. You seem to be approaching it from a database perspective; how can we build a better DBMS and DB architecture. I'm approaching it from a ProgrammingLanguage perspective - how can we best model algorithms and data; persistence and transaction semantics are for me a secondary concern. (Not that I would ignore them, of course...)

But anyway; more on this tomorrow.

-- sj

I share the same goal. However, my approach to code simplicity is CodeAvoidance: put the stuff that DB's do better in the DB (or table engine) and let code focus on what it does best. I often view OO as an attempt to put data structures into code when I think the reverse approach is a better goal, or at least an option to explore more. Find out what to remove from code, not what to load it up with. It is easier to sift, search, and index meta data than code. Some view RDBMS in the Oracle BigIron sense. But I am also fond of NimbleDatabase technology. -- top

AugustZeroFive