Response To Bakers Anti Relational Paper

Response to Henry Baker's anti-relational paper:

http://www.pipeline.com/~hbaker1/letters/CACM-RelationalDatabases.html

I know the paper is a bit dated, but still warrants a response.

...I can categorically state that relational databases set the commercial data processing industry back at least ten years and wasted many of the billions of dollars that were spent on data processing...

With the recent [in 1991] arrival of object-oriented databases, the industry may finally achieve some of the promises which were made 20 years ago [early 1970s] about the capabilities of computers to automate and improve organizations.

Can we see some examples of "improving organizations"?

...Virtually all commercial applications in the 1960's were based on files of fixed-length records of multiple fields, which were selected and merged. Codd's relational theory dressed up these concepts with the trappings of mathematics (wow, we lowly Cobol programmers are now mathematicians!)...

And OODBMS tend to resemble the NavigationalDatabases of the 60's. Being similar to something old by itself does not make it "bad". DrCodd found NavigationalDatabases messy and undisciplined, and set out to clean them up. This was his motivation for relational technology.

Unfortunately, relational databases performed a task that didn't need doing; e.g., these databases were orders of magnitude slower than the "flat files" they replaced, and they could not begin to handle the requirements of real-time transaction systems.

Some of the largest systems today run on RDBMS. That paper was written several years ago and many performance improvements have since came along. As far as performance, nothing is always the fastest in all circumstances and nobody claimed RDBMS are the fastest for every application or usage pattern. However, the one key area where they do shine is the ability to handle queries from multiple perspectives and unanticipated requests.

Competitors tend to hard-wire a certain "access routes" into the structure. Relational focuses more on storing facts about business entities (nouns) and factors the information to gain optimal and organized OnceAndOnlyOnce. The idea is that the DatabaseIsRepresenterOfFacts. How the information is used is of secondary importance (and perhaps none in theory). Divorcing meaning from usage issues allows it to better handle future and diverse uses.

Because organizations, budgets, products, etc., are hierarchical; hierarchies require transitive closures for their "explosions"... Parts "explosions" and budgets "explosions" were the norm, and these databases could easily handle the complexity of large amounts of CAD-equivalent data. These databases could also respond quickly to "real-time" requests for information, because the data was readily accessible through pointers and hash tables--without performing "joins".

Hierarchies are of dubious or limited value for most non-trivial business classification system needs. See LimitsOfHierarchies and RealWorldHierarchies. Even if there is a general tree taxonomy in some of those examples, often there are other access possibilities outside of the hierarchy. For example, accountants and procurement might not care much how parts in a car fit to each other. But, they still need a lot of information about parts.

I won't dispute that custom-made structures can outpace RDBMS for specific needs, but over time the trend is usually multiple departments and users that need to see the same information from different perspectives. This is where RDBMS shine.

Computing history will consider the past 20 years as a kind of Dark Ages of commercial data processing in which the religious zealots of the Church of Relationalism managed to hold back progress until a Renaissance rediscovered the Greece and Rome of pointer-based databases. (emphasis added)

Odd, many of us relational proponents think the same thing about OODBMS and OOP. Some of us see such pointers as the GoTo's of data structures.

--top

What's funny is that a well normalized relational database employs foreign keys that serve exactly the same role as pointers. So, a well-designed OODB ought to be equivalent to a well-design RDB. I think this is perhaps what Baker was trying to get at. --SamuelFalvo?

The ability to perform arbitrary joins between relations/tables is arguably a more general facility than pointers, at the expense of some loss of performance compared to pointers, though explicit, predefined foreign key relationships may be internally implemented using pointers. However, an OODB inevitably suffers a potential performance hit by typically needing to unmarshal a whole object in order to perform a query on some attribute of the object. A relational system need only unmarshal the relevant attribute. -- DaveVoorhis