Database Application Independence

DatabaseApplicationIndependence

A database as a custodian of data in the enterprise should be independent at a logical level with respect to ANY and ALL applications that access it at the current time, or might access it at a later time.

An enterprise should literally own its own data, therefore they will have an assigned DBA that will master the database, and the data in the database should not be dependent on any particular application accessing it.

ObjectDatabases? are one clear example of the violation of this principle . They are application dependent in two ways:

  1. They depend upon an application to 'load' objects back from their stored image and give them a meaningful usage
  2. They depend heavily on a particular language that the application was written in (usually Smalltalk/Java). If I want to access enterprise data from another language , or even from a dummy report generator, I'm out of luck.


from ObjectAndDataAreSeparate?: CostinCozianu wrote, If the best solution would be to write something in C, then they [the customer] are out of luck [because the data is locked in an ObjectOrientedDatabase specific to a language]. That means you locked your customer in, to your domain of expertise. If you had developed with Oracle, I might come later and do something in Scheme if it so pleases me. That is what the DatabaseApplicationIndependence principle is for.

Very true. However, the problem is mitigated on two fronts. The first mitigating factor is the client themsevles. I have presented the option to my clients: (1) Code the application using an ObjectOrientedDatabase and risk possibly locking yourself into a proprietary solution or (2) Code the application against a RelationalDatabase, almost completely avoid the proprietary problem but increase the development time and costs by about 30% - 50%. Almost invariably, they opt for the first option. Furthermore, I should note that a RelationalDatabase is not without its risks, too. Primary among those risks is that the requirements will change in such a way that makes implementation in a RelationalDatabase very cumbersome primarily because ForeignKeysCanOnlyReferenceOneTable. (see also ObjectRelationalMappingCostsTimeAndMoney).

The second mitigating factor is that the world you describe in DatabaseApplicationIndependence doesn't exist. At least, I haven't found it. Enterprises generally don't have single data stores where they keep all their data. Most of the time, a corporation's data is scattered among various datastores (not only relational, but hierarchical, flat file, or some proprietary scheme). Oftentimes, as you would expect, the datastores contradict each other.

The reason that these separate datastores exist is that applications tend to be developed vertically in an enterprise rather than horizontally across it, so the executives sponsoring the development don't particularly care about the data in other vertical organizations within the same company. Complicating an integration process is that the business rules surrounding a particular logical piece of data are often different from one vertical business unit to another (sometimes, they contradict each other!). Even if the business rules could be resolved across business units, I've found that political issues prevent datastores from ever really being integrated.

Therefore, having another application specific datastore in the form of an ObjectOrientedDatabase isn't a particularly new problem. At worst, it's more of the same problem. However, since organizations have found acceptable ways of dealing with the multiple datastore problem, it just isn't a big issue.

-- MarkAddleman

But one can have multiple little RDBMS also, if you want to go that route. Centralized versus decentralized is not necessarily tied to a relational-versus-OO decision. I supposed, however, that RDBMS can be cumbersome for such because they tend not to target that. See SimplifyingRdbms.


A database as a custodian of data in the enterprise should be independent at a logical level with respect to ANY and ALL applications that access it at the current time, or might access it at a later time.

I have a lot of problems with this. The data you collect depends on what you think is important. What you think is important depends on what you are going to do. What you are going to do is more or less the same as 'the applications that access it at the current time, or that might access it at a later time'.

So, this is impossible. It is impossible in theory, and it is impossible in practice. The reason that databases work well in practice is that the applications you are going to use in the future are usually fairly similar to the ones you have done in the past. If you can make a database work with all your current applications, and if you make it possible to change your database, and if you periodically redesign your database as new applications arise, then you will have a database that is fairly independent of your applications, and most new applications will be able to use it without a great deal of difficulty.

DatabaseApplicationIndependence is not a bad ideal if you don't take it too far. It is quite reasonable to have a separate DBA, and to complain about a DB design because it is designed just for one application. But don't think that you can ever reach perfect DatabaseApplicationIndependence, because you are just fooling yourself! The database is just a model, and all models are imperfect representations of reality. Eventually you will discover its limits. --RalphJohnson

I am in two minds over this. I like relational databases because they are very functional with respect to big systems. Most large systems like a trading system must share data because they work in concert with various other big systems. So far the only solution is a relational database. Object databases might sprout relational functions to support this, but if so then they are object-relational and therefore must be separating data from behaviour.

The alternative is to have functional interfaces on the data-server. This seems to be a poor SeparationOfConcerns. A data-server should serve data. Asking it to also do complex processing, even provide a general execution environment sounds too much. Three-tier architectures were generated to solve this need. Really they are two-tier servers. The servers are execution logic and data services. The execution logic translates and controls access to the data, the data service handles data integrity. Relational databases map very well to this common architecture. OODBs use some nasty caching schemes to do this and fail a lot. EJB's suffer horribly from the constant thrashing of the entity beans in and out of the database. This is a classic bandwidth problem. If you have to load an object before you can look inside it, then you will always thrash your dataset. You can't help but load all of them. Or you can reinvent indexing and then your encapsulation is broken anyway. Or you can cache the data which gives you a nasty distributed update problem.

I prefer, for now, to use relational databases. I think object databases will eventually look like clever classloaders layered onto a relational database, anyway. In the mean time, we can write our own convenience layers and enjoy the benefits of an intrinsically reliable persistence mechanism. -- RichardHenderson.


The ideal is that the DatabaseIsRepresenterOfFacts. How those facts are used by specific applications should be independent from the existence of those facts. If more facts are needed for an existing or new application, then those facts are added to the database. Adding new facts generally does not affect existing facts (if properly normalized). Of course the ideal and reality sometimes conflict, but that is the goal or philosophy behind the idea.


See also: DatabaseVendorLock


EditText of this page (last edited February 26, 2006) or FindPage with title or text search