Database Is Software Too

A DBMS is a software library. The problems of Databases in XP are generally traced to the root cause of thinking databases are different. Warning, admitted sweeping generalization of job titles coming, I apologize in advance for the exceptions to the rule... If you hire people who aren't software engineers (e.g., the vast majority of DBA's), then you can't expect them to react like a software engineer to a problem. Likewise, if your software engineers think databases are boring, beneath them, and should be relegated to DBAs, well, then they get what they deserve. --JeffWinchell


The above is fairly common in OO-centric development circles. The DB is treated like a low-level device which needs wrappers and converters to be converted into app-level objects. While I disagree that DB's are (or should be) "low level", the argument could be made that keeping everything as "objects" makes for a more consistent view even if there is a "translation tax" between relational and OO to get it. While I would rather lean toward a relational view as the primary big-picture view rather than objects, the OO-for-consistency argument is an interesting one. --top


From DatabaseIsRepresenterOfFacts:

You absolutely can't trust the code to maintain your data integrity.

In the RelationalDatabase more rules about data integrity (consistency of stored facts) can be stated in a declarative manner, which is safer and easier to verify, as opposed to an imperative manner in an OO programming language. The imperative approach is more subject to application bugs, and no matter how much you trust your programmer capabilities it is harder to verify their correctness then if you just choose to simply declare business rules.

Are modern relational databases mature and bug-free enough to be given absolute trust in maintaining data integrity? They are software too, and rely on other software -- the operating system, networking layers, hardware microcode, so forth -- that is subject to bugs as well. But is application code so much poorer than relational database code? After all, it's just code in the DB product that enforces the integrity there, so it's clear that it's possible to write code which satisfies a DBA's desire for rigor. --StevenNewton

Question is whether you want to get the integrity code right OnceAndOnlyOnce or gamble on getting the code right once per application. From a risk management perspective, relying on application code is foolish. Even if a given application is perfect, in most cases it won't be the only application that manipulates data. Persistent data tends to outlive applications, and people with access to persistent data tend to find new applications for it.

I think that all software (DBMS, OS Kernal, network protocols) that is heavily used by millions of users is less buggy than other software shrink wrap products. And I think that software that protects the validity of important stored information (e.g. OS/hardware code that ensures bytes are written on a hard disk, DBMS storage and transactional engines, etc) is even less likely to be buggy.

But then there's the particular rule enforcement code written by us mere mortals for business systems. I agree with comment above that simple declarative code reduces the chance of error. --JeffWinchell

Relational databases are orders of magnitude more bug-free than typical application code. (Spoken by one who's found and reported database errors -- almost all of which were in the stored procedure/trigger programming languages.) -- JeffGrigg

Declarative statements are generally much easier to "get right" than procedural code -- up to a certain level of complexity. The RelationalIntegrity, unique index and field validation rules of relational databases are almost impossible to get wrong.

But the problem with relying on declarative relational rules to ensure data integrity is that the rules supported by relational databases are not expressive enough to check everything. Sure, you can ensure that field CXXX contains a value 'in ("D", "Q", "X")', but what about that when CXXX is "D" then DYYY must not be NULL? OK, you can express this in triggers -- in a procedural language! (And where did your non-procedural declarative advantage just go?) But what if the two columns are in different tables? A: Doom; there is no way to express constraints that must hold at transaction boundaries, but may be violated in the midst of a transaction. And because of the limitations of SQL syntax, you can't insert, update and delete multiple tables "all at once" "between" rule validations.

So you're always left with lots of things that can only be validated in application code. Thus, it's always a bad idea to give 3rd parties (like users or other systems) write access to your tables; you can't define sufficient rules or triggers to keep them from corrupting the data.

I'm in favor of expressing as much validation logic in SQL as can be done in a declarative manner. But don't fool yourself into thinking that this eliminates the need for applications to validate data too. -- JeffGrigg

Having applications validating the data violates OnceAndOnlyOnce as soon as two different applications are accessing and validating the same data. If your hand is forced to violate OnceAndOnlyOnce by an inadequate language for validation or inadequate DBMS features, you can justify the need to do so... but you should also recognize the MissingFeatureSmell for what it is and start insisting (with what little power you possess) that the features you require be supported by the DBMS. In this case, the automatic delay of triggers until time to commit the transaction would be an appropriate and relatively easy feature to implement to support at least OnceAndOnlyOnce procedural validation.

Declarative statements of the sort "the type of DYYY is some function of the type of CXXX" (with 'NULL' being considered an option in a union type) would require dependent typing become part of the data manipulation language (related: DoesRelationalRequireTypes).


In addition, software libraries (e.g. RDMBS) that make it absurdly difficult to access information stored on disk, except through their well-defined APIs, APIs that include the ability to simply write constraints, minimize the need to redundantly or inconsistently validate information in other code. If these software library APIs also make it easy for many other software libraries (programming languages/applications) to safely read and write this information, then the value of these software library APIs (e.g. RDBMS) grows enormously.

Not doing this (the typical custom built middle layer of business rules) generally greatly increases the difficulty for mere mortals to access the information... and those mere mortals tend to have the greatest power and ability to increase strategic value in an organization, so restricting their access is very unfortunate.

I think I've tried to express way too much in too little time and space. Feel free to refactor. --JeffWinchell


EditText of this page (last edited July 9, 2010) or FindPage with title or text search