Relational Integrity

RelationalIntegrity may well be the most widely supported technology for implementing DesignByContract and/or UnitTests. It should be used much more than it is.

RelationalDatabases all provide reasonably automated facilities to generate and check that values in a column are correct. In particular, they can all check that if you have an id or a record in another table, it exists. (ForeignKey checks.) By mapping codes into an AllowedValueTable or writing triggers, codes can be restricted in the same way.

In practice these are rarely used, normally on a plea of "performance." In reality, it makes it inconvenient to load data AND requires that any program which updates the database must be correct (and do things in the right order.)

Turning RelationalIntegrity on in a small database also means that you don't have to develop solutions to what should the code do if there is garbage in the database. In practice, a lot of "integration problems" involve this.

-- MarkSwanson

Foreign key checks are trickier than they look because they increase the opportunities for deadlock. This is especially true of databases that don't have row level locking. -- BrianSlesinsky

The over head of RelationalIntegrity is not as high as many assume in most dbms applications. The worst cases are the high intensity update applications such as many of the Transaction Processing applications.

In a content, or document, management application that uses a relation dbms the over head of RelationalIntegrity (RI) is almost always insignificantly low. The gain from the application of RI is immense. The amount of code, sometimes quite complex code, that is removed by the use RI is startling to see the first time it is applied within an application system.

For large document management and publishing applications, such as what I am working on at the moment, the use of RI is a life saver. The code that is written against the dbms ought not to know many of the conditions that can arise from bad data. If the data consistency can be built into the schema with RI the over all application will be significantly simpler and guaranteed correct according to the specification of the data.

The biggest win in this case will be a much simpler output application, such as a web presentation application. The application of data integrity checks in the web application will not need to handle the conditions that are guaranteed correct by the dbms itself. This also makes it possible to write stored procedures within the schema that are simpler and more maintainable than they otherwise would need to be.

Basically, RelationalIntegrity is a "very good thing".

-- GregWolff

One practical difficulty I've found with usage of RI for validation is dealing with the error messages returned by the DBMS. They are often too cryptic to echo to the user as-is, and thus complex translation lists need be built. And, they are often hard to forecast ahead of time. One may have to build the list by creating errors in simulation that are sometimes difficult to duplicate without massive transactions to trigger concurrency difficulties. The best one can do sometimes is something like:

ERROR: The database did not like your transaction. Please resubmit it. Here is what the database said: "The Flux Capacitor has fubarred the transdimensional reality matrix because the carborator is missing ionic hydrocarbons."

--top

If performance is an issue, you can discover exactly where it's an issue by measuring the performance of the system as it works. Then you can drop whichever RI constraints cause the most costly overhead.

After developing a system that works correctly with all the RI constraints in place, you can typically drop any or all of the constraints without changing the behavior of the system.

Contributors: JeffGrigg

When wearing a BlackHat I sometimes wonder if RelationalIntegrity (RI) is definitely a good thing in a component-based OO system?

Sure I use it...but I also reckon it scatters business rules to YetAnotherTier?. If my DomainObject's (aka BusinessObject's) have the right rules why does my database need 'em also? -- DanGreen

Using RelationalIntegrity also provides a set of rules that is rather difficult to UnitTest. What are people's thoughts on this?

People's thoughts on this are that SoftwareEngineering and ComputerScience didn't start nor will they end with UnitTesting. There are lots of other good things in life, among them DBMS enforced integrity constraints.

How do you ensure that the constraints are doing what they are supposed to, if you can't automatically regression test them?

Oh, but you can have regression test: try to insert invalid data or update or delete data so that you break the costraints, and make sure these operations fail. It's even easier to validate: each constraint corresponds to a well identified business rule. Ask the business user for examples of data that is invalid, put that in a UnitTest and make sure they will fail to insert/update/delete. Other than that, the same rule applies for constraints as for regular code: stare a significant amount of time at the code and try to figure out what the hell it is doing. A code review will also help.

I use RelationalIntegrity constraints for the same reason I UseAssertions: They help detect errors while writing and testing code. In effect, they're additional UnitTests. And, once the system is fully tested and ready for production, I can feel free to disable them all, just as one does with assertions. (In practice, however, I usually leave them in, in production. Except for a few, when I can show they cause significant performance problems.)

I always find it entertaining how shocked and appalled the "true database believers" are, when I show them that their database rules (RI constraints, column rules, and even triggers) can't enforce all business rules. (There are limits to any technology.)

-- JeffGrigg

Just to give an example in your support, you cannot enforce that a relation on TxT defines a plannar graph, most of the business rules are not that complicated though and relational integrity constraints should solve you a lot of your integrity requirements.

However this doesn't mean you can give up on them, they are there to help you. You have to watch very carefully when you drop integrity constraints, generally you shouldn't do that.

-- CostinCozianu

Why is that exactly? I was under the impression that the database constraints were simply predicates which were required to be true, and that they could involve arbitrary calculations and relations? I.e., if you could write code to detect the situation, you could write a predicate to eliminate it. Or is yet another example of the failings of SQL vs relational theory? --WilliamUnderwood

That is correct: in theory, it should be possible to enforce any rule of any complexity as a database constraint - e.g. a constraint like CHECK (NOT EXISTS (SELECT ... /* data that violates the rule */)). However, in practice most (all?) DBMSs currently in existence do not support constraints of arbitrary complexity, e.g. CHECK are constraints limited to checking the values in single rows of one table. -- TonyAndrews

Integrity should be handled by transactions. Local integrity constraints are like assertions for transactions, so the analogy is exact (homology?). If we extend the implied table to class mapping and project into the object domain, this would be a feature that prevented the creation of orphans outside a transactional context. So far OO environments don't support this functionality implicitly. It will probably come along however as EJB etc. reinvent relations as associations. -- RichardHenderson