Relations Have Normal Forms

Normal forms are defined in terms of relations (tables)

From AlwaysUseSelectDistinct

[Normal forms apply to the schema as a whole, they say nothing about an individual view or query. Projection is frequently used to derive some bit of information, such as "the set of departments to which at least one employee is assigned". Annoyingly, projection in SQL can produce duplicates, even from a beautifully normalized base schema. Hence the need for the DISTINCT qualifier.]

Incorrect. Normal forms are defined in terms of relations (tables). Stating that "a scheme is in normal form 'x'" is only a statement of an aggregation; that every relation in the schema is in normal form 'x'. Although most useful for the base tables (which still isn't exclusively the schema as a whole), the concept still applies to views and queries.

What I meant above about 'projection returning duplicates' (probably worded badly) was this:

NameDepartment
WilliamDevelopment
WilliamSupport
SmithSupport
SmithSales
Project down to 'Name', conceptually we can just chop off the 'Department' column.
Name
William
William
Smith
Smith
This is where SQL stops. But relational theory isn't done yet. We have to drop the duplicates. This is implicit in all relational operators which could otherwise cause duplicates, and is also what you get if you use Select Distinct. I show it here as a separate step for the sake of comparison.
Name
William
Smith
Okay. Now the interesting bit is what happens if we update this view. In the SQL version above, we could potentially modify the 'William' from development separately from the 'William' from support. Except that we'd be doing it randomly, because we threw that information away already. In other words, the only sensible way to modify the entries is to modify all the duplicates at the same time in the same way; in this way, we can be sure that we don't introduce an inconsistency or some unspecified behaviour. I.e., after modification, we could end up with
NameDepartment
WilliamDevelopment
BillSupport
SmithSupport
SmithSales
or
NameDepartment
BillDevelopment
WilliamSupport
SmithSupport
SmithSales
Now, with the proper relational view, we aren't given a choice: once you remove the information to make a distinction, you're stuck with all or nothing:
Name
Bill
Smith

| V

NameDepartment BillDevelopment BillSupport SmithSupport SmithSales

-- WilliamUnderwood

[I don't know what it means for a single relation to be in "normal form". If you're talking about normalization as it's usually applied to databases, then it is irrelevant to this discussion. DatabaseNormalization is concerned with the various ways that the same data can be arranged in multiple tables. It addresses mainly the elimination of redundancy and ambiguity, taking into account what the data actually means. You can't discuss normalization of a data set without discussing the meaning of the data and its Functional dependencies (which reflect part of that meaning -- see FunctionalDependency). By definition, derived views are redundant; thus normalization can't be concerned with them, but is rather about the parts of the base schema.

Your observations on the updatability of a view derived via projection are correct, and are a good reason to eschew duplicates, at least if updatability of views is desirable. -- DanMuller]

Nope :p

"In all of these assertions that a table is denormalized, it appeared to me that the people making the assertion believed that normalization somehow extends across rows, and maybe across the entire database. Yet William Kent, in A Simple Guide to Five Normal Forms In Relational Database Theory, Section 6, Interrecord Redundancy, states in the very first sentence of that section, 'The normal forms discussed here deal only with redundancy occurring within a single record type.' Believing that normalization extends across rows, these colleagues assert that a tactic of placing an account balance column in an accounts table, and perhaps creating a trigger on an account activity table to maintain it, constitutes denormalization, because the account table balance column is 'redundant.'"

http://www.dbdebunk.com/page/page/622318.htm

Because of this, derived views still have an associated normal form. In the very least, they're in the first normal form (or they're not derived relationally, and therefore outside the scope of this discussion). It is quite possible for a derived view to be in any normal form.

The elimination of redundancy and ambiguity through the 5 normal forms is defined per tuple and relation, and not (repeat not) in terms of the entire database.

--WilliamUnderwood

[I stand corrected. Knowing the heading of a relation and its functional dependencies, you can determine what normal form it is in. If the body of a "relation" contains duplicates, it is obviously not in any normal form. Since first normal form is nowadays considered part of the definition of a relation (see e.g. http://nunic.nu.edu/~ckettemb/DBNorm.html), we actually wouldn't call such a thing a relation at all.]


Actually this page should be about relation variables not relations. Relation as values are not subject to constraints.

It is also common to refer to a whole schema being in X normal form, when all its relation variables are at least in X normal forms. This is especially justified given that the "normalization" process was historically associated with an algorithm and proposed design methodology, that started from the universal relation (imagine joining all the information in all the tables in the database by following all the foreign keys, and you've got the universal relation), thus a schema with one table (relvar) and then refining it to the point where all the relations are in X normal forms. Although of theoretical interest, the (universal relation + normalization algorithm) model wasn't much applied in practice as a design method for obvious reason (cumbersome, uneconomical).

But the point is that it is not so much an abuse of terminology to refer to a schema being in <X> normal form, where X can be 2, 3, BCNF, 4, 5, etc.

The intent wasn't to debate that it's an abuse of terminology (I don't believe it to be), but to clarify that a schema has a normal form only because it's constituent variables have normal forms. Of course, I don't believe we're in disagreement.

On a different topic, does the terminology 'relvar' rub anyone else the wrong way? Having thought about it a bit, I think I'd prefer simply 'variable'. If necessary, add the context 'relation variable'.

The point being that in, say, a OO language (complete with EverythingIsAnObject), we don't typically speak of variables as 'objvars', although we may occasionally say 'object variable' when the distinction is important.

--WilliamUnderwood

Actually this page should be about relation variables not relations. Relation as values are not subject to constraints. Hmmm, I don't think this is right. The normal forms don't seem to be concerned with constraints, other than the proscription on duplicates - but that's so fundamental to relational theory that it's not thought of as a constraint.

Well, the normal form are all about constraints, and a certain particular type of constraint called functional dependency . As a clarification of let's consider the following relation value:

 X, Y, Z
 -------
 1, 2, 2
 1, 2, 3
 2, 4, 5
 2, 4, 6

What can you say about the above relation value with regards to the normal form? Absolutely nothing. When people read some sloppy books that do not make the distinction (even worse, I've seen and heard of such examples given at professional exams), then they do make an extra assumption that the author chose the value "representatively" for the table in question and therefore they assume the functional dependency X -> Y, because they "see it" in the current values. But this is an informal assumption between the author and its readers, it has nothing to do with the formal definition as a relation value does not embed constraints.

Just think about the example above as coming from table A declared as:

  CREATE TABLE A ( x,y,z, primary key (x,y,z) )
It's true that with the above value it seems that (X -> Y) but just insert in that table A the new tuple (2,5,7) and voila, your assumption that A was not in the third normal form no longer holds because X -> Y is no longer a dependency. Functional dependency constraints are always attached to tables (relation variables) and not to sets of values, a set is immutable and therefore subject to absolutely no constraints. There are other types of constraints attached to the whole database but those are not taken into account for the purpose of normalization theory. Hope this clarifies it a bit.

Again, I stand corrected. Although I was well aware of what functional dependencies are, I thought of them as distinct from constraints - as a partial description of the meaning of the data, rather than as a constraint on the form of the data in the database. But you are correct, they do constrain the data. I am certainly being forced to refine my understanding of definitions here. Excellent.

I agree that most writers are terribly sloppy when it comes stating the functional dependencies of their examples. Often the examples are simple enough that they can get away with it.

I disagree mildly with your comment about values, although this is likely a matter of terminology. If you consider the result of a relational query to be a relation value, then such a value can certainly be said to have functional dependencies. In fact, they can be inferred from the relational operations and the functional dependencies of the base relations (actually, relvars) that make up the query, although this can be compute-intensive task. This information can be used to infer the keys of a derived relation, and for optimizing queries. -- DanMuller


"... does the terminology 'relvar' rub anyone else the wrong way?" No. It's a convenient shorthand, and keeps in mind that a system being discussed is relational, just as using the term "relation" instead of the more ambiguous and overloaded term "table" does. Given the amount of confusion that exists in some people's mind over relational, especially given that the topics of OO and relational are so frequently intermingled, the reminder is welcome. -- DanMuller

Fair enough, although I may drop to using 'variable' when the context is clear.


Contributors: DanMuller, WilliamUnderwood and others

See also: AlwaysUseSelectDistinct


EditText of this page (last edited May 22, 2005) or FindPage with title or text search