Each Column Name Be Unique Among The Tablespace

When Each column name is unique among the tablespace you end up with column names like this:

Table: Persons PersonId?: Integer PersonFirstName?:String PersonLastName?: String PersonBirthDate?:Date PersonEyeColorId?:Integer

Table: EyeColors? EyeColorId?: Integer EyeColorName?: String

While if you follow the convention that EachColumnNameBeUniqueInTable? (I believe an ObjectWeenie like me prefers this one):

Table: Person Id: Integer FirstName?:String LastName?: String BirthDate?:Date EyeColorId?:Integer

Table: EyeColor? Id: Integer Name: String

I feel that repeating the name of the table on each column to achieve name uniqueness in column names is ugly and redundant, but I guess in a query with lots of short alias could improve readability:

select p.PersonFirstName?, e.EyeColorName? from Persons p, EyeColors? e where p.EyeColorId? = e.EyeColorId?

versus

select person.Name, eyecolor.Name from Person person, EyeColor? eyecolor where person.EyeColorId? = eyecolor.Id

Mmm, I still I feel that the second query looks cleaner and easier to read... do you agree?

For few-table queries it does, but I've found very nice things with DB-wide uniqueness when doing more complex queries. The need for table aliases almost disappears. But rather than append the full table name, I'd recommend a 2-or-3 character table abbreviation:

table: employees_emp --------- emp_id emp_first emp_middle emp_last emp_hour_rate etc...

I'm sure everybody has their personal preference and will defend it with vigor. I can only say what I found seems to work best from my perspective as a query writer (and occasional DBA).

--top

Q: I'm honestly curious what "nice things" you have discovered about this particular naming discipline. Are you willing to describe a few?

A: As mentioned above, the need for table aliases nearly disappears. A non-correlated query will not need them.

Second, there's rarely any naming conflicts with the final column names of a given query, reducing the need for column aliasing. This is especially nice for ad-hoc queries where you don't want to waist time writing aliases.

Third, you can search for a given column in script files if you want to study its usage or make a wide-ranging change in its usage. If aliases are used instead, then they may not be consistent from query to query.

Q: What happens when you rename a table?

A: The same thing that normally happens when you rename a table: you break all the queries and views on that table, so you need to scramble around and fix them all. Alternatively, you could create a 'view' of the original table to essentially alias it. Neither the problem nor these solutions change if you go just a bit further to also renaming the prefix for the columns.

Strange. I always thought that the aliases were intended to allow factoring out of the commonalities of the column names. They allowed to use the same name for a column with the same meaning. I really wonder how one can give up this clean bit of extra information (same name=same meaning) without gaining anything except maybe a bit less typing. I mean prepending the table name to the column is just a crude form of alias use (just use the alias everywhere) without any of the benefits of a) usable for joining the same table multiple times and b) making the table name explicit. I really can see that this gives a positive trade-of for rather simple schemes.

I added a third benefit above, which aliases cannot give. Often in larger systems most queries require at least one join such that alias would otherwise always be needed anyhow. Thus, a virtual alias via a table prefix/suffix provides that automatically. Plus the reduced need for column aliasing since they are already unique. The first reason is a wash, but a net benefit of the last two. I cannot really isolate all the reasons why it "felt better" just yet, but uniqueness just seemed to improve things. I was skeptical at first, but became a convert. The one downside though is that the column names if used as-is are not very compact. For small values, the column name becomes a real-estate hog. --top

Another approach not mentioned is to only add a prefix/suffix to columns that actually conflict.

Yes, but they might have conflicts in the future, and that means your columns names will not be uniform (only the first table which ever used the a column with name DESCRIPTION has it, the others name it: [TABLENAME]_DESCRIPTION. If someone else looks at your database it will seem like your naming is not following a consistent rule

I am not sure what you mean by the first sentence. As far as the "description" example, you are correct. The first use table generally gets the shorter one, which may make the naming look arbitrary. However, from experience one generally knows that words like id, description, status, location, etc. (See list below) will be used often and could potentially pre-prefix many of such cases. No, its not perfect, but nothing is.
- ID
- Description
- Status
- Location
- Address
- Rate
- Month
- Period
- Category
- Classification
- Group
- Amount
- Code

Thus, if a name is already unique among a DB's table-space, then there is no reason to prefix it. Using the above Employee example, if "hour_rate" is unique among all the tables, then we don't need to prepend it with "emp_". However, this requires keeping an updated list/log of the columns in Microsoft Access or the like to verify their uniqueness. --top

Well, that is only if you are using a poor database, in an enterprise grade one (like Oracle) you can query for the colum names (so you have no need to manually keep a list of columns):

 SELECT * FROM user_tab_columns

My experience with Oracle is that what you think is a name-space is not a name-space. Older systems often have a patch-work of permissions and one has to be careful about their viewpoint assumptions. But in most cases one could probably get help from the DBA for a big project. Also, at least with MS-Access you get immediate feedback rather than having to keep querying for each new name.

Q: Why is repeating the name of the table in each column as a prefix not a violation of DontRepeatYourself?

A: The use of a table prefix for each column name is a violation of DontRepeatYourself... in much the same manner that HungarianNotation involves such repetition. In this case, some of the main arguments for repeating yourself in the column names are:

It is a predictable and consistent way of achieving unique identifiers. Sometimes boring is beautiful.
- Automatic prefixing with the table/view name if a name collision in a join is detected is also a predictable and consistent way of achieving unique identifiers, and it does not violate DontRepeatYourself.
Unique identifiers have some nice properties deriving just of their uniqueness
- But that does not mean I have to write them, why not let the tooling generate them by prefixing with the table/view name if a name collision in a join is detected (or perhaps whenever a join is made, for all column names, regardless of possible collisions?).
If you DontRepeatYourself here, you'll end up repeating yourself anyway when writing column aliasing code for join queries.
- So, perhaps our database tools should provide use with "SmartAlias?" when duplication is detected in a particular join so that we do not need to violate DRY? (all that it had to do is prefix the name of the table if it detects a duplicate, seems pretty simple to me)
- That would work, so long as it was consistent and predictable. It would help if you could choose the prefix.
- It should use the table/view name as prefix, or the table alias, if one was specified (that seems consistent and predictable for me).

I guess it is time to add another flaw to the list in SqlFlaws...

While this may belong among SqlFlaws, it is not a problem unique to SQL. It is a problem associated in general with 'joins' where the same name in two different records may possess different semantics.

There should be a rule on Sql joins preventing this kind of name collisions. In another relational language with a rule to prevent them, this problem would not exist. So I think it is unique to all relational language without rules for name collisions (And Sql is one of them).

Well, one could also have query structure that simply eschews the whole issue of name collision. I'm particularly fond of the PrologLanguage/MercuryLanguage/DataLog approach to query specification.

They do? How would your write a query like "select person.Name, eyecolor.Name from Person person, EyeColor? eyecolor where person.EyeColorId? = eyecolor.Id" in PrologLanguage/MercuryLanguage/DataLog ? (and avoid the naming collision between the Name of person and the Name of eyecolor? (and without a prefixing (or postfixing) rule like the one proposed here)

RE: [...] a violation of DontRepeatYourself... in much the same manner that HungarianNotation involves such repetition.

Only the perverted version of Hungarian violates DRY: "the original article never advocated encoding the actual machine representation "type" into the name, but to encode the semantics that the "type" cannot and does not give". Since that information is not given by type, TrueHungarianNotation is not an example of a DRY violation.

That's a reasonable objection. Regardless, HungarianNotationAsActuallyPracticed still violates DRY, still has many adherents, and still has some reasonable arguments for it.

Please see NaturalJoin for context.

CategoryNaming, CategoryDatabase