Relational Database Field Names

[discussion moved from EveryWordCanBeAbbreviatedToFourLetters]

{Why not call this topic DatabaseFieldNames?? I see nothing specific to relational yet.}

Database column names are often best constructed using groups of three or four words. If the words are kept short, the column names should not get too long.

But, at a high cost to readability. Which may or may not be a good tradeoff.

Full words also have a high cost to readability. If four 4 character words are used to make up a column name that is only 19 characters with underscores. A human eye can see that much at one go. If four 8 character words are used to make up a column name that is 37 characters. This is very difficult to see all at once. That means that you have to read the name rather than seeing it, and that also means that you can generally only fit one or two column names on a line rather than 3 to 5. This means a number of thing:

If you name your variables with names similar to the column names, you take up an entire line of code: 37 + 37 + 3 = 77 characters for a simple assignment statement. This does not even include margins.
This means that you have to remember what was at the beginning of the name as you scan through it taking up more mental energy I would rather spend some other way.
This means that many editors, printers, API's will not adequately handle the code.
This means that people who make column names will be tempted to use fewer words to create column names thus making them less precise in their menaning and therefore making their use harder to understand.

The upshot is that using full words means that focusing on an individual piece of code will be harder, and if a good set of abbreviations can be developed, we avoid all of these problems. It is, I admit, a tradeoff. Full words are easier to read by themselves. I would like to see an AbbreviationStandard? that gets around some of these problems.

You make some interesting points. MeaningfulNames has a few counterpoints. I assume the attempt to list some abbreviations is work towards a standard?

Yes, see above

I think it is a fallacy that variables and schema column names have to be fully self-documenting, especially the more commonly used ones. The database should have a DataDictionary that allows for a fuller description of the column that any developer can reference. Unfortunately, such is not standardized. But, nothing stops you from making your own. Similarly, variables should have a description comment next to them if you cannot find a concise name.

Instead of a special purpose DataDictionary, perhaps a general purpose HyperlinkedTextEditor would do the trick. With this editor you could browse around until you found where the relevant comments/documentation were, and then back again to work on the code.

As names for common concepts become more agreed upon, will they get shortened further? For instance FirstName? or CountryCode? are more or less ubiquitous could they and similar fields/attribute names become shorter to the point of being fn, cc or similar as database/object computing theory matures in the future? I ask this since if you compare mathematical/scientific notation (within context) practitioneers agree upon F as force, c as the speed of light, v as velocity, d(y)/dx as differetiation etc in shared technical discussion though these are highly abreviated.

I don't think database column names cleave into discrete contexts as well as mathematical/scientific symbols. There are too many diverse interests looking at the same names.

This is all silly. Names should always be full words. Any name that is too long as full words, points to a problem in your design, factor it better, you shouldn't be using multiple(more than three) words that often. Abbreviations are used by bad programmers.

I think it is funny that the person advocating for always using full words shortens "should not" to "shouldn't".

To be fair, "shouldn't" doesn't add any ambiguity to the word that I know of. But it does require a reasonable knowledge of English abbreviation rules to read. (I'm against "long field names", by the way.)

The problem with "Names should always be full words" is that it is not, in itself, a convention. This leads to an unfortunate diversity in name building. I might call something "Walk_Dog_In_Park" and the next guy wants to name it "Park_Walks_For_Dog" or "Park_Oriented_Dog_Ambulation" or "Dog_Park_Walk".

Later, when I look at his code (or he at mine), I may wonder what he could possibly mean -- since he and I string our words together differently.

If you have a convention that dictates how names are organized, the "full word" constraint becomes less important. In fact, mnemonics can be quite useful for frequently used terms.

Long names versus short names seems to be one of those never-ending HolyWars because it is probably a personal preference. I personally do not like long names, as long as they are explained somewhere. It would be nice if DBA's kept a list of abbreviations around for all to reference. For example, have a list such as:

Schd - Schedule Rte - Route Mth - Month Cd - Code Dt - Date Invc - Invoice Acct - Account Tm - Time Etc....

What happens when you write software for some other domain such as accounting, telephony, workforce management, customer management, networking, vetinary management ... Do you create a lookup table for each new abbreviation you create in that field?

If so you are creating a pile of extra work for every maintenance programmer who comes after you. It takes extra mental processing to remember 'Cd = Code' and 'Invc = Invoice'. Not to mention the different ways people abbreviate (e.g. 'Inv' is more common at my workplace).

Just use full words. Literary scholars throughout the centuries have resisted the urge to abbreviate every 5th word in the name of easy typing. They also seem to think it doesn't make their text any clearer. I agree there is a very large exception to that in the science and mathematics communities who use single letter abbreviations all over the place. But their formulas were often written on chalkboards without any intellisense. They have also gone on to use greek letters because their domains contained more than 26 concepts.

I know I would rather see:

Energy = Mass * SpeedOfLight ^ 2

than

e = m * c ^ 2

if I had a limited understanding of that domain (ala most maintenance programmers).

For a simple one-off usage, I generally agree. But lots of verbose formulas/expressions can be difficult to read, especially when they span multiple lines, at least for me. The more a given variable is referenced, the more abbreviating helps. Put a longer description or comment at the point of declaration, or the column note section if the DB has such.

And your example is an exaggeration. A more realistic example would resemble:

  Engy = Mass * spLight ^ 2

This example is not complicated enough to demonstrate the tradeoffs either way, though. For longer or complex expressions, the structure (shape) of most expressions is easier to grok if it's relatively compact, at least for my own WetWare. Long names can stretch out such expressions too much, including causing line-wrapping. The nesting, parameter divisions, positions, etc. are easier to spot when expressions are compact.

 // Example A
 foo(bar(flib, brot), x, moof(zip, bop, grit), mip + sid); 
 // Example B
 fooFlamFopper(bartonic(flibTrammoof, brotonix), exxySimmo, moofopitanon(zipPamNoffGripp, boppityDoo, gritMatterTad), mippotronic + sidSadDodSodMod);

"A" is far easier for me to grok the structure of in terms of placement and nesting. Maybe YOU have FastEyes and "B" is almost as visually digestible to you as "A". But if so, you are probably not the norm. I suppose my visual mechanisms may be below average for developers, but I doubt the average is much higher such that most would agree that "A" is noticeably easier to visually analyze than "B". The fact that the common math convention is closer to "A" lends credence to this estimate so that I don't think I am an outlier.

I know what I like and I know what confuses me and I reject the notion that YOU know my mind better than me and know what confuses me better than I do. That being said, I'll respect your preference as your preference and realize that no coding style will make everyone happy. Naming is a place WherePsychologyMatters (including biology differences). (This issue came up in another topic. If I find it again, I'll relink it.)

Existing RDBMS do need better or more standardized DataDictionary(s) such that long descriptions can be given in the dictionary. That way the formal column name does not have to carry the entire burden of thoroughly describing the column. If the RDBMS does not provide one, then making your own and placing it in the application, project, or company table namespace is not that difficult with typical tools.

Good naming is an art that seeks to balance brevity against self-documentation. It takes skill, experience, thought, and patience to do it well. I marvel (and learn from) those who get it right and cuss those who bush it up.

e = mc^2 was the first example that came to my head when I thought of single letter names, so it may have been a bad example. But your counter-example of using 'Engy' and 'sLight' just makes me ask 'why bother'? Accepting the fact that all IT professionals should be touch typing at a reasonable speed, it isn't that hard to type the extra few characters per word to save the hassles of:

- Abbreviations that differ from person to person, causing them to check which version was used when referring to the object (e.g. 'Invc' vs 'Inv').

- Forcing future readers to mentally convert the abbreviations each time they are read (or until they are part of muscle memory).

- Causing a delay for readers for cases where the abbreviated word is ambiguous (e.g. 'Cd', 'Dt', 'Tm')

If typing is not the issue then the only other complaint I can find is that full words are harder to read then shortened ones. Again I will point to the fact that 99.9% of English literature uses full words. I'm sure that if people had a hard time reading full words they would've let the authors know a few centuries ago.

Well, why bother using array[i] when one could use array[iterator] or array[index]. Often shortforms can be used to clean up code and make it easier to read, instead of cluttering it with rubbish all over. One of the problems with English is that it is verbose - programming is not English. -AnonymousDonor
[Indeed, array[i] in a tight loop is so idiomatic that it's probably the most readable form, assuming an appropriate 'for each' or similar construct isn't available. However, the goal should always be enhancing readability for your colleagues and successors; as such, pet abbreviations should be avoided for the reasons mentioned above. Programming isn't English, that's true, but ideally it should be as readable as good English writing can be. Wthr Eng or code, lts of abbr r hrd 2 read & may cause unint ambig or conf.]
English often does use abbreviations and pronouns to keep sentences fairly compact. I'd go bats if somebody spelled out the full name for "NASA" 30 times in a space article instead of the abbreviation. --top
[Idiomatic, universally-recognised abbreviations are fine. Ad hoc abbreviations are a problem.]
Even if NASA was new or unknown, personally I'd still like the abbreviation used in the rest of the body (after being spelled out on first reference). My WetWare is simply distracted and my train-of-thought derailed by excessive name-length. Your mileage may differ. I know what bothers my brain the most and try to balance things for the least mistakes or problems.

Perhaps it's because the words are joined together using PascalCase . Long phrases without any spaces appear to be hard to read for some people, compared to abbreviated forms. While I can accept that some people feel this way, it doesn't hold that 'SpeedOfLight' should be shortened as it's a very short phrase. And if that is the only objection (that long phrases without spaces are hard to read) I would hope those who are pro-abbreviation would agree to avoid abbreviation if we ever evolve to a language that doesn't need PascalCase (e.g. a graphical language where each object on the screen has a label floating above it written with spaces between words.).

Hmmm.... SpeedOfLight should be shortened to sol. No ambiguity at all! More seriously, PascalCase statements like SpeedOfLight are only initially difficult for people to read. Brains are amazing pattern-crunching engines, though, and become used to it quickly enough. German words are sort of mushed together that way all the time. I can't support it with hard numbers or anything other than my own experience, but I expect that speed-readers would quickly come to identify WikiWords as wholistic concepts in big chunks as opposed to reading each English word individually, and at the same time they'll benefit from the combinatorial disambiguation provided by word combination - i.e. their brain will overall be performing less work attempting to figure out, for example, which adjectives and which prepositional phrases are modifying which nouns. WikiWordsAreConcepts.
One solution for people who don't enjoy long PascalCase phrases might be for the compiler to use a special font for identifiers, where there is a small space before each capitalized letter. I'd prefer to go down that path of fixing it through IDE technology rather than using abbreviated forms that might make reading more difficult. I wonder how hard such a plugin would be for say Visual Studio with C#?

In response to your last point regarding brevity vs self-documentation, I would argue that communicating the full intent of an object is the most important thing. Development is complex and difficult, the last thing you want to do is cause a reader to do extra mental processing, such as converting abbreviations and looking in data dictionaries. Communicating the full meaning of an object is more important than saving screen real eatate, except for very rare cases where English fails to describe an object within 8 words or so.

Bloat and verbosity also hinders communication, and data-dictionaries are not evil. Enough of this debate already. Everybody has their own personal preferences; I like my preferences and you like yours. Unless you have something really novel to say on this subject, let's AgreeToDisagree and move on.

CategoryDatabase, CategoryNaming