Thin Versus Wide Table Definition

This is an attempt to make a working definition of "thin" versus "wide" tables, as discussed in FearOfAddingTables. This topic should not attempt to attach a value judgment to each practice, since there are already topics on that.


Candidates:


A good definition should be unambiguously, and ideally mechanistically, identifiable. Does the above qualify?

Agreed, but there will be disagreements over terms like "unnecessary" and "key". In my experience, our design assumptions are often based on what will change in the future. We may add some duplication to a design to prepare for certain future changes, for example. These are difficult to quantify in an objective way because they are based on past experience, which is probably not documented in detail, or may not be a big enough sample size to be "statistically kosher". Also, what is a "key" and what is data may be debatable. "CA" in a US-State table could be seen as data (an abbreviation) or as a key, or both. Trying to define everything perfectly will just lead to endless fractal LaynesLaw recursion. -t

Actually, "key" has a very clear and formal definition. It's the rest that I found unclear.

You mean "unnecessary"?

{That'd be a start. Also problematic: your use of the words 'duplicate' and 'data'.}

I meant that the definitions of the normal forms have vague words in them; words that require domain interpretation.

[I find that to be a surprising assertion. Could you provide an example?]

To avoid risk of source bias accusations, please paste in your favorite definition for 5th or 6th form, and I'll take it from there.

[The ones on WikiPedia are adequate, and more importantly, easily accessible to everyone and can be referenced without pasting.]


Problems with candidates:

candidate A:

candidate B:

candidate C:


I vote for candidate C. 6NF is a formal description of the "thin table" mantra: never combine in a table that which can be reproduced by a view.

Making every table be just two columns, key and value, would "satisfy" it then. (Sometimes "key" will be compound).

Are you assuming that all relationships in a domain model are binary?

To be Clintonian again, I have a problem with "are" here. I'm hesitant to say that there is any "hard" coupling between any two candidate columns (or sub-columns) in the domain. The relationship in practice is dynamic, situational, relative, and may change over time. Mentally, I view the associations as probabilistic, not Boolean. It's roughly comparable to the "mechanical" view of physics versus the quantum view. Even with dates, I've seen apps that only want to use the month portion, for example. Thus, in that app's domain view, the day doesn't exist. What we seek in practice is the most UsefulLie. --top


Formulaic approach:

  Score = (wn * N) + (wk * K)

Where N = Null duplication K = Key duplication wn = weight of null duplication wk = weight of key duplication

In the "thin" table view, "wn" tends to be high and "wk" tend to low, while vise versa for "wide" tables. The closer to zero the result is, the more it fits that viewpoint.

Another way to look at this is to plot it out:

  N|Wide..........
  u|..............
  l|..............
  l|..............
  -|..............
  d|..............
  u|..............
  p|..........Thin
  .+--------------
  .....Key-Dup....

(Dots added to prevent TabMunging)

Thus, if we have lots of "Null duplication" but very little key duplication, then the table is considered "wide", or leans heavily toward "wideness". If we have lots of key duplication but very little "null duplication", then we are heavily in the "thin" camp.

I don't know of anybody proposing the upper right (except very sloppy designers), and I don't think the lower left is technically possible in most cases. Thus, in practice, designs (for designers who care) tend to be somewhere on a roughly strait line drawn between "Wide" and "Thin" on that chart.

  N|Wide..........
  u|..\...........
  l|....X.........
  l|......\.......
  -|.......P......
  d|.........\....
  u|...........\..
  p|..........Thin
  .+--------------
  .....Key-Dup....

"X" marks what I see in practice (ignoring bad data-duplication, which is not contentious here.) "P" marks where OOP proponents tend to be in my observation.

--top


See: NormalizationRepetitionAndFlexibility


EditText of this page (last edited May 10, 2010) or FindPage with title or text search