Web God Object Discussion

Building a flexible WCMS


Continued from TableQuantityVersusAppSize:

[I don't see the duplication the above is intended to fix. It appears to be akin to noting that fields called "Description" show up frequently, (mis)regarding that as duplication, and trying to merge all "Description" data into a single table. That, of course, would be ridiculous. The above seems similar.]

It's not just one column, but sets of columns that are very similar. But again, in accounting we may know that the duplication is not a problem because it hasn't been for decades doing it that way. But every domain is different. I'm currently working on a WCMS-like gizmo that I wish I used a GodTable instead of partition things into "kinds" of content or containers. I didn't want it to look odd to other maintainers, so I did it the old-fashioned way. It turned out to be too inflexible. How the info is displayed varies depending on where it's used. My hard-wiring of categories (tables) is already kicking my ass in terms of change difficulty. I have to write duplicate code to display A as an X and B as an X, whereas if I used a GodTable, everything would already be an X (or viewable as an X) without reinventing the damned wheel for each damned table.

[The apparent need for a GodTable often indicates a real need for type-subtype relationships, further normalisation instead of denormalisation, VIEWs to encapsulate UNION queries, or all three.]

What sub-type? And UNION queries are ugly. A given content object may be a "page" in one context, a "link item" with a synopsis (showing it) and optional thumbnail in another, a simple link title in a list in another, both a link item and a page in another, a medium-sized panel in another, a slide-show item in another, etc. I originally tried to separate the synopsis or title link from the "page", but one manager balked: they wanted "one-stop-shopping" for some users (newbie web authors) and thought separating them was too confusing for their group's target users. (Other authors may opt to separate them, but it became clear that both approaches should be supported.)

[You asked "what sub-type?"... It would appear that everything is a sub-type of something which you yourself called a "content object". As for UNION queries being "ugly", I'm not yet convinced that such an aesthetically-based personal opinion about a certain operator in a certain query language should have any impact on relational schema design.]

The GodTable generally would resemble:

We could use the same table for image "attachments" to web pages and the Synopsis can serve as the image caption.

Of course, we often want to shut some of these off for display. Since a given object may be used in multiple contexts, we don't want to shut them off for data entry also. A warning may be in order, such as "Field X is not recognized by the current context or container". In other words, we need "soft validation".

Why do you think the above is a GodTable? Assuming all attributes are functionally dependent on ID and assuming there are no transitive dependencies, it is in (at least) 3NF and therefore acceptable. My interpretation of GodTable is that it is always < 3NF, having multiple entity types in one table. The above, as a GodTable, might be:

"Entity" is too vague. Originally it appeared to be 3 entities: "page", "container" (section or "box" within a page), and "link item" within sections. But, putting it into practice and receiving new requests (and complaints) showed this classification was too rigid. It's not even hierarchical anymore. User/author appears to still deserve its own entity so far and that's not a problem. (There are other "organizational" tables that are not being discussed here because they are probably too org-specific.)

Experience tests our designs, and parts pass and parts fail. Looking at other WCMS, it seems they got too "entity happy" also, and are too rigid and/or too complex as a result. A WCMS should be a "dynamic graph manager" and not pre-classify things.

Note that there can be multiple authors of a given "article". This would imply a many-to-many table to associate authors to articles. However, in the name of KISS, I'd lean toward a textual free-form "authors" field, perhaps pre-populated with the user's name. Only in an org/group where authorship tracking becomes very specific and critical would such be justified. We may get a few complaints when people get married or unmarried and their name changes, but we have to weigh that against the extra UI steps to have an "associator" screen set.

As far as the definition of GodTable, it's essentially a "content" GodTable, not an "everything" GodTable. It's god of a star system, not the entire universe. Perhaps "stem content object" may be a better name.

You haven't answered my question, which was "why do you think the above is a GodTable? What do you consider to be a GodTable?"

Why do you think the term "entity" is too vague? It's a conventional term used in familiar database design techniques like "entity/relationship diagram".

It is vague in that it does not provide an objective "slot" or dividing algorithm in many cases.

Fortunately, the process of normalisation makes it a non-issue. You don't need to worry about "entities", as long as functional dependencies can be identified. That is normally straightforward.

No, it's not. If you believe it is, then write down the clear and objective algorithm without any vague/ambiguous words, and we'll test it out. And please, no IknowItWhenIseeIt. I'll puke of I see that again. It's rarely helpful. The existing definitions out there have too many wiggle-words like "dependent on" and "related to", which are generally continuous concepts in normal English, not Boolean.

{Why do you keep conflating "defined" with "decidable"?}

Decidable by subjective factors? That's not the ideal situation by far, especially when trying to form good design guidelines. I tend toward the largest table(s) row-wise and column-wise without creating data duplication, or at least balancing to get the "the least evil" of two or more duplication paths, and without excessive empty/null cells. Of course, "excessive" will probably be a point of contention. In a table like the above, there may be a lot of empty cells. However, if they don't form a consistent pattern, then it may be fine to leave them blank.

A consistent pattern would be something like a group of columns that are always blank when a certain condition is true. If we can identify such, then we may have a candidate for an "entity split". For example, if there are say 5 columns devoted to images above, and these columns are always empty if there is no image for a web content object, then we can consider creating a separate entity "Images". But if usage of images always matches to the object rather than having some or all images independent of web content objects, then it may be fine to not split. I'd lean toward "don't split unless you see a likely need now or the near future" because it would probably lead to:

An exception may be if images are rare compared to other content; then the carrying costs of all those empty cells is not worth it in terms of mind and machine space.

{Your response doesn't appear to have much to do with what you responded to.}

Ask better questions.

That looked like a very good question. Why do you keep conflating "defined" with "decidable"?

"Decidable" says nothing of the quality of the answer. Unless, you mean you "know" the answer is "determinable" in an objective way, but cannot put the determination technique down on paper (without vague steps/parts) for public inspection. In that case it's equivalent to "undefined" for practical purposes and is ArgumentFromAuthority.

That doesn't answer the question.

Sorry, I tried my best. I guess I don't know you are getting at. You write and think oddly. It should be simple:

  if there are clear-cut objective rules then
    convert the rules to a computer-executable algorithm (or close to)
    and show us it
  else
    admit there is are not clear-cut objective rules
  endif
You appear to erroneously equate "can be decided with human reason" with "can be identified using current computer technology".

That sounds like a geeky attempt to legitimize ArgumentFromAuthority. "The proper algorithm is in my head, and it just happens you can't see/test most of it."

No, it's simply reasonable. I can trivially identify a cat when I see one. We can easily agree that we've both seen a cat. It's very, very difficult to get a computer program to reliably visually identify cats, and even more difficult if not impossible to define an algorithm to visually identify a cat.

I'm sure there are many instances of possible cat-like animal configurations that would generate different answers among us. That something can be called a domain "requirement" may likely be at dispute, for example, in a chaotic office where different people are giving different implications or the lines of authority overlap and contradict. I will agree that if we had perfect knowledge of the domain, including likely change patterns, that the disagreements over the application of normalization would be far less common.


See also: MultiParadigmDatabase, GodTable, FileSystemAlternatives


CategoryDatabase, CategoryWebDesign

JulyThirteen


EditText of this page (last edited December 16, 2013) or FindPage with title or text search