Table Quantity Versus App Size

Continued from AccessBasic, and (presumably) based on the sample database schema in MicrosoftAccess format at http://www.armchair.mb.ca/~dave/Examples/EDM.accdb

For one, it looks like it's using much of the "thin table" design philosophy. I've been round and round with another WikiZen on that issue and don't want to rehash that debate again today.

Tables like "A/P Transaction Type Descriptions", "A/R Transaction Type Descriptions", "Scheduled Skills", and perhaps others could probably be factored into a single ConstantTable.

[At the expense of complicating queries, of course. Merging them would gain what, exactly?]

I spotted other duplication, such as the "A/P Journal..." and "A/R Journal..." table sets, which could perhaps be combined with sub-type indicator(s). It may make reporting more re-usable, because the reports wouldn't need different queries; only the sub-type indicator predicate and title parameters would vary.
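For illustration, such a shared report query might look something like this (the table and column names here are hypothetical, not from the sample schema):

 SELECT postDate, account, amount
 FROM journalEntries
 WHERE journalType = :subType  // 'AP' or 'AR', supplied as a report parameter
 ORDER BY postDate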

[It doesn't make reporting more re-usable. A/P and A/R journals share structure, but they have no other commonality. A/R is not simply A/P in reverse, or vice versa.]

"G/L Journal Credits" and "G/L Journal Debits" are also probably factorable together on a sub-type indicator(s). In general, all of the "X/X Journal..." tables look like good factoring candidates, even if some columns go un-used for some sub-types. (Existence of unused columns are one of the big sticking points in the thick-vs-thin table debate.)

I'm sure there are other areas of possible consolidation.

[To gain what?]

Fewer tables

[In return, you gain added complexity everywhere else. That seems unreasonable; it's akin to mediocre C programmers who try to pack as much functionality as possible into each line. Sure, their code has "fewer lines", but at what cost?]

Part of design is selecting the right ratios for units of stuff. Of course, there are disagreements over the ideal ratios.

[The "right ratio" in database design is that which minimises both complexity and update anomalies. Your suggestions do not reduce complexity; they increase it. You decrease only a table count. How is that reducing complexity?]

That depends largely on the future patterns of change.

[Assume that "future patterns of change" are not predictable.]

Then just about ANY design can be a net benefit under at least some of the eventual paths.

[That's a curious assertion. Can you provide evidence for it?]

No, and you can't provide evidence for the opposite. Thus, the future and what designs it favors is "unknown". In practice we use experience and intuition to make estimations about future changes because time machines have not been invented yet (except for the purpose of changing birth announcements in Hawaii).

[Or, we use normalisation to minimise duplication and update anomalies, and thereby achieve minimal complexity.]

By the way, if the designers used my "fooRef" convention for foreign keys instead of "fooID", the schemas would be far easier to read. It's a good convention.

[I'll be sure to hop in my time machine, return to 1991 when this project was started, and get the team to consult with you about naming conventions.]

It's also possibly multiple applications mixed into one: 1) staff scheduling, 2) billing, and 3) accounting. And there are oddballs such as "Sched Client Medication" which suggest a patient treatment tracker app. Nothing wrong with integration, or at least semi-integration, but I'd still probably classify it as multiple integrated apps rather than one big app.

[It incorporates multiple business functions, but it underpinned a single integrated small-business management application for home health care delivery. It was an explicit requirement that there be strong, seamless integration across business functions whilst permitting each function to be expanded or customised.]

I will agree that if you want each app separable from the rest, then some of the structural duplication mentioned above may indeed be warranted. But this depends on the scope of what we are calling an "application". If you consider it one big app, then factoring between apps will reduce the total table count. It's a matter of whether we factor for a "sub-app" scope or an integrated suite scope.

[What do we gain by reducing "the total table count", especially at the expense of more complicated queries and constraints?]

A lot of the many-to-many tables could also be replaced by a single table like this:

 crossReferences (table)
 ----------
 crossType
 key1
 key2
 value
 timeStamp
 note

I realize such a design is unconventional and that most current RDBMS don't support it very conveniently for referential integrity, but it is a way to reduce table-space clutter. It's vendors that don't favor such designs, not the inherent logic of the universe. But that's mostly a side-note. -t
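For illustration, here's how a typical lookup might compare (the employee/skill tables are made up):

 // dedicated many-to-many table
 SELECT e.name FROM employees e, employeeSkills es
 WHERE es.empRef = e.empID AND es.skillRef = 42

 // the same lookup via crossReferences
 SELECT e.name FROM employees e, crossReferences x
 WHERE x.crossType = 'empSkill' AND x.key1 = e.empID AND x.key2 = 42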

You've just made me very happy that I don't have to use any schemas designed by you. The fact that you think it's a good idea to force extra complexity on the users of the schema in exchange for more data integrity issues is simply mind-boggling.

How are you objectively measuring this "extra"? You should have anticipated this question by now; until you have an answer, stay silent, because nobody should just take your word for it. It would be an interesting issue/observation if the laws of the universe forced one to massively violate OnceAndOnlyOnce to satisfy referential integrity, but so far you haven't demonstrated this.

The most obvious extra complexity is in the where clause. With your schema I now have to add an additional filter in order to remove the rows belonging to the other tables you crammed into it. Natural joins turn into self joins and I have to add table aliases in addition to the additional filters in the where clause. All this just for a lower table count, something of rather dubious benefit.
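For example (names hypothetical):

 // two dedicated tables: a natural join suffices
 SELECT * FROM orders NATURAL JOIN orderLines

 // both crammed into one generic table: a self join with aliases,
 // plus extra sub-type filters in the where clause
 SELECT * FROM merged o, merged l
 WHERE o.subtype = 'order' AND l.subtype = 'orderLine'
   AND l.parentKey = o.pkey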

I've argued elsewhere that the way existing RDBMS implement natural joins is flawed. I'll agree that we sometimes have to do shitty things to work around shitty standards/conventions, but it's not inherent to the general relational model.

It's not just natural joins, every query needs the extra clause(s).

 select * from role
becomes

 select * from foo where subtype = 'role'

You've already acknowledged the increased complexity in the integrity constraints.

Because existing RDBMS value performance over bloat reduction.

Nope. It's complexity that's added by your approach. You have to explicitly tell the database which portion of the table to apply the constraints to. It has no way of knowing otherwise. With the current approach, the constraints apply to the entire table, and the database knows that already.
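A sketch of the difference, using check constraints (table names hypothetical):

 // dedicated table: the constraint covers the whole table automatically
 ALTER TABLE apJournal ADD CHECK (amount > 0)

 // merged table: every constraint must restate the sub-type filter
 ALTER TABLE journals ADD CHECK (journalType <> 'AP' OR amount > 0)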

In addition, you've upped my storage costs. Every index on the tables must now be prefixed by what is essentially the table id. I also have to index rows that don't need to be indexed, because I can't index on just part of the table. Every row in the table has an extra id.
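Again as a sketch (names hypothetical):

 // dedicated table: index only the rows that need it
 CREATE INDEX apJournalAcct ON apJournal (account)

 // merged table: the sub-type column leads the index, and rows of every
 // other sub-type get indexed too. (Some products offer partial indexes,
 // e.g. CREATE INDEX ... WHERE journalType = 'AP', but not all do.)
 CREATE INDEX journalsAcct ON journals (journalType, account)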

Should we organize our tools around WetWare efficiency or machine efficiency? All things being equal, OnceAndOnlyOnce should hold. If something gets in the way of that, then we should look at it closer to see if the repetition is a necessary evil or due to poor root tools. Under the hood, the different "sub-types" could be implemented as separate physical tables. Schemas are only a logical model, not necessarily a physical one.
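Some products can already do a limited form of this. For instance, later versions of PostgreSQL offer declarative partitioning: one logical table, separate physical storage per sub-type. A sketch (names hypothetical):

 CREATE TABLE journals (
   journalType VARCHAR(4),
   account VARCHAR(10),
   amount NUMERIC
 ) PARTITION BY LIST (journalType);
 CREATE TABLE journalsAp PARTITION OF journals FOR VALUES IN ('AP');
 CREATE TABLE journalsAr PARTITION OF journals FOR VALUES IN ('AR');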

Perhaps something akin to TableInheritance is in order. If there are commonalities among tables, it would be nice if our model both designated the commonality and took advantage of it. A big pile of tables is just as annoying as a big pile of files without some kind of sub-division or meta info to group/classify them.

I will agree that existing products favor your approach. They assume certain design styles and are optimized for that; and if you go with the flow of a tool's assumptions about usage patterns, things usually go smoother. But it does create cluttery repetition.

You haven't reduced mental bloat either, since I have to add extra mental steps when I use or reason about your table.

Why is that? Trace the steps. Table name clutter slows me down. Maybe you have FastEyes; I don't, and many others don't. We want table-space lean.

[You want "table-space lean". Speak for yourself.]

Everybody's WetWare is different. And we should strive to respect OnceAndOnlyOnce. It may be a tool smell when we have to repeatedly violate it. You shouldn't be content with the current state of things; try to understand whether the "logic of the universe" dictates that repetitious schemas are a necessary evil, or whether it's something we've been doing wrong in or with our DB technology. I advise all to ponder this duplication and not settle out of habit. -t

[Beware of conflating similarity with duplication. Bookkeeping systems (which this schema supports) are full of similarity that is not duplication.]

There is no iron-clad boundary between the two.

[Speaking as a former bookkeeper and accounting department manager, custom business application developer, payroll specialist, and database developer, the schema presented here demonstrates appropriate handling of duplication vs similarity. Some of the table merges suggested here (e.g., merging A/P and A/R journals) would be inappropriate.]

You mean double-entry bookkeeping? You are doing it wrong. I modernized and solved it at AccountingModeling. Give me Nobel now! -ArrogantTop

[See the counter-argument to AccountingModeling at the bottom of DoubleEntryBookkeeping.]


I remember one org's schema that had bunches of tables. To protect the guilty, I'll call it an "insurance" app as an approximation. It tracked cars, boats, guns, and other expensive items. They made a table set for each physical object kind (cars, boats, guns, etc.) when they could have consolidated them into something like this:

 objectModels (table)
 -------------
 modelID
 objType  // ex: vehicle
 majorModel  // ex: Ford
 subModel  // ex: Escort
 yearFrom
 yearTo
 notes

 objects (table)
 -----------
 objectID
 modelRef
 features  // ex: 4-door [could also have feature table]
 color
 condition
 serialNo
 licenseNo
 estPrice
 yearFrom
 yearTo  // sometimes a range given when exact unknown
 notes

--top


Example "Norf":

    1. Your approach:
      SELECT * FROM XX_YY WHERE aspectC='foo' AND aspectD=7

    2. My recommended approach with traditional RDBMS (regular SQL):
      SELECT * FROM XX WHERE aspectB='YY' AND aspectC='foo' AND aspectD=7

    3. Single dynamic table approach (pseudocode):
      SELECT * WHERE aspectA='XX' AND aspectB='YY' AND aspectC='foo' AND aspectD=7

    4. Single dynamic table approach with a view-like shortcut/macro (pseudocode):
      $view_xx_yy = "(aspectA='XX' AND aspectB='YY')"
      ...
      SELECT * WHERE $view_xx_yy AND aspectC='foo' AND aspectD=7
      // see how it's more meta/dynamic than traditional RDBMS?
      // Similarly, we could "compute" the column expression list if desired.
--top

No, since it isn't.

You are just defending your syntactically- and idiom-quantity-bloated TutorialDee implementation.

[I think you've confused me -- the guy who wrote the *-bloated TutorialDee implementation -- with someone else.]

All you non-handled people look alike.

[Focus on the content and not the creator, then it won't be a problem.]

Then I lose an opportunity to trash T.D. ;-)

I was just pointing out that there doesn't appear to be anything useful it does that couldn't be done just as easily with more traditional table structure. (The text macros are okay, but I'd rather have user defined filters, functions, and the standard views).

That's because you like Ada-like bloated syntax and multiple syntactic constructs that repeatedly define a crippled subset of DatabaseVerbs on each and every language object type. I'll take more cues from Lisp and TCL than Ada, thank you.

For example, I don't have to have a dedicated DB object called a "view". And I can store "views" in tables (just like EVAL again, ha ha) and use the existing table editors/tools to manage the CRUD aspects of them and invent any meta data or classification system I want on them; not just Larry Ellison's vision of what view management should look like. -t

[From where can I download, and then run, this magical language of yours?]

In any case, you can do all that with the more traditional table structures as well. Traditional views have been stored in tables for ages. You can create tables to hold whatever meta data you like. See, still no advantage to your proposed system.

But RDBMS objects have their own special syntax for each different object kind, bloating up the DB language and preventing cross-use or mixed-use. For example, for the static kind of tables, I would rather store table layouts (schemas) entirely in tables and wouldn't have any special "CREATE TABLE..." commands: you just fill in the DataDictionary-like tables (and perhaps issue a Refresh command, depending on architecture). If you wanted to script table creation, then just script filling in the dictionary tables LIKE YOU WOULD ANY OTHER TABLE (AKA data "INSERT" commands). IdiomReuse?. There would only be update-queries, no need for a special and separate data-definition language.
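A rough sketch of what that might look like (the dictionary table and column names here are made up for illustration):

 // define a new "table" by inserting rows into the dictionary tables
 INSERT INTO ddTables (tableName, descript)
   VALUES ('employees', 'staff roster');
 INSERT INTO ddColumns (tableRef, fldName, type, maxSize)
   VALUES ('employees', 'empID', 'integer', NULL);
 INSERT INTO ddColumns (tableRef, fldName, type, maxSize)
   VALUES ('employees', 'lastName', 'string', 40);
 // then perhaps: REFRESH employees  (a hypothetical command, per above)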

I suppose you could argue that since the DB vendors already created all that bloat, it's already done and one should just use it. But for newbies, it's still usually easier to learn a few idioms and re-apply them to what would be different kinds of "database objects" in a traditional RDBMS.

Re: "You can create tables to hold whatever meta data you like"

But it requires a join to use, and one may forget where the meta-info table is. It's also a violation of OnceAndOnlyOnce, since the vendor has a row for, say, a view, and we now have a kind of MirrorModel row to put our own stuff in.

Why would it require a join? Certainly there will be joins in some cases, but that also happens in your proposal, so no loss there. In general, you'll actually have to repeat stuff more often the way you want to do it, so again no loss there. (E.g., you repeated the table name for each row when you combined it with other tables.)

It's a 1-to-1 table. Normally when you see a 1:1 table you factor them together, but most vendors don't let you do that for the reserved objects. But because it's not a hard-wired concept in my proposal, you can stick any attribute on any "table" you please.

No, I usually only combine 1-to-1 tables when I'm almost always joining both tables in my queries. Otherwise, I prefer to have them separated. You can also add views of them combined if you want. (Which allows you to see your meta-data as if it was part of the same table even for the special tables.)
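For example, most SQL products expose their catalog in queryable form, so a view can splice local meta-data onto it (viewMeta is a made-up table here):

 CREATE VIEW viewsWithMeta AS
 SELECT v.table_name, v.view_definition, m.category, m.note
 FROM information_schema.views v
 LEFT JOIN viewMeta m ON m.viewName = v.table_name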

If this is back to the ol' "thin table" debate, then I'm bailing. Thin tables just don't fit my WetWare and appear to produce what I'd call "more complexity", but I have not found an objective and universally-accepted way to measure the "complexity" of schemas/table-design. -t

Also note that your criterion as stated appears to be based on actual usage patterns. Is that your primary criterion for combine/separate decisions?

Yes, we've noted that your WetWare causes you to prefer adding complexity in many places to reduce complexity in one place. Hardly something to emulate.

You want a complex DBMS by hard-wiring things that could be meta-tized. But I have agreed in other non-table topics that fitting WetWare sometimes trumps simplicity. Simplicity is moot if the staff doesn't know how to use it effectively.

When meta-tizing introduces issues (and in this case you've already agreed that it does) and no advantages can be found, then yes, I'm against it.

I already told you one advantage: simpler DBMS architecture.

That's the first time you've mentioned that. How would it be simpler?

See PageAnchor idiom01. Also search for "LIKE YOU WOULD ANY OTHER TABLE"

I see nothing there that shows the DBMS architecture being made simpler.

Well, I don't know at this point how better to describe it. It's pretty obvious to me, but I guess I am failing to communicate it. A rough analogy is the difference between Ada and Lisp: Ada hard-wires "block types" (conditionals, loops, etc.) with special hard-wired syntax for each of them, whereas in Lisp they are not hard-wired into the language, but are merely convenient library functions that can be ignored, such that one can just as well roll their own block structure/operation. For this reason, Lisp is a "simpler language" or "architecture" than Ada. The definition of Lisp would take up a page or a few pages (depending on the dialect), whereas Ada's is approximately 100 pages.

Similarly to the block analogy, the MPDB approach does not hard-wire in the concepts of "table" and "view".

[What's an MPDB?]

See MultiParadigmDatabase.

[Ah. Imaginary stuff.]

So? We have to explore imaginary stuff to check if "design rules" apply only to existing RDBMS or all possible RDBMS/DBMS. If you know a better way to check besides "trust my judgement", I'd be happy to consider it. DontComplainWithoutAlternatives. Every single fucking existing system started life as "imaginary stuff".

And it can't hurt to ask what DB's would look like without (explicit) tables, can it? If you can prove it's universally objectively worse, fine, we'll dump the idea.


Note that I wouldn't recommend a dynamic GodTable for accounting applications, because one has to be fastidious when dealing with money. But other apps/niches may be more forgiving, where nimbleness of change is valued over preventing loss of or damage to data. ItDepends on the domain/niche. --top

[Deprecating normalisation is never a good idea, except in the analytical settings that demand it purely for performance reasons. Of course, they are not interactively updated, so update anomalies are not a problem.]

So you say.

[Do you denormalise for reasons other than performance?]

Yes, WetWare. Note that I don't normally recommend duplicating actual data (values), thus the level of "normalization" may be an issue.

[Do you believe WetWare takes precedence over the risk and/or complexity of dealing with redundancy and update anomalies?]

Many "update anomalies" are a matter of tradeoffs between kinds of "anomalies" and future change patterns.

[No, they're not. They're a specific technical consequence of failure to normalise, independent of any "tradeoffs between kinds of 'anomalies' and future change patterns". See http://en.wikipedia.org/wiki/Database_normalization]

We may also disagree on what we are calling "redundancy". But in general, YES, WetWare often takes precedence over ArgumentByElegance if grokkability is a more pivot-able issue than "fewer parts" or even machine-enforced "safety". We've had similar debates in and around GreatLispWar. I engineer for "users" of the code, not so much for some ideal universal purity. Neither God nor the universe reads and reacts to our code (in a detectable way), only humans do.

[This isn't about ArgumentByElegance, but about the pragmatic effects of failure to normalise. It has nothing to do with how "God [or] the universe reads and reacts to our code" and everything to do with the simplicity of update/select queries and the schema. In a properly normalised schema, each table/RelVar has a one-to-one correspondence with a real-world entity type (and vice versa). That makes the schema more intuitive and understandable than one where tables hold multiple entity types, because for each real-world entity, there will be one and only one table that holds its information. Thus, a user looking for an entity without already knowing its table need only review the table names, rather than having to examine tables and their structure and their contents. Proper normalisation also makes the schema easier to modify and/or extend, because there is strong SeparationOfConcerns between tables/RelVars.]

In this case there is no real-world "thing" that clearly maps to our data world. One could argue that in the old days one bought a generic paper "journal", and whether it was A/P or A/R etc. was a sub-category, such as a tab marker. And they may have physically grouped them by month, but that shouldn't imply our tables are split monthly just because that was the "real world".

In fact, I'd argue that most "intellectual property" (including monetary credits and debts) has no "real world" counterpart because intellectual property is "in the head". We have old tablets and paper representations to study, but these were often organized for "processing/handling convenience" using the technology of the time. We have more options in the computer than they had back then. One of the most flexible techniques we have now is SetTheory where each aspect can be modeled as a set, and sets can potentially overlap or not overlap based on dynamic (or easy-to-alter) expressions. To keep flexible, we shouldn't give one set special or limited treatment compared to another, at least not hard-wire that treatment into the system, which is exactly what the "sin" of table-ness (above) does. (Yes, Mr. Top is questioning the "T" in T.O.P., at least for experimental purposes. Or at least redefining/expanding the meaning of "table".) --top

Still attempting to find something that will stick? There's nothing dynamic about set theory. You cannot alter whether or not one set overlaps with another.

I'm talking about computerized implementations. If you treat most aspects as sets and all sets are "created equal", then it's generally easier to alter membership, or at least use uniform tools and knowledge to manipulate them.

Those types of sets aren't based on "easy to alter expressions" either. They use the same basic approach that a table would use with regard to membership.

There would be no diff between tables and views, and you'd rarely need UNION.

Please explain.

     SELECT * FROM A UNION SELECT * FROM B UNION SELECT * FROM C

     SELECT * WHERE myAspect IN ('A','B','C')  // N.T.

[Top, are you proposing something like SetlLanguage, or something like ExtendedSetTheory?]

No. It's a map of maps, not lists, and not (explicit) expressions. Most of the SQL-like idioms we know and love (and hate) can still be used, except one usually uses row filters instead of "tables".

[What would we gain by that?]

Didn't you ask that already? We don't need tables, so we toss them and use other existing idioms instead: idiom reduction. I'm not saying it would improve things for every domain, but it could for some.

[Under your approach, how do you insert into tables?]

You insert a row into the "master" table (for lack of a better name) and don't need to specify a table. (A given shop can opt to have a "required" attribute such as "table" or "entity" to enforce table-like membership.)

[Could you illustrate with an example?]

And one of the benefits is that we don't have to have duplicate schemas for things like A/P versus A/R versus Billing journal entries, per above. The enforcement of validation can be "inter-table" in the traditional sense. Journal-ness and app-ness (A/P, A/R, Billing, etc.) are generally independent aspects and we can tie journal-ness validation to the journal-ness aspect without having to tie it to the app-ness aspect.

[Could you illustrate with an example?]

Okay, assume we programmed the dynamic Map-Squared-A-Tron to use the following data dictionary table (DD) for "insert/update" triggers, AKA "validation".

 dataDict (table)
 ----------------
 ID
 fldName
 descript
 type  // number, integer, string, date/time, boolean, etc.
 minSize   // use 1 or greater for "required"
 maxSize   // blank = open-ended
 etc       // etc.
 groupsMatch  // Example: "journal, ap, ar, gl, bill"

The "journals" for A/P, A/R, G/L, and Billing share various fields in common. Let's say a new data row is added to the main (only) table. If the "groupsMatch" attribute for a DD row has "journal, ap" (ap=A/P), and the new data row has at least "journal" and "ap" in its "groups" attribute, then it will be "triggered" for validation inspection by the data dictionary "table" for the corresponding columns. We don't have to break the sharing pattern into a hierarchy, since the sample given (MS-Access) does not have commonality that can be readily partitioned that way for "journals".

Notes: "table" is used loosely here in that there may be a convention to have a "table" attribute if one wants to manage a given set of rows the old-fashioned way. Also, if this is custom-programmed it probably wouldn't be very efficient such that some kind of native hard-wiring of such a DD convention may be necessary for performance reasons. But focus on the conceptual side of things for now. Similarly, the "string list" approach is poor normalization such that a many-to-many table may be warranted. But this is to illustrate dynamicness to facilitate factoring out the column commonalities for "journal-ish" tables mentioned above. And dynamicness may not be appropriate for accounting apps because an org typically values accuracy over flexibility when dealing with money.

Finding a good primary key set for a DD used this way is a bit tricky. This is because we've gone away from a hierarchy (table -> column) into set-land, where uniqueness can be more difficult for humans to tame or visualize. That's the price of power. In this particular case, perhaps we can set up a validation rule where "fldName" is unique within all "journals" DD rows, per "groupsMatch". -t

[Sorry, I don't understand what you're trying to demonstrate. Perhaps if you presented a side-by-side comparison of the "traditional" data modelling approach (as demonstrated in the original database linked above) vs your approach -- perhaps using both conventional SQL and your hypothetical table-less SQL -- then you could clearly highlight the benefits and features of your approach. Thus far, I find them unclear.]

It's not really "table-less", it's "dynamic" tables.

As far as the big picture, what the example facilitates is a mapping between fields (shown as A to E) and tables (or sub-tables, which start with "t" below), such that column info is readily reusable rather than repeating things like the "journal" structure, as the MS-Access sample does. We are fixing a violation of OnceAndOnlyOnce by using indirection (references) to columns instead of the traditional copy-and-nest. Sub-tables t1 and t3 share the column specification for column "D" here.

 Col:A|B|C|D|E
 -------------
 t1 |Y|N|N|Y|N
 t2 |N|Y|Y|N|N
 t3 |Y|Y|Y|Y|N
 t4 |N|Y|Y|N|Y
 Etc...

[What problem does this solve?]

Fixes OnceAndOnlyOnce.

[I don't see the duplication the above is intended to fix. It appears to be akin to noting that fields called "Description" show up frequently, (mis)regarding that as duplication, and trying to merge all "Description" data into a single table. That, of course, would be ridiculous. The above seems similar.]

It's not just one column, but sets of columns that are very similar. But again, in accounting we may know that the duplication is not a problem, because it hasn't been for decades of doing it that way. But every domain is different. I'm currently working on a WCMS-like gizmo where I wish I had used a GodTable instead of partitioning things into "kinds" of content or containers. I didn't want it to look odd to other maintainers, so I did it the old-fashioned way. It turned out to be too inflexible. How the info is displayed varies depending on where it's used. My hard-wiring of categories (tables) is already kicking my ass in terms of change difficulty. I have to write duplicate code to display A as an X and B as an X, whereas if I had used a GodTable, everything would already be an X (or viewable as an X) without reinventing the damned wheel for each damned table.

[The apparent need for a GodTable often indicates a real need for type-subtype relationships, further normalisation instead of denormalisation, VIEWs to encapsulate UNION queries, or all three.]

Continued at WebGodObjectDiscussion.


One problem with UNION queries over similar tables is that many RDBMS do not push an outer WHERE filter down into the unioned sub-queries.

  // Sample A
  SELECT a,b,c FROM
  (
  SELECT a,b,c FROM x
  UNION
  SELECT a,b,c FROM y
  )
  WHERE a = 7

  // Sample B
  SELECT a,b,c FROM x WHERE a = 7
  UNION
  SELECT a,b,c FROM y WHERE a = 7

Oracle, for one, has to create an intermediate table with the union of x and y BEFORE applying the WHERE clause, meaning it cannot use indexes.

For a work-around, I had to keep duplicating the WHERE clause for each table, as shown in Sample B (x and y), which was a nasty violation of OnceAndOnlyOnce and a big source of typos. The WHERE clauses were far more involved than in the samples. (Oracle may have since done something about that; I haven't checked.)

Even where Sample A works, one has to repeat the column list three fricken times. It's ugly.

That's a problem with a particular SQL implementation, and a one-time programming issue. "Resolving" it by creating an un-normalised GodTable creates a problem of update anomalies, which introduces more programming issues and ongoing performance issues.

It was not "one time", we had a large quantity of queries like that. And "particular implementations" can favor or disfavor just about any style or technique.

By "one time", I mean it's a programming-time issue which occurs "one time" relative to row updates. By way of analogy, if you own a car, the car purchase is a "one time" expense compared to the ongoing expense of fuel, oil, maintenance, etc. This is true, relatively speaking, even though you'll probably purchase a new car more than once.

I suspect they split it for machine efficiency reasons. But I'd like to focus primarily on human maintenance issues first. I don't think splitting reduced such overall. (It may be possible to rework the algorithms or setup such that larger tables would be split "under the hood" for performance, yet the split be invisible to query users.)

Sorry, I don't see how this follows from the above. Who split what?


And please don't insert new questions into old material without indenting. I likely won't spot such. Either indent or re-quote at the bottom of the topic. Thank You. -t


JuneThirteen

