Continued from AccessBasic, and (presumably) based on the sample database schema in MicrosoftAccess format at http://www.armchair.mb.ca/~dave/Examples/EDM.accdb
For one, it looks like it's using much of the "thin table" design philosophy. I've been round and round with another WikiZen on that issue and don't want to reinvent that debate again today.
Tables like "A/P Transaction Type Descriptions", "A/R Transaction Type Descriptions", "Scheduled Skills", and perhaps others could probably be factored into a single ConstantTable.
[At the expense of complicating queries, of course. Merging them would gain what, exactly?]
I spotted other duplication, such as the "A/P Journal..." and "A/R Journal..." table sets, which could perhaps be combined with sub-type indicator(s). It may make reporting more re-usable because the reports wouldn't need different queries other than the sub-type indicator predicate and title parameters.
[It doesn't make reporting more re-usable. A/P and A/R journals share structure, but they have no other commonality. A/R is not simply A/P in reverse, or vice versa.]
"G/L Journal Credits" and "G/L Journal Debits" are also probably factorable together on a sub-type indicator(s). In general, all of the "X/X Journal..." tables look like good factoring candidates, even if some columns go un-used for some sub-types. (Existence of unused columns are one of the big sticking points in the thick-vs-thin table debate.)
I'm sure there are other areas of possible consolidation.
[To gain what?]
Fewer tables
[In return, you gain added complexity everywhere else. That seems unreasonable; it's akin to mediocre C programmers who try to pack as much functionality as possible into each line. Sure, their code has "fewer lines", but at what cost?]
Part of design is selecting the right ratios for units of stuff. Of course, there are disagreements over the ideal ratios.
[The "right ratio" in database design is that which minimises both complexity and update anomalies. Your suggestions do not reduce complexity; they increase it. You decrease only a table count. How is that reducing complexity?]
That depends largely on the future patterns of change.
[Assume that "future patterns of change" are not predictable.]
Then just about ANY design can be a net benefit under at least some of the eventual paths.
[That's a curious assertion. Can you provide evidence for it?]
No, and you can't provide evidence for the opposite. Thus, the future and what designs it favors is "unknown". In practice we use experience and intuition to make estimations about future changes because time machines have not been invented yet (except for the purpose of changing birth announcements in Hawaii).
- Your Hawaii reference is lost on me. Please stick to the topic and avoid "humour" unless it's actually funny.
- {It's a reference to Barack Obama. Certain idiots (read TeaParty?) think the birth announcements in Hawaiian newspapers are forged.}
[Or, we use normalisation to minimise duplication and update anomalies, and thereby achieve minimal complexity.]
- What "duplication"? You are duplicating table structures.
- Duplication of rows representing entities. Equivalent structure is not the same as duplication.
- Bullshit.
- Note, for example, that A/R and A/P often use equivalent structures, but they are by no means the same thing. Note that system log files frequently employ equivalent structure, but are by no means the same thing. Etc.
- You don't need 100% "identicalness" to have a reason to factor similarities.
- Actually, you do.
- Which IT deity made that rule?
- It is a consequence of normalisation.
- But how to score "duplication" is often disputed. Almost everyone agrees that value duplication is "bad" (with a few exceptions), but outside of that it's hard to find a consensus. All else being equal, I think most would agree that duplicate table schemas should be avoided, but there are lots of factors competing to be optimized for. And I do indeed consider the similar "journal structure" in the tables to be "duplication" in this case, such that it's a valid candidate for factoring analysis.
- How to score "duplication" is rarely disputed when normalisation is done properly. Occasionally, debate is then over whether data elements are different or the same. Merging distinct accounting journals because they have equivalent structure is almost never warranted. I say "almost" because all design decisions should be driven by the requirements, such that there are no absolutes, but in practice this comes close to being absolute. Pragmatically, distinct entity types that coincidentally possess the same structure almost invariably benefit from being kept separate, and rarely benefit from being merged. Merging them complicates queries, complicates changes that affect one type and not the other, and has performance and concurrency implications. All that to eliminate a table name? Especially when a table name clearly identifies the entity type that would otherwise be obscured?
- How do you objectively know it's "coincidence" that their structures are similar? And I agree it would probably complicate existing RDBMS, but that's different than being a universal rule of data that would be true in all possible database systems or industry domains. You haven't provided universal evidence that it complicates all possible domains/systems.
- And all row filterings are "obscured" using your viewpoint. You seem to be proposing hard-wiring aspects merely to avoid "obscurement"; that may not be a good enough reason. Obscureness depends on one's access/viewing technique. If you stop thinking in terms of tables, then you will view sub-sets of rows using different tools/techniques and then they won't be "obscure". Obscureness is in the human head. The universe does not "care" whether a bit of info is on the tip of your nose or 100 miles under Pluto's frozen surface. It's still a "bit".
- You objectively know it's "coincidence" that structures are similar (or the same) when you perform the process of normalisation and note the emergence of two or more tables with the same structure.
- Please elaborate. A formal process may perhaps tell you that two things are the same or resemble each other, but it won't necessarily tell you "why" from a domain perspective. That's what human analysts are for. Knowing that "flirb = flirb" doesn't tell us anything about why they happen to be equal and the domain meaning of it and the implications in terms of longer-term maintenance.
- Normalisation focuses on functional dependencies. Human analysts identify the functional dependencies between data elements, and normalisation derives tables from these. Thus, the focus is on whether or not two data elements should be the same data element, not whether two tables should be the same table. The former, it turns out, is far less of an issue than the latter. And that's only to the extent which the latter is actually an issue. Usually, it isn't. I don't think I've ever seen a data modeller try to merge A/P and A/R journals in a schema, despite their similarity. They may be structurally similar, but they are conceptually distinct. Normalisation would never result in them being the same table.
- But "conceptually distinct" is a judgement call, perhaps based on experience. In this PARTICULAR case it may make sense to separate them because it's worked out for 40 odd years. But again, I'm looking for more general rules, not what experience tells us about a specific domain. When somebody first automated accounting, they may have had no idea what kind of changes the future would hold and whether they are along "application lines" or "journal lines". One makes their best estimate in terms of the grain direction of future changes. If one finds out they have to keep making the same changes to all four journal-ish tables, then the splitting would have been the wrong decision in hindsight. I'm not necessarily trying to find the ideal philosophical model of the domain, but rather a SimulationOfTheFuture so that the tables fit future changes better. Philosophical modelling is just one tool to achieve that goal.
- "Conceptually distinct", applied to tables, doesn't have to be a judgement call if normalisation is done properly. It isn't even a consideration if normalisation is done properly. As for change patterns in four journals being a consideration, they are not. In pragmatic terms, the primary concern will be how they are used, not how they are structurally maintained. The focus will be on the fact that they contain distinct data, have different security considerations, quite possibly reside in different physical storage in order to support geographic locality (e.g., A/P is handled in London, G/L is in New York). Separating them makes sense from an operational point of view. From a maintenance point of view, changing all four to introduce a new column or whatever is trivial -- it's a few minutes worth of work for a skilled DBA, and happens rarely (at least, compared to the volume of row updates). However, if you think "philosophical modelling" is of value, please make a case for it. Until such a case is made, it's not worthy of attention.
- It's not just adding new columns, but the impact of doing such to applications and other tools that use the database. But again, in this PARTICULAR case you happen to be right based on the domain history and requirements. Hindsight is a lovely thing; too bad God doesn't rent it out. And the best schema design does require a judgement call about future changes. Domains can and do change in unexpected ways. As far as geographic separation, in theory a database system can make that a moot issue because the U/I may not have to care where the bits are stored.
- Adding new columns -- regardless of impact -- is typically orders of magnitude less frequent than row updates. Therefore, row updates are almost invariably the driver for technical decisions outside of normalisation, regardless of the impact of changes, which ultimately (in most cases) affect only the developers. UX issues are irrelevant here -- this is about the database(s).
- What if some group of users wanted to run lots of reports and studies on product categories, such as "paper clips", regardless of whether it's in A/R, A/P, G/L, or billing? They want to follow products or order categories from cradle to grave. If you had a lot of "cross app" activity like that, then partitioning starts to look worse. We know from accounting experience that such is not likely to happen, but that's only because we know the domain and its history. Starting out brand new, nobody could dismiss the above scenario as "unlikely" compared to any other change scenario. ONLY EXPERIENCE tells us that it is unlikely, not some "math on paper" guru normalization exercise. That's bunk. -t
- If those are the requirements, then they must be accommodated. Of course, this presumes a product category like "paper clips" makes sense across A/R, A/P, G/L and billing -- which it doesn't -- but I can appreciate the concept behind your point. There are sometimes exceptions that drive deviation from established practices, and you'll note my comments generally include adjectives like "usually" or "almost invariably" to account for these. Obviously, no one is going to discard requirements in favour of some abstract rule. However, if there are requirements like the above that impact the schema, they will be reflected in the functional dependencies and thereby translate into appropriate tables. Otherwise, if they're merely a reporting requirement, the usual joins and unions in the report queries will suffice.
- Are you suggesting we should pretty much ONLY design around the here-and-now requirements? In new or poorly-documented/understood domains, I find it best to build with flexibility in mind. Yes, we can work around our design/prediction mistakes with things like lots of UNIONs, but hopefully we can avoid such. UNION queries are among the slowest, and among the messiest and most verbose to write.
- That's not what I intended to suggest, though YagNi does suggest that it's better to focus on here-and-now requirements than an unknown future. Of course, that doesn't mean it's good to intentionally overlook here-and-now knowledge of future requirements. There's a big difference between designing a system to allow for the fact that we know we'll have to integrate with G/L next year -- which is arguably a form of current requirement, because we know about it currently -- vs designing a system for a slim chance that we might have to integrate with G/L some day. That said, "build with flexibility in mind" means normalising as far as possible, and categorically avoiding GodTables. The more normalised a database is, the greater the likelihood that any given change will be localised, that there won't be update anomalies, that the majority of the schema will remain stable, that changes won't affect unrelated parts of the schema or the applications that depend on it, etc. I'm not sure why you feel "UNION queries are some of the slowest and messy/verbose to write queries for" -- or why it matters even if they are -- other than some "query builder" tools don't support assisted construction of UNION queries. MicrosoftAccess, for example, forces the developer to manually write UNION queries in SQL, which is odd considering how common they are. On the other hand, needing a plethora of UNION queries sometimes indicates failure to implement appropriate type-subtype relationships in the schema, or failure to fully normalise the schema.
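- For illustration only (the table and column names are invented, not taken from the sample schema), encapsulating a UNION behind a VIEW looks like this in ordinary SQL:
-- One VIEW hides the UNION of the structurally-similar journals.
CREATE VIEW allJournals AS
    SELECT 'AP' AS source, entryDate, accountRef, amount FROM apJournal
    UNION ALL
    SELECT 'AR' AS source, entryDate, accountRef, amount FROM arJournal;
-- Report queries then read from one place:
SELECT * FROM allJournals WHERE accountRef = 1234;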
- I disagree with your assessment of ChangePatterns based on my decades of dealing with change requests. Yes, one can make views that "hide" the unions, but that can be said for the other way also: one can "partition" a GodTable via views also.
- I didn't really assess change patterns, except in the broadest -- and what I thought would be the least controversial -- sense. What is it, specifically, that you disagree with?
- Yes, you can "partition" a GodTable with VIEWs, but that doesn't resolve the problem of update anomalies.
- What's a specific update anomaly scenario you have in mind? Design is about weighing trade-offs and risks for each candidate design.
- I have in mind the update anomalies that result from lack of normalisation. See http://www.sqa.org.uk/e-learning/MDBS01CD/page_20.htm
- I mean in terms of your Access accounting example. I'm not looking for a generic "foo bar" example, seen dozens already.
- There are no (significant) update anomalies, because it's in at least 3NF.
- Please clarify "but that doesn't resolve the problem of update anomalies" then. Illustrate the "problem", please.
- Above, in response to my claim that it's possible to "make views that 'hide' the unions" (your words), you claimed that you can "'partition' a GodTable via views also." No, you can't -- not equivalently. A fully-normalised schema with VIEWs to hide UNIONs will not exhibit update anomalies. A poorly-normalised schema with VIEWs to make it appear normalised -- which is what I presume you mean by "partition" -- will exhibit update anomalies. Therefore, partitioning a poorly-normalised GodTable is inferior to adding VIEWs to a fully-normalised schema.
- Show me a specific and relevant case of an update anomaly for a "journal" style table.
- Imagine the journal and accounts were in one table with a candidate key of {AccountID, JournalEntryID}. You couldn't create an account without a journal entry to go with it. That's an insert anomaly.
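- A concrete sketch of that anomaly (names invented, not from the example database):
-- Hypothetical merged table combining accounts and journal entries.
CREATE TABLE accountJournal (
    AccountID       INTEGER NOT NULL,
    JournalEntryID  INTEGER NOT NULL,
    AccountName     VARCHAR(60) NOT NULL,  -- fact about the account, repeated on every entry row
    EntryDate       DATE,
    Amount          DECIMAL(12,2),
    PRIMARY KEY (AccountID, JournalEntryID)
);
-- A brand-new account has no journal entries yet, so there is no JournalEntryID
-- to supply; the account simply cannot be recorded. That's the insert anomaly.
-- Repeating AccountName on every entry row also invites update anomalies.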
- You mean account number definitions/catalog? I never proposed anything remotely like that. Even in an extreme case of everything being in one big table, such as found in MultiParadigmDatabase, those would typically be separate rows (with some kind of "type" or "category" indicator attribute(s)).
- If you're proposing GodTables, by most interpretations of "GodTable" you are proposing poorly-normalised (i.e., < 3NF) schemas. Is that, or is it not, what results from your approach? If your schema is in 3NF, I don't care what inconveniences you impose on yourself by an awkward design; it's probably harmless as long as it's in at least 3NF.
- Your example is more of a "God Row".
- What's a "God Row"?
- It's not a matter of "harmless", it's a matter of having the best design, which may involve a tradeoff of risks. If there is "obvious data duplication", such as repeating the account number title all over the place, then it probably should somehow be factored out regardless of what we call the contraption. I don't like the existing/common definitions of normalization because they use too many wiggle words, so instead I will consider "duplication problems" on a case-by-case basis.
- Normalisation is notable for being -- if desired -- entirely automatic, assuming a sufficient quantity of data to accurately determine functional dependencies. It works entirely without "wiggle words" -- or any words at all -- so I'm not sure what basis you have for your claim that "common definitions ... use too many wiggle words". Can you give an example?
- Fairly recently there was a discussion of the "pizza toppings" example at http://en.wikipedia.org/wiki/4NF . It's also discussed in NormalizationRepetitionAndFlexibility, but that's a much older discussion I believe. I cannot find the newer one in Google. IIRC, I asked something like, "Even if all toppings are not currently dependent on delivery areas, what if some toppings get too soggy or stale for longer drives such that management is considering not offering some toppings for some delivery areas? During design time, what would be the threshold percentage for you building such into the database design? For example, what if management said there was a 50% chance they'd vary the toppings in the next 5 years?" I don't remember getting a satisfactory response. -t
- The only reasonable database design question here is "are toppings dependent on delivery area?" Betting on future coin-toss percentages is a fool's game; it's what the notion of YagNi is intended to prevent.
- 100% adherence to YagNi is a mistake in my opinion. It's one guiding principle (factor) among many, not the end-all-be-all. If the customer said there was 90% chance of needed independence soon, would you still ignore that for the current design? How about 99%? To ignore it would be foolish if those probabilities were reasonably reliable. It's almost comparable to washing your car knowing there is a 90% chance of heavy rain overnight. "But it's not raining now", the foolish full-YagNi-er would say, and waste time and resources.
- The only foolish act here would be failure to pin down the requirements with the customer during the requirements-determination or design phase.
- That's often an unrealistic prerequisite. Often the customer is not available for detail probing, and/or they have personality issues/quirks that make them a difficult interview subject. It's often the case that those who are best at the domain suck at everything ELSE. And then there is messy office politics such that the "customer" is multi-layered or fractured. Early in my career I worked for an engineering firm where most were used to planning and thoughtful analysis. That spoiled me because when I left, I really missed working with engineers in terms of analyzing requirements (although they often smelled bad). HumansSuck.
- Sometimes you can present the options to the customer, something like, "If we design it assuming they are independent of delivery area, it will cost you 15 grand and have 12 data entry screens. If we design it for dependence, then it will cost you 20 grand and have 15 data entry screens (meaning harder-to-use). If we design it for independence but dependency is later added after-the-fact, then it will cost about 8 grand to add, etc." Then you let the customer choose. But often there are many smaller decisions that the analyst will be required to make based on his/her best judgement, or at least be able to give as a recommendation.
- That's reasonable. It's a good way to pin down the requirements.
- Again, you claimed that "common definitions [of normalisation] ... use too many wiggle words." I'm not sure what basis you have for that claim. Can you give an example?
By the way, if the designers used my "fooRef" convention for foreign keys instead of "fooID", the schemas would be far easier to read. It's a good convention.
[I'll be sure to hop in my time machine, return to 1991 when this project was started, and get the team to consult with you about naming conventions.]
It's also possibly multiple applications mixed into one: 1) staff scheduling, 2) billing, and 3) accounting. And there are oddballs such as "Sched Client Medication" which suggest a patient-treatment-tracking app. Nothing wrong with integration, or at least semi-integration, but I'd still probably classify it as multiple integrated apps rather than one big app.
[It incorporates multiple business functions, but it underpinned a single integrated small-business management application for home health care delivery. It was an explicit requirement that there be strong, seamless integration across business functions whilst permitting each function to be expanded or customised.]
I will agree that if you want each app separable from the rest, then some of the structural duplication mentioned above may indeed be warranted. But this depends on the scope of what we are calling an "application". If you consider it one big app, then factoring between apps will reduce the total table count. It's a matter of whether we factor for a "sub-app" scope or an integrated suite scope.
[What do we gain by reducing "the total table count", especially at the expense of more complicated queries and constraints?]
A lot of the many-to-many tables could also be replaced by a single table like this:
crossReferences (table)
----------
crossType
key1
key2
value
timeStamp
note
I realize such is unconventional and that most current RDBMS don't support such a table very conveniently for referential integrity, but it is a way to reduce table-space clutter. It's vendors that don't favor such, not the inherent logic of the universe. But, that's mostly a side-note. -t
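For illustration (the crossType and key values are invented), a client-to-skill link that would normally get its own many-to-many table would instead be a row in crossReferences:
-- Assumes the crossReferences table above; the example values are made up.
INSERT INTO crossReferences (crossType, key1, key2, timeStamp, note)
VALUES ('clientSkill', 'CL1001', 'SK042', '2013-06-01', 'assigned at intake');
-- All skills linked to a given client:
SELECT key2 FROM crossReferences WHERE crossType = 'clientSkill' AND key1 = 'CL1001';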
You've just made me very happy that I don't have to use any schemas designed by you. The fact you think it's a good idea to force extra complexity on the users of the schema in exchange for more data integrity issues is simply mind-boggling.
How are you objectively measuring this "extra"? You should have anticipated this question by now and stayed silent until you had an answer, because nobody should just take your word for it. It would be an interesting issue/observation if the laws of the universe forced one to massively violate OnceAndOnlyOnce to satisfy referential integrity, but so far you haven't demonstrated this.
The most obvious extra complexity is in the where clause. With your schema I now have to add an additional filter in order to remove the rows belonging to the other tables you crammed into it. Natural joins turn into self joins and I have to add table aliases in addition to the additional filters in the where clause. All this just for a lower table count, something of rather dubious benefit.
I've argued elsewhere that the way existing RDBMS implement natural joins is flawed. I'll agree that we sometimes have to do shitty things to work around shitty standards/conventions, but it's not inherent to the general relational model.
It's not just natural joins, every query needs the extra clause(s).
select * from role
becomes
select * from foo where subtype = 'role'
- Yes, but for many tools it's easier to parameterize on a WHERE condition than on the table name, improving code reuse and meta-ability.
- Any examples?
- I am unable to provide specific names at this time. I'll provide when possible.
You've already acknowledged the increased complexity in the integrity constraints.
Because existing RDBMS value performance over bloat reduction.
Nope. It's complexity that's added by your approach. You have to explicitly tell the database which portion of the table to apply the constraints to. It has no way of knowing otherwise. With the current approach, the constraints apply to the entire table, and the database knows that already.
In addition, you've upped my storage costs. Every index on the tables must now be prefixed by what is essentially the table id. I also have to index rows that don't need to be indexed, because I can't index on just part of the table. Every row in the table has an extra id.
Should we organize our tools around WetWare efficiency or machine efficiency? All things being equal, OnceAndOnlyOnce should hold. If something gets in the way of that, then we should look at it closer to see if repetition is a necessary evil or due to poor root tools. Under the hood the different "sub-types" could be implemented as separate physical tables. Schemas are only a logical model, not necessarily a physical one.
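For example, some RDBMS can already map one logical table onto separate physical tables with list partitioning (a sketch in PostgreSQL-style syntax; whether a given product supports it is a separate question):
-- Logically one table; physically one partition (storage/index set) per sub-type.
CREATE TABLE journal (
    journalID    INTEGER NOT NULL,
    journalType  TEXT    NOT NULL,
    entryDate    DATE,
    amount       NUMERIC(12,2)
) PARTITION BY LIST (journalType);
CREATE TABLE journal_ap PARTITION OF journal FOR VALUES IN ('AP');
CREATE TABLE journal_ar PARTITION OF journal FOR VALUES IN ('AR');
-- Queries still say FROM journal WHERE journalType = 'AP';
-- only the matching partition and its indexes get touched.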
Perhaps something akin to TableInheritance is in order. If there are commonalities among tables, it would be nice if our model both designates the commonality and takes advantage of it. A big pile of tables is just as annoying as a big pile of files without some kind of sub-division or meta info to group/classify them.
I will agree that existing products favor your approach. They assume certain design styles and are optimized for that; and if you go with the flow of a tool's assumptions about usage patterns, things usually go smoother. But it does create cluttery repetition.
- Also remember this topic was originally about ExBase, not the current crop of RDBMS such that I wasn't considering just a certain flavor of database.
You haven't reduced mental bloat either, since I have to add extra mental steps when I use or reason about your table.
Why is that? Trace the steps. Table name clutter slows me down. Maybe you have FastEyes, I don't, and many others don't. We want table-space lean.
- Let's say I want the information in the A/P Journal.... With the original schema, I have to remember the name of the table and write the "SELECT * FROM <table_name>". For your schema, I have to remember the name of the combined table, which can't be meaningful since it's got a lot of unrelated stuff in it. Remember the name of the id assigned to pick out the information. Remember the name of the field that's being used to store that id. Write the longer query "SELECT * FROM <table_name> WHERE <table_id_name> = <table_id>". And possibly remember how to parse the data.
- The total quantity of "things" to remember remains about the same. Plus, one can use the name of the table to find the name of the entity code such that there is a drill-down nature. A giant list has no drill-down while my approach creates a 2-level tree. Also note that you can create views for common app query patterns.
- A 150% increase is about the same? (And I'm being generous because I'm not including the parsing or the extra verbosity of the query in that). Yes, you have a two-level tree, but to find something you have to flatten it into the giant list first. There's no connection between the top level and the level below it other than a grouping on superficial common structure.
- It's not 150% because some will be reused or common. The "A/R", "A/P", and "Billing" aspects may be common to multiple table "kinds". And why is "journalness" superficial? Maybe it doesn't matter as long as it provides MentalIndexability. The use of "superficial" is a bit puzzling here. I think the bottom line is that I prefer to think of things in terms of everything being a predicate: GET ME THE DATA WHERE aspect01="foo" AND aspect02="bar" ETC. To divide SOME aspects using tables and others using attributes feels arbitrary to me. A more generic way looks like the above pseudo-query. I know it goes against tradition, but tradition sometimes gets things wrong. MultiParadigmDatabase explores the tableness question some more. TablesAreArchaic :-) -t
- [Your approach trades in the "complexity" of properly-normalised tables -- using structures understood by developer and DBA alike -- and gets performance issues (especially in concurrent environments), more complex queries, more complex applications, and more complex constraints. This is good... How? In short, there is nothing that justifies your approach to "reuse". If you genuinely want to dispense with tables, look to Prolog and Datalog rather than perverting the elegant simplicity of the RelationalModel.]
- In a general sense, queries are not necessarily more complex. SQL demands a table name, but an aspect-neutral-friendly syntax would dispense with a mandatory FROM clause. I see it as reducing complexity in some ways because it handles certain kinds of changes better. It's a more meta-tized approach. As far as performance, physical clustering on certain aspects can provide pretty much the same hardware arrangement under the hood. I agree that added dynamism may have some performance penalties in certain situations, but that has always been one of the trade-offs of dynamism. As far as "perverting the elegant simplicity of the RelationalModel", it's actually simpler because we get rid of a static dimension by moving it to an existing dynamic dimension where it "inherits" all the benefits of that dynamic dimension. It's better conceptual reuse. I'll agree it's not a road-tested idea, but it should be considered. -t
- [You may consider it all you like. I see no justification to consider it, and some good reasons -- which I gave above -- not to.]
- You have not proven that it must increase total complexity. I agree it changes the profile of what's complex and what's simpler, but without experience and testing it's premature to claim it increases net complexity. In some ways it's more like Lisp compared to traditional languages: it uses a simple idiom for everything (or more things) instead of specialized idioms for different "kinds" of aspects or sub-aspects. (Of course, it may also obtain some of the pitfalls of Lisp in the process.) -t
- [Your approach introduces complexity everywhere, solely to reduce the table count. Why? I'm afraid I don't get it, and your explanations appear to be HandWaving. E.g., what does Lisp have to do with it?]
- You only claim it increases total complexity; you have not proved it. You are HandWaving on the proof. As far as the Lisp comparison, it's conceptually simpler because it gets rid of one idiom, "table", and uses an existing idiom (attribute-based filtering) to replace it. It's fewer total idioms. It's somewhat comparable to how Lisp uses EssExpressions for everything: data, block structures, functions, arrays, and so forth; whereas other languages hard-wire the distinctions into the root syntax and each has its own set of rules different from the other idioms. It's NOT "solely to reduce table count"; it's also to reduce idioms.
- [The "total complexity" increase is self-evident (more tables and simple queries vs fewer tables and more complex queries/apps/constraints is trivially obvious to anyone who writes this stuff, and I do), and your subsequent Lisp argument has grown even more baffling. I can only assume this peculiar argument is an attempt to sustain a (rather bizarre) point that the original database example has "too many" tables. "Reduce idioms"? Really? I mean, really???]
- I guess I'm stupid, because it's not obvious to me. Either that, you just can't think outside the box due to the table habit.
- You've already admitted to an increase in complexity in one aspect. You've yet to show any decrease in complexity anywhere. That's enough right there, but there's more. All queries have at least one additional clause in them. There are at least two additional names that have to be remembered. There are queries that would require aliases that didn't before, and no queries where they could be removed.
- All aspect divisions have to be remembered or looked up to use them, regardless of whether they are in attribute-space or table-name-space. And the table names as you presented are mixing two aspects, creating artificial combinatorial instance multiplication roughly comparable to AttributesInNameSmell.
- [Huh?]
- Most developer needs more or less fit the template: "How do I get all rows that satisfy X"? Table-ness is often only part of that question because rarely is one "consuming" an entire table as-is. Our apps or data studies usually want a subset or superset of any given table. Thus, often aspect "A" is just as likely to play a filtering role as aspect "B" and aspect "C". So for actual tasks, the aspect(s) that map to a given org's table-space are just one of multiple aspects that have to be looked up or included by the dev. One has to know or look-up the "names" of multiple aspects regardless of whether that aspect is encoded in table-name-space or data-attribute-space. Aspects are aspects. -t
- Your responses don't appear to have any connection to what you're responding to. Nevertheless, while it's true we often want subsets of a given table (supersets are rather rare), those will still need to be specified with your proposal. You will also need an additional clause per what we would implement as a table to prevent what would have been in other tables from being included.
- Then ask better questions. In general, the pattern "SELECT WHERE aspect01='aaa' AND aspect02='bbb' AND aspect03='ccc'..." is a more consistent and flexible pattern than using the table name for one or two (if name-jamming) aspects. IF you need a common short-cut, then create a view. It's best to start with the most general and flexible selection technique and THEN wrap common needs/patterns in views or view-like techniques, rather than start with something rigid and inconsistent compared to other aspects.
- The thing is, you haven't removed the need to tell the DBMS what to query. You still have that special "aspect" and at least one additional one.
- If the query language is reworked to be attribute-friendly, the count is identical. (We could jam two+ aspects together under a single name like you did in an attribute or object label, but ideally they should be factored apart. It's usually a cleaner design.)
- Go ahead. Show me an example of this improved query language and explain how it works.
- See Example "Norf" below.
- Note that it blurs the distinction between "referential integrity" and "validation". But the upside is that any feature or function built for one can be used in the other realm. For example, one could have the ability to switch off all referential integrity dynamically using a single flag (cell value) if there is an emergency flood of activity that's clogging performance or preventing the testing of a partially-finished app or sub-system.
- Up till now, the only mentions of "referential integrity" have been you stating that your proposal makes it more complex, our agreeing with you, and your bizarre comment about it and OnceAndOnlyOnce. This is the first mention of "validation". It's not clear why blurring the distinction between "referential integrity" and "validation" is of any import to anything else on this page. It's also not clear what your "it" refers to.
- And go ahead, show us how this enables using a single cell that turns off all referential integrity that couldn't have been done without it.
- "It" refers to the consolidation of idioms mentioned earlier.
- PageAnchor idiom01
- In GENERAL, consolidation of similar idioms reduces total code size and reduces the number of related idioms, such as the DatabaseVerbs operations/features on any given DB object because we don't have to carry parallel tools/APIs/syntax for the 2+ different "kinds" of things: table-space inherits all the libraries/operations available to attribute-space and attribute-space inherits all the libraries/operations available to table-space. Look at typical DDL syntax: you have one set of conventions for altering tables and a different set for altering attributes (data). In fact, with consolidation you may not need dedicated DDL since data changes and table changes are the same thing. We could get an ax and chop the Query/DDL books in half: Kaaathuuuunk!
- I see no reason why a DBMS would become more complicated by such consolidation unless there is some other yet-to-be-identified principle at play that ruins the usual idiom-consolidation benefits.
- [DBMSs don't become complicated by consolidation, databases do. If your goal is to reduce the table count and make the database "simpler", why don't you consolidate everything into a single table? What you're writing here sounds like nonsense to me, and at best it looks like you're paddling furiously to defend your "more than 50 tables is evidence of poor factoring" (or whatever it was) claim. The database presented here follows conventional normalisation practice used in industry (3NF), so if you're not to be immediately consigned to the batty bin and ignored, it's up to you to provide solid, convincing proof that your denormalised approach is superior.]
- I have proposed consolidation into ONE "table": MultiParadigmDatabase. And I DON'T have solid, convincing proof. And I agree that it doesn't "follow...conventional practice". I never claimed it did. Full-out single-table-ness is an experimental idea and an encouragement to challenge the status quo by asking whether it's logic or convention that shaped history. I would think that Lisp fans would be glad to consider the consolidation of DB idioms. Existing RDBMS are overly complex in my opinion; they smell of poor idiom factoring; and there seems to be a desire to have alternatives that are more dynamic and meta-tized, at least for niche uses. If the subject doesn't interest you, feel free to ignore it. --T.O.P.mind
- As far as "denormalized", it depends how one defines "table". -t
[You want "table-space lean". Speak for yourself.]
Everybody's WetWare is different. And, we should strive to respect OnceAndOnlyOnce. It may be a tool smell when we have to repeatedly violate it. You shouldn't be content with the current state of things; try to understand whether the "logic of the universe" dictates that repetitious schemas are a necessary evil, or whether it's something we've been doing wrong in or with our DB technology. I advise all to ponder this duplication and not settle out of habit. -t
[Beware of conflating similarity with duplication. Bookkeeping systems (which this schema supports) are full of similarity that is not duplication.]
There is no iron-clad boundary between the two.
[Speaking as a former bookkeeper and accounting department manager, custom business application developer, payroll specialist, and database developer, the schema presented here demonstrates appropriate handling of duplication vs similarity. Some of the table merges suggested here (e.g., merging A/P and A/R journals) would be inappropriate.]
You mean double-entry bookkeeping? You are doing it wrong. I modernized and solved it at AccountingModeling. Give me a Nobel now! -ArrogantTop
[See the counter-argument to AccountingModeling at the bottom of DoubleEntryBookkeeping.]
I remember one org's schema that had bunches of tables. To protect the guilty, I'll call it an "insurance" app as an approximation. It tracked cars, boats, guns, and other expensive items. They made a table set for each physical object kind (cars, boats, guns, etc.) when they could have consolidated them into something like this:
objectModels (table)
-------------
modelID
objType // ex: vehicle
majorModel // ex: Ford
subModel // ex: Escort
yearFrom
yearTo
notes
objects (table)
-----------
objectID
modelRef
features // ex: 4-door [could also have feature table]
color
condition
serialNo
licenseNo
estPrice
yearFrom
yearTo // sometimes a range given when exact unknown
notes
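A query then just filters on objType instead of picking a per-kind table set (the values below are illustrative):
SELECT o.objectID, m.majorModel, m.subModel, o.serialNo, o.estPrice
FROM objects o
JOIN objectModels m ON o.modelRef = m.modelID
WHERE m.objType = 'boat';
-- swap 'boat' for 'vehicle', 'gun', etc.; no separate table set per kind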
--top
Example "Norf":
1. Your approach:
SELECT * FROM XX_YY WHERE aspectC='foo' AND aspectD=7
2. My recommended approach with traditional RDBMS (regular SQL):
SELECT * FROM XX WHERE aspectB='YY' AND aspectC='foo' AND aspectD=7
3. Single dynamic table approach (pseudocode):
SELECT * WHERE aspectA='XX' AND aspectB='YY' AND aspectC='foo' AND aspectD=7
4. Single dynamic table approach with a view-like shortcut/macro (pseudocode):
$view_xx_yy = "(aspectA='XX' AND aspectB='YY')"
...
SELECT * WHERE $view_xx_yy AND aspectC='foo' AND aspectD=7
// see how it's more meta/dynamic than traditional RDBMS?
// Similarly, we could "compute" the column expression list if desired.
--top
No, since it isn't.
You are just defending your syntactically- and idiom-quantity-bloated TutorialDee implementation.
[I think you've confused me -- the guy who wrote the *-bloated TutorialDee implementation -- with someone else.]
All you non-handled people look alike.
[Focus on the content and not the creator, then it won't be a problem.]
Then I lose an opportunity to trash T.D. ;-)
I was just pointing out that there doesn't appear to be anything useful it does that couldn't be done just as easily with more traditional table structure. (The text macros are okay, but I'd rather have user defined filters, functions, and the standard views).
That's because you like ADA-like bloated syntax and multiple syntactic constructs that repeatedly define a crippled subset of DatabaseVerbs on each and every language object type. I'll take more cues from Lisp and TCL than ADA, Thank You.
For example, I don't have to have a dedicated DB object called a "view". And I can store "views" in tables (just like EVAL again, ha ha) and use the existing table editors/tools to manage the CRUD aspects of them and invent any meta data or classification system I want on them; not just Larry Ellison's vision of what view management should look like. -t
[From where can I download, and then run, this magical language of yours?]
- Your local proctologist.
- [EyesRoll.]
- If so, your proctologist is doing it wrong. Either that, your head is up your...
- [Given any thought to what you want to be when you grow up? Or are you going straight from juvenile to senile?]
- "Not like you" is my primary requirement.
- [Dude! That's harsh.]
In any case, you can do all that with the more traditional table structures as well. Traditional views have been stored in tables for ages. You can create tables to hold whatever meta data you like. See, still no advantage to your proposed system.
But RDBMS objects have their own special syntax for each different object kind, bloating up the DB language and preventing cross-use or mixed-use. For example, for the static kind of tables, I would rather store table layouts (schemas) entirely in tables and wouldn't have any special "CREATE TABLE..." commands: you just fill in the DataDictionary-like tables (and perhaps issue a Refresh command, depending on architecture). If you wanted to script table creation, then just script filling in the dictionary tables LIKE YOU WOULD ANY OTHER TABLE (AKA data "INSERT" commands). IdiomReuse?. There would only be update-queries, no need for a special and separate data-definition language.
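A rough sketch of what that could look like (the dictionary table and column names here are invented; no existing product works exactly this way):
-- Instead of:  CREATE TABLE employees (empID INTEGER, lastName VARCHAR(40));
-- you would "create" the table by filling in the dictionary tables:
INSERT INTO sysTables (tableName, descript) VALUES ('employees', 'Staff roster');
INSERT INTO sysColumns (tableName, fldName, type, maxSize) VALUES ('employees', 'empID', 'integer', NULL);
INSERT INTO sysColumns (tableName, fldName, type, maxSize) VALUES ('employees', 'lastName', 'string', 40);
-- (then perhaps issue a Refresh, depending on architecture)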
I suppose you could argue that since the DB vendors already created all that bloat, it's already done and one should just use it. But for newbies, it's still usually easier to learn a few idioms and re-apply them to what would be different kinds of "database objects" in the traditional RDBMS.
- Oracle can create tables in response to inserts on tables. This isn't something that's enabled or prevented by the proposals under discussion.
- Then you have a MirrorModel.
- Why would it be a MirrorModel?
- Because you have the self-rolled tables that have info on objects (say views in this case), and then the internal tables/structures inside of Oracle's "system". It's two things that have to be kept in sync, a violation of OnceAndOnlyOnce.
- The data in the self-rolled objects would be additional data, not data that's already available. (What would be the point of that?). Therefore, there's nothing to keep in sync, and no violation of OnceAndOnlyOnce.
- Every column could be viewed as "additional info". I suppose in your thin-table world, that's the way things "should be".
Re: "You can create tables to hold whatever meta data you like"
But it requires a join to use, and one may forget where the meta-info table is. It's also a violation of OnceAndOnlyOnce since the vendor has a row for, say, a view, and we now have a kind of MirrorModel row to put our own stuff in.
Why would it require a join? Certainly there will be joins in some cases, but that also happens in your proposal, so no loss there. In general, you'll actually have to repeat stuff more often the way you want to do it, so again no loss there. (E.g., you repeated the table name for each row when you combined it with other tables.)
It's a 1-to-1 table. Normally when you see 1:1 tables you factor them together, but most vendors don't let you do that for the reserved objects. But because it's not a hard-wired concept in my proposal, you can stick any attribute on any "table" you please.
No, I usually only combine 1-to-1 tables when I'm almost always joining both tables in my queries. Otherwise, I prefer to have them separated. You can also add views of them combined if you want. (Which allows you to see your meta-data as if it was part of the same table even for the special tables.)
If this is back to the ol' "thin table" debate, then I'm bailing. Thin tables just don't fit my WetWare and appear to produce what I'd call "more complexity" but have not found an objective and universally-accepted way to measure "complexity" of schemas/table-design. -t
Also note that your criterion as stated appears to be based on actual usage patterns. Is that your primary criterion for combine/separate decisions?
Yes, we've noted that your WetWare causes you to prefer adding complexity in many places to reduce complexity in one place. Hardly something to emulate.
You want a complex DBMS by hard-wiring things that could be meta-tized. But I have agreed in other non-table topics that fitting WetWare sometimes trumps simplicity. Simplicity is moot if the staff doesn't know how to use it effectively.
When meta-tizing introduces issues (and in this case you've already agreed that it does) and no advantages can be found, then yes, I'm against it.
I already told you one advantage: simpler DBMS architecture.
That's the first time you've mentioned that. How would it be simpler?
See PageAnchor idiom01. Also search for "LIKE YOU WOULD ANY OTHER TABLE"
I see nothing there that shows the DBMS architecture being made simpler.
Well, I don't know at this point how better to describe it. It's pretty obvious to me, but I guess I am failing to communicate that. A rough analogy is the difference between ADA and Lisp: ADA hard-wires "block types" (conditionals, loops, etc.) with special hard-wired syntax for each of them, whereas in Lisp they are not hard-wired into the language, but merely convenient library functions that can be ignored such that one can just as well roll one's own block structure/operation. For this reason, Lisp is a "simpler language" or "architecture" than ADA. The definition of Lisp would take up one or a few pages (depending on the dialect), but ADA's is approximately 100 pages.
- [Shouldn't it be LISP and Ada, not Lisp and ADA? Aside from that, your point eludes me.]
- I give up then. Anybody else want to try?
- Everybody else thinks you're just randomly saying stuff and hoping it sticks.
- [It certainly looks that way.]
- All 2 of you are "everybody"?
Similarly to the block analogy, the MPDB approach does not hard-wire in the concepts of "table" and "view".
[What's an MPDB?]
See MultiParadigmDatabase.
[Ah. Imaginary stuff.]
So? We have to explore imaginary stuff to check if "design rules" apply only to existing RDBMS or all possible RDBMS/DBMS. If you know a better way to check besides "trust my judgement", I'd be happy to consider it. DontComplainWithoutAlternatives. Every single fucking existing system started life as "imaginary stuff".
And it can't hurt to ask what DB's would look like without (explicit) tables, can it? If you can prove it's universally objectively worse, fine, we'll dump the idea.
Note that I wouldn't recommend a dynamic GodTable for accounting applications because one has to be fastidious when dealing with money. But other apps/niches may be more forgiving, where nimbleness of change is valued over preventing loss or damage to data. ItDepends on the domain/niche. --top
[Deprecating normalisation is never a good idea, except in the analytical settings that demand it purely for performance reasons. Of course, they are not interactively updated, so update anomalies are not a problem.]
So you say.
[Do you denormalise for reasons other than performance?]
Yes, WetWare. Note that I don't normally recommend duplicating actual data (values), thus the level of "normalization" may be an issue.
[Do you believe WetWare takes precedence over the risk and/or complexity of dealing with redundancy and update anomalies?]
Many "update anomalies" are a matter of tradeoffs between kinds of "anomalies" and future change patterns.
[No, they're not. They're a specific technical consequence of failure to normalise, independent of any "tradeoffs between kinds of 'anomalies' and future change patterns". See http://en.wikipedia.org/wiki/Database_normalization]
We may also disagree on what we are calling "redundancy". But in general, YES, WetWare often takes precedence over ArgumentByElegance if grokkability is a more pivot-able issue than "fewer parts" or even machine-enforced "safety". We've had similar debates in and around GreatLispWar. I engineer for "users" of the code, not so much for some ideal universal purity. Neither God nor the universe reads and reacts to our code (in a detectable way), only humans do.
[This isn't about ArgumentByElegance, but about the pragmatic effects of failure to normalise. It has nothing to do with how "God [or] the universe reads and reacts to our code" and everything to do with the simplicity of update/select queries and the schema. In a properly normalised schema, each table/RelVar has a one-to-one correspondence with a real-world entity type (and vice versa). That makes the schema more intuitive and understandable than one where tables hold multiple entity types, because for each real-world entity, there will be one and only one table that holds its information. Thus, a user looking for an entity without already knowing its table need only review the table names, rather than having to examine tables and their structure and their contents. Proper normalisation also makes the schema easier to modify and/or extend, because there is strong SeparationOfConcerns between tables/RelVars.]
In this case there is no real-world "thing" that clearly maps to our data world. One could argue that in the old days one bought a generic paper "journal", and whether it was A/P or A/R etc. was a sub-category, such as a tab marker. And they may have physically grouped them by month, but that shouldn't imply our tables are split monthly just because that was the "real world".
In fact, I'd argue that most "intellectual property" (including monetary credits and debts) has no "real world" counterpart because intellectual property is "in the head". We have old tablets and paper representations to study, but these were often organized for "processing/handling convenience" using the technology of the time. We have more options in the computer than they had back then. One of the most flexible techniques we have now is SetTheory where each aspect can be modeled as a set, and sets can potentially overlap or not overlap based on dynamic (or easy-to-alter) expressions. To keep flexible, we shouldn't give one set special or limited treatment compared to another, at least not hard-wire that treatment into the system, which is exactly what the "sin" of table-ness (above) does. (Yes, Mr. Top is questioning the "T" in T.O.P., at least for experimental purposes. Or at least redefining/expanding the meaning of "table".) --top
Still attempting to find something that will stick? There's nothing dynamic about set theory. You cannot alter whether or not one set overlaps with another.
I'm talking about computerized implementations. If you treat most aspects as sets and all sets are "created equal", then it's generally easier to alter membership, or at least use uniform tools and knowledge to manipulate them.
Those types of sets aren't based on "easy to alter expressions" either. They use the same basic approach that a table would use with regard to membership.
There would be no diff between tables and views, and you'd rarely need UNION.
Please explain.
- Here are the traditional and the No Table approaches:
SELECT * FROM A UNION SELECT * FROM B UNION SELECT * FROM C
SELECT * WHERE myAspect IN ('A','B','C') // N.T.
- What does this have to do with sets of any type?
- "table" is not special or different from row filtering. I don't know how else to explain it at this point.
- I understand that. I just haven't seen any advantages to doing that. But, you were just suggesting that we do something based on sets. What does "SELECT * WHERE myAspect IN ('A', 'B', 'C')" have to do with sets?
[Top, are you proposing something like SetlLanguage, or something like ExtendedSetTheory?]
No. It's a map of maps, not lists, and not (explicit) expressions. Most of the SQL-like idioms we know and love (and hate) can still be used, except one usually uses row filters instead of "tables".
[What would we gain by that?]
Didn't you ask that already? We don't need tables, so we toss them and use other existing idioms instead: idiom reduction. I'm not saying it would improve things for every domain, but it could for some.
[Under your approach, how do you insert into tables?]
You insert a row into the "master" table (for lack of a better name) and don't need to specify a table. (A given shop can opt to have a "required" attribute such as "table" or "entity" to enforce table-like membership.)
[Could you illustrate with an example?]
And one of the benefits is that we don't have to have duplicate schemas for things like A/P versus A/R versus Billing journal entries, per above. The enforcement of validation can be "inter-table" in the traditional sense. Journal-ness and app-ness (A/P, A/R, Billing, etc.) are generally independent aspects and we can tie journal-ness validation to the journal-ness aspect without having to tie it to the app-ness aspect.
[Could you illustrate with an example?]
Okay, assume we programmed the dynamic Map-Squared-A-Tron to use the following data dictionary table (DD) for "insert/update" triggers, AKA "validation".
dataDict (table)
----------------
ID
fldName
descript
type // number, integer, string, date/time, boolean, etc.
minSize // use 1 or greater for "required"
maxSize // blank = open-ended
etc // etc.
groupsMatch // Example: "journal, ap, ar, gl, bill"
The "journals" for A/P, A/R, G/L, and Billing share various fields in common. Let's say a new data row is added to the main (only) table. If the "groupsMatch" attribute for a DD row has "journal, ap" (ap=A/P), and the new data row has at least "journal" and "ap" in its "groups" attribute, then it will be "triggered" for validation inspection by the data dictionary "table" for the corresponding columns. We don't have to break the sharing pattern into a hierarchy, since the sample given (MS-Access) does not have commonality that can be readily partitioned that way for "journals".
Notes: "table" is used loosely here in that there may be a convention to have a "table" attribute if one wants to manage a given set of rows the old-fashioned way. Also, if this is custom-programmed it probably wouldn't be very efficient such that some kind of native hard-wiring of such a DD convention may be necessary for performance reasons. But focus on the conceptual side of things for now. Similarly, the "string list" approach is poor normalization such that a many-to-many table may be warranted. But this is to illustrate dynamicness to facilitate factoring out the column commonalities for "journal-ish" tables mentioned above. And dynamicness may not be appropriate for accounting apps because an org typically values accuracy over flexibility when dealing with money.
Finding a good primary key set for a DD used this way is a bit tricky. This is because we've gone away from a hierarchy (table -> column) into set-land, where uniqueness can be more difficult for humans to tame or visualize. That's the price of power. In this particular case, perhaps we can set up a validation rule where "fldName" is unique within all "journals" DD rows, per "groupsMatch". -t
[Sorry, I don't understand what you're trying to demonstrate. Perhaps if you presented a side-by-side comparison of the "traditional" data modelling approach (as demonstrated in the original database linked above) vs your approach -- perhaps using both conventional SQL and your hypothetical table-less SQL -- then you could clearly highlight the benefits and features of your approach. Thus far, I find them unclear.]
It's not really "table-less", it's "dynamic" tables.
As far as the big picture, what the example facilitates is a mapping between fields (shown as A to E) and tables (or sub-tables) that start with "t" below such that column info is readily reusable rather than repeat things like the "journal" structure, which the MS-Access sample does. We are fixing a violation of OnceAndOnlyOnce by using indirection (references) to columns instead of the traditional copy-and-nest. Sub-tables t1 and t3 share the column specification for column "D" here.
Col:A|B|C|D|E
-------------
t1 |Y|N|N|Y|N
t2 |N|Y|Y|N|N
t3 |Y|Y|Y|Y|N
t4 |N|Y|Y|N|Y
Etc...
[What problem does this solve?]
Fixes OnceAndOnlyOnce.
[I don't see the duplication the above is intended to fix. It appears to be akin to noting that fields called "Description" show up frequently, (mis)regarding that as duplication, and trying to merge all "Description" data into a single table. That, of course, would be ridiculous. The above seems similar.]
It's not just one column, but sets of columns that are very similar. But again, in accounting we may know that the duplication is not a problem because it hasn't been for decades of doing it that way. But every domain is different. I'm currently working on a WCMS-like gizmo for which I wish I had used a GodTable instead of partitioning things into "kinds" of content or containers. I didn't want it to look odd to other maintainers, so I did it the old-fashioned way. It turned out to be too inflexible. How the info is displayed varies depending on where it's used. My hard-wiring of categories (tables) is already kicking my ass in terms of change difficulty. I have to write duplicate code to display A as an X and B as an X, whereas if I used a GodTable, everything would already be an X (or viewable as an X) without reinventing the damned wheel for each damned table.
[The apparent need for a GodTable often indicates a real need for type-subtype relationships, further normalisation instead of denormalisation, VIEWs to encapsulate UNION queries, or all three.]
Continued at WebGodObjectDiscussion.
One problem with UNION queries over similar tables is that many RDBMS do not optimize for similar WHERE filters.
// Sample A
SELECT a,b,c FROM
(
SELECT a,b,c FROM x
UNION
SELECT a,b,c FROM y
)
WHERE a = 7
// Sample B
SELECT a,b,c FROM x WHERE a = 7
UNION
SELECT a,b,c FROM y WHERE a = 7
Oracle, for one, has to create an intermediate table with the union of x and y BEFORE applying the WHERE clause, meaning it cannot use indexes.
For a work-around, I had to keep duplicating the WHERE clause for each table as shown in Sample B (x and y), which was a nasty violation of OnceAndOnlyOnce and a big source of typos. The WHERE clauses were far more involved than the samples. (Oracle may have since done something about that; I haven't checked.)
Even if it worked, one has to repeat the column list 3 fricken times. It's ugly.
That's a problem with a particular SQL implementation, and a one-time programming issue. "Resolving" it by creating an un-normalised GodTable creates a problem of update anomalies, which introduces more programming issues and ongoing performance issues.
It was not "one time", we had a large quantity of queries like that. And "particular implementations" can favor or disfavor just about any style or technique.
By "one time", I mean it's a programming-time issue which occurs "one time" relative to row updates. By way of analogy, if you own a car, the car purchase is a "one time" expense compared to the ongoing expense of fuel, oil, maintenance, etc. This is true, relatively speaking, even though you'll probably purchase a new car more than once.
I suspect they split it for machine efficiency reasons. But I'd like to focus primarily on human maintenance issues first. I don't think splitting reduced such overall. (It may be possible to rework the algorithms or setup such that larger tables would be split "under the hood" for performance, yet the split be invisible to query users.)
Sorry, I don't see how this follows from the above. Who split what?
And please don't insert new questions into old material without indenting. I likely won't spot such. Either indent or re-quote at the bottom of the topic. Thank You. -t
JuneThirteen