Improve Databases Or Else

A discussion that grew out of FearOfAddingTables about alleged proper ways of designing databases and application type systems that are not being followed.

Page name changed from goodDatabasesAreForGeniuses to ImproveDatabasesOrElse. Still arrogant.. but took away "genius".

I command everyone to improve the software industry through force, by using threats in the page title. I think imperatively.

The problem today (year 2008) is that a genius (or really really competent person) can barely create a strong, reliable, and well thought out database with the tools available. Someone less of a genius has it worse - people throw everything into one table, use nulls, lack foreign keys, use plain text files, XML databases (pfff), multi valued databases (pfff), and other crap.

We need to have a migration path available.. so that the unreliable databases like config files and unnormalized tables (myIsam) can become better relvars (tables). A semi genius would have built a good reliable database on day one, if he had the proper tools available and if the current tools didn't permit him to do such silly things with his database. We need to take it further and make it accessible to even the laymen to build databases that are really relational and properly designed. Documentation and websites for the products needs to educate users instead of providing just hype and incorrect information.

The good news is that most data stored in poor form can actually be normalized.. and there really are partial migration paths popping up (such as products inproving their DB's each year). But that isn't enough - some of it is getting worse, or it is improving at a slow pacee. It isn't the end of the world, but it could be a lot better. The relational model.. most people JustDontGetIt or even know of it.

Database designers, engineers, and laymen need to be encouraged with proper products, docs, and websites (and education, it is all related).

Today, people maintain unnormalized designs, abuse nulls, are afraid to create foreign keys. They are lazy or they simply don't have time at that moment in attention. Or they see it as too much overhead and too much work, because our current databases are heavy handed bastardized SQL. Our tools encourage us not to design databases properly - and what we are encouraged to do, is what we do (even smarter competent people are still encouraged by their tools).

Main problems: current databases permit us to be lazy, permit bad design, allow unconstrained data, have poor type systems, and permit sloppy integrity (if any).

Maybe what we need is something like TrueRelationalFileMaker? or TrueRelationalBento? with TrueRelationalToPseudoRelationalMapping capabilities

That can help and also languages/systems inventions that make the database more accessible and easier to just use and design things correctly.. the database should be thought of as a data structure or variable right in our programs (relvar vs table). RelProject and TutorialDee seems like it is headed this direction.

Yes but TheThirdManifestoNeedsToRideaHypeWave? somebody has to to create an TrueRelationalToPseudoRelationalMapping that surfs right in the middle of the somene elses hype (or create its own) now, TheThirdManifesto was written a lot of years ago, and so far it has not created its own HypeWave?, and since we need one of those to force the market in to including this kind of technology in mainstream databases and frameworks... I think the best practical solution would be to add a TrueRelationalToPseudoRelationalMapping to RubyOnRails, or to AptanaJaxer? (with or without GoogleGears on the client side) (in fact adding it to AptanaJaxer? might be the best way since I rememeber reading somewhere in the Wiki that prototype based languages suffer from a much milder form of the ObjectRelationalMappingImpedanceMismatch?)... What do you guys think? I think it may be the only practical way to force the market in to ImproveDatabasesOrElse --LuxSpes

[What do I think? I think the notion that "the market" (whatever that is) can be "forced" is suspect, and I think the influence of TheThirdManifesto, and the more mainstream (and arguably definitive) database text that embodies its principles, AnIntroductionToDatabaseSystems, has been more significant than you give it credit. From the early to mid 90s, talk around the water cooler generally held that relational databases were awkward dinosaurs, inevitably (and soon) to be replaced by object oriented databases or something else.] -- DV
- Alright, I have to admit that AnIntroductionToDatabaseSystems has been very influential, that the market may not be "forced" (but it can be influenced) but you have to admit that ObjectWeenies? are not doing so bad, XML is more popular than databases right now, Ruby on Rails Active Record is far more popular that TutorialDee, LINQ is bringing ObjectRelationalMapping to the core of the .NET Framework, the JavaPersistenceApi is one of the hot new features of the latest JavaEnterpriseEdition...
- [There seems to be some apples-to-oranges comparison going on here... ObjectWeenies are indeed doing well, which arguably is as it should be in general software development, at least right now. I've been developing DBMSes in object oriented languages since the early 90s, after developing my first DBMS in the late 80s using C. I'm still developing DBMSes in OO languages and wouldn't have it any other way, though these days I use the OO language to create a distinctly functional-flavoured set of internals. Object oriented databases, however, are still -- as they have always been -- of marginal popularity for general-purpose database systems, but appropriate (and doing well) in certain niches. RubyOnRails ActiveRecord is a purely client-side DBMS-interface technology; yet another implementation of the RecordSet idea. It can hardly be compared to TutorialDee, which is as much (if not more) about seamlessly (emphasis on "seamlessly" -- ActiveRecord is hardly seamless) integrating relational functionality into general-purpose programming as it is about DBMSes. LINQ is also pure client-side technology; a re-invention of EmbeddedSql for DotNet. I'm not sufficiently familiar with JPA to comment intelligently; though J2EE has always been in its own strange world of reinventing the DBMS in the application server, in what appears to be an attempt to deprecate those nasty, ugly, confusing SQL DBMSes in favour of nice, clean objects. (And, even more cynically, what appears to be Sun's attempt to sell application server hardware.) As such, like ObjectRelationalMapping in general, it can be considered an attempt to deprecate the RelationalModel in favour of pure object models, rather than integrating, leveraging, and improving them equally, as TutorialDee (and DateAndDarwen's "D" specification) attempt to do.] -- DV
[With the advent of the Web and the need to create dynamic, data-driven Web sites, academic references like TheThirdManifesto and AnIntroductionToDatabaseSystems certainly provided an important degree of justification and rationale for the development and subsequent popularity of pseudo-relational products like MySql and PostgreSql, and for the continued strength of OracleDatabase, IBM's DB2 and Microsoft's SQL Server. Without the support of these references (and the fact that almost every database student is exposed to AnIntroductionToDatabaseSystems), we could now all be enduring GemStone and similar products.] -- DV
- But was it really because of that? or it was also a case of "what is working right now are PseudoRelationalDatabases? so, why deal with the risks involved in doing something new (either ORDBMS or TRDBMS), lets just do something more or less like what is already working something like FearOfInnovating?.
- [Agreed. That may be a significant factor.] -- DV
[Furthermore, if the best "solution" was a heterogeneous database system, i.e., a TrueRelationalToPseudoRelationalMapping (and don't you mean a PseudoRelationalToTrueRelationalMapping?), we would surely already see numerous, popular heterogeneous database systems, whether true-relational or not (more likely, not).]
- Well we are certainly seeing lots of popular heterogeneous database systems, they are just not based on TrueRelationalToPseudoRelationalMapping they are based ObjectRelationalMapping (what is more popular right now, GemStone? or the JavaPersistenceApi (JPA) (with HiberNate, TopLink, CocoBase, etc) and what about Microsoft an LINQ ? Just take a look at the list in ObjectRelationalToolComparison... Why there is no list like that for TrueRelationalToPseudoRelationalMapping (or PseudoRelationalToTrueRelationalMapping?)
- [Simple: There is little motivation to hide pseudo-relational DBMSes behind a true-relational wrapper. The distinction is incremental, arguably subtle, and unlikely to be appreciated by anyone without a strong grounding in the RelationalModel. If you don't believe me, simply point out SqlFlaws to a randomly-selected group of application developers. The majority couldn't care less. There are, however, strong motivations to hide the RelationalModel behind objects. Some of them:]
  - [Certain popular programming languages -- C, C++, C# and Java, for example -- permit only static structures to be accessed via convenient (e.g., dot-dereference) syntax. This encourages the creation of tools that map dynamic structures -- like RecordSets -- to static structures, like classes, in order to get around a language-enforced impedance mismatch.]
  - [Education about database systems in general, let alone the RelationalModel, is marginal in some (maybe many) institutions. This creates graduate developers who are strong in object oriented languages but weak in databases.]
  - [There are still a significant contingent of developers who regard object oriented approaches as fundamentally superior to the RelationalModel, or who regard the two models as equal in value but fundamentally incompatible, or who staunchly insist that DBMSes are (and should be) persistence engines and nothing more. Until fairly recently, there were participants here who were occasionally quite vociferous in stating such views.] -- DV
[The reality is that these are niche products, fulfilling a requirement to connect to multiple DBMSes when it's required, but it's rarely required. Most such DBMS aggregations are only needed at a reporting level, which is adequately handled with ODBC/JDBC.]
- I would take a JPA implementation any day over ODBC/JDBC for OLTP (and I would use ODBC/JDBC only for OLAP reporting) and I am not alone in that feeling ask any ObjectRelationalMapper user. And For reporting I would love to have a PseudoRelationalToTrueRelationalMapping?, the SqlFlaws make it really painful to do reporting right now.
- [I don't disagree, which is why I am working on heterogeneous SQL DBMS access in the RelProject. However, I doubt this feature will be a strong selling point. A convenience, yes, but a breakthrough, no.] -- DV
It should be noted that AlphoraDataphor was a commercial failure, and it was almost a PseudoRelationalToTrueRelationalMapping tool. (Though in the end it was really a PseudoRelationalToPseudoRelationalMapping tool, and hence somewhat pointless.) As for tacking PseudoRelationalToTrueRelationalMapping onto some application development environment, that could easily turn into a dead-end. DBMSes tend to long outlive the flavour-of-the-month application development tools that access them.
- Exactly, and that is why it is so incredibly hard to replace them, that is why they evolve much slower than everything else, if we ask a company to just "drop Oracle" and switch to our new "greatest and latest" TRDBMS they will just laugh at us... but if we convince a good number of developers, that the best way to develop database applications is using a language like TutorialDee using a PseudoRelationalToTrueRelationalMapping? and we reach the level of popularity that some JPA implementations have now (or the level of popularity Ruby has now) then telling them that we now offer not only a PseudoRelationalToTrueRelationalMapping? but a much more optimized TrueRelationalDatabase?, then I think many of them will start using TrueRelationalDatabase?, specially for new projects. Sometimes people do not switch to a new technology not because it is not good but because nobody offers them an easy MigrationPath?.
- [I think the value of easy migration paths is over-emphasised. Speaking for myself, I switch when an entire suite of technologies makes it worth jumping ship, and then I don't attempt to keep a foot on both decks. For example, I switched to Linux as my primary desktop environment when it clearly became a viable alternative to Windows for software development, not because Wine or VmWare made it possible to run Windows applications under Linux. (I don't have VmWare or Wine installed.) In the 80s, I switched from ExBase to a custom C-based DBMS when the C-based DBMS clearly wiped the floor with ExBase; it was actually cheaper to migrate whole ExBase apps to the new system than maintain any ties to legacy ExBase. When a technology represents an end-to-end improvement, developers will make a wholesale switch, and indeed the entire history of computing is proof of this. It's why we're largely developing on PCs instead of mainframes, mainly coding in Java/PHP/C#/C++/Perl/whatever instead of COBOL and Fortran, using SQL DBMSes instead of CODASYL (or ExBase), and almost entirely not using technologies to migrate from the old-tech to the new-tech. In most cases, no such technology existed. Certainly, there are areas where migration paths are valuable -- WebServices as a way of leveraging legacy mainframe systems in modern SOA environments, for example -- but these are niches rather than technological drivers.] -- DV

Personally, I think the best way to ImproveDatabasesOrElse is to create slicker, easier-to-use administration tools for true relational databases, and provide conventional application interfaces like ODBC and JDBC so they can be used by virtually any application development environment. At the same time, work should be done on creating integrated, easy-to-use application development environments that incorporate both client-side and DBMS functionality into a single, cohesive, unified whole.] -- DaveVoorhis

Use a product and create a company to educate people and offer proper documentation and education with the product. The problem is simply time, luck, and other such things. Looking at what Dataphor has done will be a good start.. and heavily researching RelProject and other things are on my to do list (along with reading as many PDF files from Darwen/Date et al).

{By migration, I don't mean keeping all backwards compatibility and offering the perfect migration path.. as that isn't possible - I mean we simply need to make it easy for someone to try the relational model (instead of just PDF files that explain why everyone is wrong, etc.) We need also to tell people that yes, they can export their existing data from their current SQL database, plunk it in a relational product and try it out - even if that means they have to change some of their current data (such as changing nulls to defaults). Data is migratable by its very nature. I do not mean keeping nulls, keeping SQL flaws, or even trying to follow Dataphor's path exactly.}

{One idea is for example to compete with SQLite.. offer a nice alternative to SQLite and say why don't you try our relational database that is embeddable and easy to use?. When people choose tools like SQLite they aren't looking for a huge setup like Oracle.. so easing them into the relational model first is what I meant by migration. i.e. not requiring them to just take all their huge Oracle databases that they've invested into and magically change over to a new paradigm. First, ease them into it and let them know about it by making a nice SQLite style product which many people can use without any fear. Tell them on the side that there is also a relational alternative to MySQL and Oracle planned, in addition to the SQLite style small starter database. The idea is not to just continually fight people with academic revolutoinary fire, as Fabian does.. but offer a real solution to get them going. It needs to be easy to setup, easy to use, or people won't convert/migrate/try it. For example, right now, all we have is a bunch of PDF files, and no actual product to prove the relational model can work. The easiest way to get people started is not to yell at them and tell them all their Oracle and MySQL databases need to be immediately converted over to the relational model. Rather we have to offer solutions to satisfy people's curiosity. Try this.. download this.. no .Net requirement, no existing database requirement, no hard stuff to setup, no hosting company required to try the DB, it just works. Might be hard to accomplish, but that is a good way to get people interested.''.}

At the risk of tooting my own horn, the RelProject is very close to this. It's trivially installed, requires no configuration, is easily embeddable in Java applications, and is entirely self-contained except for requiring a Java run-time environment. What it lacks are several things:

A beginner-friendly administration and interaction front-end... The current front-end client is command-line based. It should be replaced (or augmented) with a user-friendly, click-n-go UI that is perhaps closer to Microsoft Access in flavour.
It lacks easy integration of existing SQL DBMSes, though this is a work-in-progress.
JDBC/ODBC drivers, which would make it possible to use with existing client-side products, use it to build dynamic Web sites and desktop clients, etc.

I believe the primary target competitor should be Microsoft Access; the product should be a user-friendly desktop DBMS that works equally well as a DBMS server. This represents a significant opportunity, as there is really nothing like Access in either the OpenSource or commercial arena, and Access is often a beginner's introduction to database technology. Access also containts a bucket of flaws that could be addressed at the same time. -- DaveVoorhis

This is utter arrogant and insulting bullshit, not science. ArgumentFromIntimidation. I protest such accusations without real evidence and metrics. And, you are not only table happy, but topic happy. This belongs in FearOfAddingTables. I think somebody made this just to get my goat. --top

So says the person using bold-font and making blubbering protestations without real evidence or metrics? Woah, top, you're being slightly less hypocritical today than you usually are. I'll have you know that almost everyone on the wiki feels most of your contributions are also "arrogant and insulting bullshit, not science", which is why we so often feel that rudeness is, indeed, necessary - after all, you started it. So I'm going to look to your Golden example here; I'll pretend that you'll treat this page with the same respect you'd want your ideas (like TypesAreSideFlags) to be treated. I'll be watching.
I don't profess objective GoldenHammers. And good definitions do NOT take up 800 pages (TypeTheory). That shows either poor thinking, verbosity contests, or shitball concept factoring. Somebody should be ashamed to present that. --top
Golden hammers? This had nothing to do with 'hammers', Top. Your 'golden example' is all about treating others how you would like to be treated. And I must say, I'm delighted to see you think that calling you a poor thinker shitball is a perfectly acceptable response to what anyone might consider "utter arrogant and insulting bullshit" coming from you. For now and forever, I'll get to point to this fine example of how you'd like others to treat you. And 800 pages of TypeTheory would hardly be one massive definition - it is far more likely to be one page of definitions mixed with about 799 pages of explanation, references, and credits - pages that you'll forever refuse to read because your "utter arrogance" leads you to believe you'd actually comprehend a definition (and especially the significance of its subtleties) without context. History shows that you're incapable of this; you really aren't as intelligent as you think you are. As far as 'shame' goes, YOU should be ashamed of attempting to 'KISS' for the masses without actually having worked through your own shitball concept hypotheses to confirm applicability, for that is the act of a charlatan or religious cult leader... not a scientist.
"Confirm applicability"??? The term "hypocrite" comes to mind. Smart people can demonstrate their claims on practical grounds. So far, you are not one of them. MentalMasturbation is NOT a substitute for science.
"Smart people can demonstrate their claims on practical grounds. So far, you are not one of them." Back at ya. You've NEVER given me the impression of intelligence... only of experience in the field of Relational programming for business applications (and not much at all beyond that little area). All your claims outside that little field are the result of your "shitball concept factoring" and your attempts to "Keep it Simple for the Uneducated Masses" (usually meaning yourself, since nobody else is interested). "The term "hypocrite" comes to mind" when I think of you too. You've never shown your pet theories to have any value with science OR with math, and you rarely think things all the way through. Excitement over a pet-theory is NOT a substitute for proper reasoning, which necessarily excludes ANYTHING to do with 'EverythingIsRelative'. Obviously, you think you are above 'confirming' your theories, judging by the number of '???' you placed by the term.
Perhaps we are BOTH zealots withOUT evidence. The difference is that I believe nirvana is in the head instead of a real place. --top
Perhaps. Or perhaps you're too busy building 'nirvana' in your head to pay attention to evidence. Not that you'd care about evidence anyway, since ObjectivityIsAnIllusion is the excuse you built just in case someone actually provided objective evidence. It, and 'EverythingIsRelative' seem to be the cornerstones of your nirvana. Let me give you some help, Top - the place you're seeking would be reached far faster with some fire and some well chosen weeds.
[The products that people use! I think Top is basing his arguments on the tools he has chosen which do not support strict typing, integrity, etc. People tend to protect the tools they are using that they've invested a lot of time in. This is one case where academics are a good reference point since they don't have a bias with certain products (at least good academics don't teach specific products). I am not an academic myself, though, so don't consider this page academia against real world.]
Incorrect, as usual. I started out with type-centric languages. I used to be a PascalLanguage fan. But over time I found dynamic usually better once you change the way you think. My first dynamic exposure came when I had to use VB-script (not VBA). VB-script does not accept type declarations. I thought it odd at first, but it grew on me. (Well, I take that back, I used a minicomputer OS scripting language that had dynamic typing in the early 90's.) --top
{Surely you are not conflating data management with application programming? These are almost entirely different things.} -- DV
- I disagree. As a table-oriented programmer, I see a blurry distinction between them. --top
- {And through your blurry vision, you deem it appropriate to devolve the rigour and integrity of database systems to that considered (barely) acceptable in casual scripting? Well done. I hope you never come near an enterprise, or its databases, over which I have responsibility.} -- DV
- Is this supposed to be a real argument, or yet more self puffery?
- {Neither. It's disgust at your lack of professionalism. If you were a plumber, you'd be a cowboy -- the sort that seems cheap at first but costs ten times as much when the basement floods.} -- DV
  - (And then he'd say their complaints are 'anecdotes' and therefore do not constitute objective evidence against his techniques of choice, just as he finds excuse to dismiss all other evidence he doesn't wish to acknowledge (statistical? well, they couldn't have been measuring the right thing. anecdotal? it couldn't possibly be representative. mathematical proof? well, that's just MentalMasturbation - and, besides, it isn't simple, doesn't fit into two pages or less, and can't be explained to a layman... so it doesn't count. physical proof? well, ObjectivityIsAnIllusion, there is no physical proof). Of course he doesn't need evidence to make HIS sweeping claims about 'bloat' and 'red e-tape'. But such hypocrisy doesn't bother those who never critically examine their own behavior, so Top doesn't Mind.)
- Are wrong and arrogant. Professionals are not arrogant and rude, but analytical and evidence-oriented. Go to hell, I don't have to be insulted like this. I'm done with you! DynamismHasItsPlace If you don't want to work on those kinds of projects, fine, but stop extrapolating your niche to everything.
  - Hypocrite. TopMind is one of the most arrogant rudest, least analytical (especially self-analytical), least evidence-oriented persons on this Wiki. Professionals are willing to be students and don't have an 'agenda' when they ask questions of others where they don't know the answer themselves. Professionals don't act like charlatans and design theories 'for the masses' without first proving them. And TopMind doesn't have to be insulted; if he wants the insults to stop, all he needs to do is be considerably more professional.
  - "Without first proving them": The "H" word comes to mind again. You proved nothing. Your one scenario is not realistic, and you dismissed the downsides of red-tape languages without any rigor. --top
  - You've categorized and dismissed 'red-tape languages' with ZERO rigor, you unprofessional arse. But I was actually referring to any number of your crackpot theories, such as TypesAreSideFlags. Your theory that there are such things as 'red-tape languages' should be among them.
    - Are languages like ModernPascal red tape too? Anything that isn't weak/loose/dynamic is red tape (even though strong languages have dynamic features avail, even untyped pointers and a parameters)? Freepascal is put together by a bunch of young hackers, not bureaucrats - as will Qomp. But we should really be discussing databases here, and not languages. Is there something wrong with making databases easier to use that are more relational? I can't see any problems with doing that. Are you claiming it can't be done and that the world is just as good as it is now?
    - I'll rephrase it as "red-tape languages and/or techniques". We can look at language-specific issues in a more specific topic.
  - {What is unrealistic about my scenario? Can you recognise the fundamental problem it represents in terms of databases and the dynamically-typed tuple/row values implied by DynamicRelational? What is a "red-tape language" (I cannot find the term in any academic or engineering literature on language design), and what does it have to do with database schemata?} -- DV
- {Dynamism, as you describe it, has no place in data management. In application development, maybe. In creating persistent data stores for CAD applications and the like, perhaps. In database systems, no, not at all. If you think it does, then you do not appreciate either the value of databases or the role of database systems.} -- DV
- That's it, i am just using the wrong techniques because I am not wise like you. I see and weigh things all wrong because my brain is inferior to yours. Its that simple.
- {You wrote it, not me, but it's nice to see we finally agree.} -- DV
- HumansAreLousyAtSelfEvaluation
- Indeed they are, Top. You have a habit of NEVER self-evaluating, so you're worse than most. But, since you were trying to be sarcastic, you actually managed a dead-on self characterization.
- I have so much assistance from the likes of ObjectiveEvidenceAgainstTopDiscussion, that original criticism of myself has no room.
The products that you use now, Top, not the ones that you used to use back in the day. People get lazy and say.. to heck with these things I've been using, I'm switching to this product because it is quicker, sloppier, easier, etc. When in fact it can be easier and quicker to have the system verify things for you automatically.. i.e. there isn't a holy grail, both can be more convenient.. it is whether one values integrity or not.
You keep saying this over and over, but that does not make it true. -- top
- You keep saying that over and over. That doesn't mean the people you say it to haven't already provided sufficient examples and evidence. It's just your way of waving your hands and pretending that nothing the other guy said could possibly be valid. Maybe you do it because you can't understand what was being said, or maybe you're simply an arrogant prick who doesn't wish to acknowledge arguments. You certainly seem 'lazy' when you respond to everything with 'I don't care, I'm going to believe whatever I want to believe anyway; by my definition, nothing you can provide counts as 'evidence' in my eyes because ObjectivityIsAnIllusion.' We would rather let you wallow in ignorance, but you make all these wailing and bleating sounds intending to drag others down with you (you even target them 'at the ignorant masses'). It doesn't seem to matter what the subject is; your pattern is always exactly the same: deny the argument is valid, ignore any counterevidence, state that the other guy hasn't presented any evidence, then ask 'IsYourRudenessNecessary' when the other guy starts shouting at you for being such a dick. And no, the rudeness isn't necessary - it's just well deserved by the time you receive it.
You are being an arrogant prick: ArgumentByNameCalling? ("lazy"). Heavy typing bloats up code in most industry languages and creates e-red tape. It is not a free lunch. Show it being bad under practical scenarios, don't just claim. --top
- Bloat required to check all your incoming loose and weak types with your own hand written run time checker.. is what people like you just don't understand... that and unit tests (which people don't have time to write) to reinvent integrity checks. People should be writing unit tests for important stuff other than what can be automated through type checking already.
- See the "marshaling" comment nearby.
{I showed it being bad under a practical, though hypothetical, scenario below. I'm sure you can extrapolate that scenario to any number of other domains and situations. It clearly shows how strong static typing would have prevented a problem that weak dynamic typing allowed.} -- DV
You did NOT weigh them against the downsides, namely bloat and red-tape. Bloat itself causes errors. You can't just hand-wave this factor away. --top
- The downside, TopMind, is that in a weak/loose/non-integrity enforcing system, you end up writing your own type checker by hand yourself in each and every application (or you forget to, which is maybe 70 percent of cases.. and errors slip into your database). Writing your own functions that verify all is okay at run time is bloating up your code just like declaring things bloats up your code too. Type systems exist so that if you forget what your type is (especially with side flags and other hacks) then at least you've got some automated integrity happening (in the database, and/or in the application too). The more automated checks, the better. The more you manually have to check yourself, the less efficient you are in keeping integrity. It's like trying to claim that a timer on an oven, or a fire alarm is useless.. because real men just use their nose to smell whether a pile of shit is burning in the kitchen.
- That is not true. A lot of biz processing is "marshaling", passing stuff around. One only has to "open" a small percent of what is being passed around. Thus, validating everything is not necessary. Plus, things like a DataDictionary reduces the need to track types directly in the code. --top
{So you'd rather risk screwing-over your users at run-time than introduce the so-called "bloat" and "red-tape" of type references and definitions, which eliminate the problem I described? Your priorities are most peculiar. And you claim this introduces problems that are, presumably, worse than the scenario I described? Sorry, mate, I don't believe it. Your turn to ante-up with a scenario. Furthermore, any given type-bug in a dynamically-typed application program may be considered acceptable (for some measure of acceptable) because it occurs exactly once -- when it's fixed, it's gone. In a dynamically-typed database, the problem comes back every single time bogus data is entered into the database. Do you consider that acceptable for any measure of acceptable?} -- DV
I weigh the trade-offs. If the app environment and schedule and budget would fit better with the anal red-tape BondageAndDisciplineLanguage approach, I'll recommend it. If not, I'll use lean and clean type-free languages. Dynamic languages read more like pseudo-code while a BondageAndDisciplineLanguage reads more like a billy goat puked up a gnawed-up tax-law book. By the way, we've drifted away from the table-width issue, which is what the topic was originally about. That's where the evidence needs to be. --top
- The evidence is that ConstantTable page seems to be a page created by you, which is a thin table and which is a form of normalizing a database (do you know what that is, by the way). And this constant table is also similar to my enumeration idea. This is evidence, Top, that you are either a hypocrite or YouJustDontGetIt - or that when you created that ConstantTable page you missed the very irony that it goes against the wide table idea and reinvents normalization with a new term.. called ConstantTable. Your constant table is an enforcer, a mapping, a more integral design than if you'd thrown in a sloppy non normalized value into your column value. Just like Hugh Darwen created an alternative to NULL's by using thin tables, you created the ConstantTable. You are in ViolentAgreement without knowing it, and you are showing hypocrisy through your own pages you've created here as evidence.
- The vast majority of strongly typed languages I use don't qualify as BondageAndDisciplineLanguages, and I use very little ManifestTyping at all. That you conflate these things in your mind (StrongTyping, ManifestTyping, and BondageAndDisciplineLanguage), even after the wide array of related discussions you've dabbled in, really doesn't say anything positive about your mind. I find it utterly sickening that you continue to spew your unjustified 'facts'. So, I'll repeat something to you: You keep saying [that strong typing necessarily means BondageAndDisciplineLanguage] over and over, but that does not make it true.
{This is not about programming languages but databases, in particular your notion of DynamicRelational in precisely the context I've described above. Please answer my questions without evasion or diversion. As for your so-called "table-width issue", I responded to that below. You have not responded to my response.} -- DV
- His idea of ConstantTable is a thin table which makes him a hypocrite.. or he JustDontGetIt and is in ViolentAgreement without knowing it, but without realizing what he is doing in his current application is not what ConstantTable and thin tables advocate.. which is normalization and reduction of duplication, and integrity.
- Perhaps we need to explore a specific example so that we don't have to compare one sweeping generalization to another. If there is a good reason to make a skinny table, I'll do it. Avoiding empty/null cells is usually NOT one of them. --top
- {I've already given you all the specific examples you should need (one!) to soundly reject DynamicRelational and any of its ill-conceived and even dangerous tenets. That you can't see it is frightening. What would your bosses think if they knew your ideas represent a risk to the integrity of their data?} -- DV
- Most bosses know there are tradeoffs. As far as examples, are you talking about constant table examples, thin table examples, or the comma-payroll example? You need to name your "subroutines" here. --top
- {I am referring in particular to the comma-payroll example. Do your bosses know that use of DynamicRelational creates a deadly tradeoff between a one-off design-time effort to identify static types, vs. an ongoing possibility of invalid data?} -- DV
- Have you stopped beating your wife? Anyhow, the comma example is about strong typing, not skinny tables. --top

Arrogant and insulting is swearing in this wiki and using BadWord. This was not intended to attack you Top.

Geniuses don't abuse nulls, but lazy people do.

An example genius design is something like:

http://web.onetel.com/~hughdarwen/TheThirdManifesto/Missing-info-without-nulls.pdf

He gets rid of the nulls in a clever design.. but regular people just go ahead and abuse the nulls since they aren't genius enough to avoid them..

Instead of creating a table with some definitive descriptions in another relvar (table) like:

employed unemployed selfEmployed noEntryYet

We have people instead doing:

  TRUE  (employed)
  FALSE (not employed)
  NULL  (not sure, self employed possibly, or not applicable, or not entered yet)

This abuses the NULL because it is available to us.

A genius would not abuse a null because he is already a genius and knows not to abuse a null.

Yeah, let's abuse ID's instead. That'll fix it.
- Normalization is not abuse, it is for geniuses.. and it is tough.. it is hard. Hence the page title name. An alternative is an Enumeration that is not part of another table...

         TYPE workStatus = (employed, unemployed, selfemployed)

The problem with the above enumeration though is that it is not something we can query and change as part of the database.. it is more like a BOOLEAN in some respects in that it is more fixed (but not as fixed, it is more like a byte or integer). But some times we don't need to query it. Pure relational wheenie wouldn't allow a enumeration type (which just maps to a byte or integer at an implementation level) since it could lead to three value logic.. but it may also be something that stops people from using nulls when they don't have to, and avoids creating several different tables as in Hugh's page.

Since not all people are geniuses, the idea is that we shouldn't give them too many chances to be silly.. we should hit them with a whip and ban them from doing silly things. NULLS give people too many chances to be silly. Date, Darwen and all have several articles that debate the usefulness of NULLS. Codd at one point (the original relational model maintainer) almost appeared to be okay with 3 valued and 4 valued logic in his later writings.. but I personally think that is because he never understood what an enumeration was.. in my opinion enumerations are superior to some silly 3 or 4 valued logic.. and in my opinion even skinny tables are better than NULLS

A genius like FabianPascal says to someone else that YouJustDontGetIt so and so's design is wrong wrong wrong wrong.

How do we make them get it, easier, then? How do we make database design something that people can get?

Geniuses can also multi task and see the multiple relvars at once or multiple tables with foreign keys at once.. while others JustDontGetIt and just see values, rows, columns, in the form of an HTML table.

Yeah right, and I suppose geniuses can use Go To's also because their huge multitasking brain can handle all those wild jumps. Indented blocks is for single-tasking limited brains. Enjoy your Go To's, dear genius. Forget relational even, and go pure graphs. Didn't they used to say to GraceHopper: "High-level languages are for girls, real men use assembler"? --top

A smart programmer uses the high level language and visualizes the Go To statements in his mind or in his assembler window only if he needs to dissect the code into what the CPU is really doing. High level languages were invented to ban people from Go To, just as banning NULLS bans people from doing bad things.

Big Change Pivoting on The Trivial

The problem with the skinny-table approach given is that table design hinges on null existence. If we change a column to "required" per business rules such that it can no longer be empty, the above suggestion would dictate changing the schema via adding or merging tables. This makes the schema anti-change-friendly. The existence or lack of a required column is too trivial to hang a design on. It's like designing a house layout based on hair length. "Required" should be a simple switch that one can toggle on and off without having to refactor the entire schema for up or down. --top

I didn't see a satisfactory response to this. --top
Perhaps because your criticism didn't make sense to me. I don't know what you mean by "table design hinges on null existence". -- DV

Is there a SkinnyTablePattern? on this wiki? I think it is a pattern (serious, it is). I assume you mean normalization causes skinny tables.. which it does.. but at the same time if you don't see them as tables and consider them just lists of stuff, tuples, etc., it isn't as bad that it is skinny (i.e. people fear creating more tables because that's too much overhead with too many ID fields? An enumeration type solves that issue but the problem is then you cannot query the actual enumeration and find out what enumerations are available.. but do we need to?)

It is not normalization, it is *excessive* normalization. Skinny tables are frighteningly close to NavigationalDatabases: pointer pasta city. "Relational" means "based on tables". Skinny tables are less table and more like lists. Don't call it "relational", but Listational or something. Lispers may dig it more.

[You have no idea what you're talking about. Absolutely none. You've demonstrated often enough that you have no concept of the theory that underpins the field(s) in which you work, so your comments are based purely on ignorance and whim. I wish database practitioners had to belong to a professional body, so that the above and similar comments could be used as justification to disbar you from it.] -- DaveVoorhis
You are projecting due to your frustration over your inability to show scientific evidence of your claims. The needs for science in IT has not gone away. You just want to turn IT into a MentalMasturbation bullshitting contest, because that is ALL you are good at.
[My claims, views, and approaches concur with over thirty years of hard-won experience in the database and data management field, which have been empirically demonstrated by numerous reliable, durable, functioning systems. Where is the scientific evidence for your claims?] -- DV
That only demonstrates TuringEquivalency. You need to find some applicable metrics and show how Foo Oriented Bar ranks higher on those metrics. Anecdotes are like assholes: everyone has one.
What the f**k did DV say that had anything to do with TuringEquivalency or any particular programming orientation, Top? Damn me if you aren't one of the biggest bullshitters I know. Do you just fall back on the same tired lines whenever you get confused? It seems as though you don't comprehend anything more than that you are under attack.
Hold your horses, I merely conveyed my thoughts poorly. What I meant to say was that "reliable, durable, functioning systems" can be written in just about ANY tool, paradigm, or language if you throw enough time and effort at it and hire the best. Large reliable systems have been written in assembler and COBOL with flat files. But this by itself does not mean we should all use assembler. --top

[Well Top, foreign keys, relvars, et al. are to reduce the duplication, but I think it somewhat reinvents/overlaps the idea enumerations and other types and constraint forms. Creating a new skinny table just for something that could be defined as a type like below can get on the nerves:

  type bitmask = (bmRead, bmWrite);

As for lispers digging it.. I don't much adore lisp too much.. I understand lisp but I'm more of a modular/procedural guy as you know. I dig the relational model but I find it a bit complex when simple tasks need to be accomplished immediately (i.e. it is verbose and time consuming to create another relvar (table) just to emulate an enumeration). That's why I created this page because a genius with unlimited time and skills COULD potentially create the perfect database according to purism.. but what about compromises? ]

Data modeling is a discipline and a craft. Like any other, it requires practice to become skilled, and dedication to acquiring expertise and knowledge about the craft, including its history and theory. Alas, like any other discipline, it attracts its share of cowboys, crackpots and incompetents. Many excellent data modelers are not geniuses, but have simply made the requisite commitment to becoming skilled.

And personal preference also plays a role. Some people can grok sparse tables easier than large table-space with skinny tables, and vise versa. Everybody's psychology and physiology are different. What bothers some eyes, brains, and hands will not bother others and vise versa. I've personally found skinny tables annoying and wanted to bop the designer.

There's something quite frightening in the notion that your selfish personal preference might play so large a role in a field that is fundamentally about meeting user requirements, especially the implicit requirements of reliability, stability, consistency and accuracy. Likewise, a cabinetmaker might have a personal preference for jagged edges, exposed splinters and spindly legs; or a cook may favour the gamey stank of slightly-off meat; or a toy designer may like intricate and easy-to-swallow small parts. That does not mean such dangerous abberations should be foisted on naive and unsuspecting users! That you find "skinny tables annoying" -- as if it reflects some generally valid principle without any apparent reference to industry best practices or user requirements -- smacks of professional incompetence. I do wonder what kind of ill-conceived rubbish your users have been forced to endure. -- DaveVoorhis

I believe you to be an over-educated idealist/purist who wishes to compile the world up-front to clean out all the bad logic and bad data so that the world runs like clock-work. But reality is different. LifeIsaBigMessyGraph whether we want it to be or not. Our abstractions are merely tools to help us *force* some artificial sense onto chaos so that we at least have a frame of reference. Abstractions are just frames of reference, a common communication platform. But TheMapIsNotTheTerritory. Politicians, Bureaucrats, Owners, and Marketers don't give a flying shit about purity of concepts, and thus make up messy graphy rules with only teasing hints of reliable patterns and logic. Pretending there is some underlying sense or simple equation or GoldHammer? under all this chaos is just lying to yourself. You are a religious zealot trying to purge the heathens. I used to be like you, but I wised up. There is no GoldenHammer. --top

"I believe you to be an over-educated idealist/purist who wishes to compile the world up-front to clean out all the bad logic and bad data so that the world runs like clock-work."

So why not instead improve the relational tools out there, which apparently many of us are trying to do on this page by brainstorming? Why continue to live with the current tools that you apparently are satisfied with? Break for progress, push for progress.
SimplifyingRdbms --top

You say that like it's a bad thing. Ideally, yes, that's exactly what I'd like to do. Sometimes, in database-driven applications, it's even achievable. Surely that is preferable to your approaches, which (apparently) endorse not attempting to clean out any bad logic and bad data? Your peculiar advocacy of, for example, "type-less" databases and whatnot are exactly that. -- DV

You demonstrate it bad on a practical level, and I promise to consider it. Is that asking too much? Note that I am not the only "lite type fan" on this Wiki. The others have just been quiet lately. --top

I shall, but first, let us clearly distinguish "type-less" -- or more accurately, dynamically-typed -- databases from "type-less" (er, dynamically-typed) programming languages. By in large, issues with the latter can, to a certain degree, be addressed with UnitTests and the like. Whether this is a good idea or a bad idea, however, is a completely separate topic not to be dealt with here. This is strictly about so-called DynamicRelational in database contexts. Now, imagine that a table of the form pay_rates (employee_id_pk_fk, hourly_salary) exists in our brand-new DynamicRelational payroll database, freshly purchased from TOP Software Inc. We presume the attributes of our table are "type-less", in keeping with the tenets of DynamicRelational. All afternoon, we enter the hourly salaries for the 10,000 employees of our company and prepare to generate paycheques overnight. Lo and behold, when the employees arrive the next morning to pick up their cheques, it turns out Employee #1324's hourly_salary was mis-keyed as 8,98 instead of 8.98, paycheque generation failed with a java.lang.NumberFormatException, there's nary a cheque to be seen, and our factory is burned down in the ensuing employee riot. This could have been trivially prevented had the database supported static types, because the mis-keying would have been caught on entry. QED. (Epilogue: The lazy, negligent developer who thought to save a few minutes development effort by using "DynamicRelational" is hoisted up the flagpole as a warning to anyone who would dare consider fewer database constraints "better" than more.) -- DV
I've already said that payroll and medical care are probably not the appropriate use of DynamicRelational. PickTheRightToolForTheJob. And also note that DR allows one to add validation/types to columns as needed. You make it sound like an all-or-nothing argument. --top
What domains do you consider so trivial as to risk invalid data? Alternatively, what applications of database systems do you see as being so short-lived that the extra time it takes to identify or define a column type is excessive? Furthermore, in a system that permits "validation/types as needed", how does a "type-less" column differ from a garden-variety STRING or TEXT type? -- DV
"Risk invalid" is rather presumptuous of you. Most of my errors in such langs are in fact not even type-related. Anyhow, a good use of such languages is in interactive reporting systems (drill-down, QueryByExample, etc.) The data already exists such that data-entry validation is no longer a key concern. A lot of shops use a BondageAndDisciplineLanguage like Java for the data entry part; but, scriptish languages for intranet reporting from data-warehouse-like repositories and views. --top
How is "risk invalid" presumptuous? Do you not believe my scenario is realistic? And, again, this is not about languages but databases, in particular your notion of DynamicRelational. Your references to interactive reporting systems, already-existing data, use of Java, and intranet reporting are all diverting from the topic clearly elucidated above, which is the use of so-called DynamicRelational. Please do not try to change the topic, and answer my questions. -- DV

Top, there are a lot of fans in this world.. the issue is.. 90 percent of the world doesn't know what they are doing and is lacking in genius skill.. hence the title of this page. Doing programming correctly is hard. Very hard. It's hard. That's why people choose sloppy solutions like what we are offered with mysql and sqlite (incomplete products that were created in a rush because we just don't have time for the perfect soltion). Using it a better or safer product is really hard when there aren't many to purchase or download, especially if it is not a mass product known by the industry. This is the problem with TheWest. Just look at Microsoft and see all the fans that support that funny company. Then look at all the haters. Or the fans that support the bizarre perl language, and all the haters. Millions of people are fans. Doesn't mean it is necessarily more right, just because of fanboyism that is there.

I don't think it is that simple. A lot of the less technically-gifted developers often have social skills that are very helpful in understanding and related to the user and managers. And also, globalism is shifting "static" services overseas, leaving rapidly-changing (dynamic) services to us. Under this, nimbleness is more important than purity. Change is our comparative advantage, not technology per se. Thus, there is a bigger need for flexible and cheap techniques. --top

And you don't see the need for agility being met by better or more fluid administration/development tools? It can only be met by reducing constraints? -- DV

[Cheap techniques, cheap programmers, cheap languages, cheap solutions like PHP and MySQL 3.0. That is what this world is coming to Top. It is sad. What we need to do, on the contrary, is smarten these people up who are doing cheap product programming with cheap laws and cheap fantasies]

And the good ones program circles around red-tape loving anal types bloaters. Up yours!

The weak/dynamic/loose typing database and application programmers write their own anal checks using something called a function, Top. A function checks to see whether things are valid. Sometimes you forget, and it should be automated.. by your system. That saves you time. It takes a lot of time for people to figure this out, that the amount of time they saved by not declaring things is wasted in writing the checks in their code which are more error prone when you are doing it.. versus a common tested automated system written by other smarter people. They are smarter than you and I (the relational database engineers and the type system engineers). Worse, many times these run time checks that you write (bug potential hazards) with your own code aren't reused across several apps and you end up reinventing the checks each time.

The entire purpose of placing integrity into the database and into the type system is so that you don't spend time writing these checks yourself in each application with functions like "if isNum(data) then error('this database column only accepts numbers') which would have been checked for you automatically by the DB if you'd enforced it with an integer column. People are too lazy, by nature, to write these checks for every data item.. and it bloats up your code. Irony: bloating your code is what you claimed a type system and a database with integrity checks would do. What happens is you end up bloating your application with the checks yourself, or worse yet, and most often, you simply don't write any checks at all or forget since you are human.

These are not new arguments. I've seen them on this wiki before. I roughly compare the two approaches to nimble small companies and large organizations; the large ones are usually more beurocratic than the small companies. They each have their place. Transporting the large-co red-tape to the smaller companies would make the small company less competitive and vise versa would make the large company trip over itself. Neither approach is a free lunch over the other; they have involved tradeoffs and the best mix depends on the situation. --top

Furthermore, if you forget to filter data and someone injects a malicious string into your database (or even binary data) - this won't be protected in an integer column without integrity checks at the DB level. Since your database allowed strings (or blobs) to be input into an integer field (making zero sense), this can be a huge security hazard where people can take over your entire system or inject odd things into your websites. Worse that could happen with an integrity checking database is "not a valid integer, bye bye now". As a lot of database programming these days involves some form of internet, this means that in an insecure internet world.. integrity checks (automated ones especially) are always beneficial in the long run for programmers. If your application is being used by one single person as a throw away app - then go ahead and use a text file. Why bother with a database at all? --Lars

SqlSecurity? is generally orthogonal to this issue.
- Security is NEVER orthogonal. Security, Optimization, Correctness, and Reflection are all 'cross cutting' concerns. It is impossible to correctly consider them in a manner orthogonal to the rest of the system. Instead, they need to be utterly invasive of every single component of the system (including each other) if they are to work at all. This is much of the reason that SecurityIsHard.
- {Restrictions and knowledge of types helps security. For example we use a getCgiVarAsInt style function (getAsInt in later versions) in my web framework, when we want to retrieve an integer from a URL variable. Then I may plop this integer into a database. Whoa, the database also accepts an integer. Perfect, matching up. But wait... if I had just done a getCgiVarNoTyping and plopped it into a "no types" database, it could contain injections of all sorts. The mindset of knowing that this is an integer coming in and should stay as one is more secure than the mindset that this is just a string or blob, all data is a blob. The mindset of let's be lazy, get this work done with a string now isn't as secure as knowing precisely what is happening with the type. If someone finds a flaw in your getAsInt function, you've got another protection.. the database itself which only accepts an integer. Doing all these checks yourself with unit tests and your own functions is great.. but humans don't have time and let things slip through even if they try hard. Furthermore, if you accidentally happen upon a flaw in your application code, the database should help warn you. But not with one that doesn't know anything about what you are putting in the table.}

Not right now it isn't possible to easily use or create a good database according to the weenies, and this is the problem. See FabianPascal and DbDebunk. It seems, 99 percent of the world is doing it wrong.. hence it must require a genius to get it right. But is that a good thing? No. We need to improve the tools. First off.. our current tools are such crap that anyone can build a crappy database easily. Only a genius can work around the crappiness and build non crappiness out of a crappy product. Understand this page was not intended to insult, but to question.. why our current products take so much effort to give us a good strong integral model that is not prone to duplication/unnormalization/badness, etc. Also understand that I am a DevilsAdvocate. --Lars
FabianPascal may be one of the people who use chicken-little-ism to scare business to THEIR door. It's common in the OOP community. In other words, they are thinking with their wallet. That is why science is important: public metrics and public studies. --top

Perhaps "for a genius" means overengineered for job security for the overengineer. I thus coin a new term: "overengineer". --top

And that's why instead of bitching, I discuss such simpler solutions on this page as the enumeration.. which no one replies to since more people are interested in flaming instead of actually talking about implementing solutions to solve the problem. --Lars

Seems almost like a ConstantTable with some direct DB support.

[Indeed. Perhaps it's just accepted as a good idea without comment. I've seen relational systems that explicitly support enumerated types (LEAP is one, IIRC), though the usual approach is to use a ConstantTable to contain the enumerable items and establish foreign key relationships with it. This permits changing the enumerated "type" without changing the schema, as well as allowing it to trivially participate in queries.] -- DV
- Changing a constant? I understand the concept of constant table but it bothers me that one can change it. It's really a limited private variable.
- [Constancy in such cases is usually enforced via security, i.e., GRANT.] -- DV

A good many of the problems are caused by people who take authoritative evidence as Gospel without understanding the implications and variety of tradeoffs and the difference between subjective preferences and objective evidence. --top

Why don't you continue to use the products you are using? Do you have a purpose here? What exactly are you trying to stand up here and protect? Do you have a product to sell us or a product that you use that you wish us to know about which isn't relational, or something? i.e. what do you plan to gain or what do you plan to protect by being here and saying statements like the above? Is it that nothing needs to be improved in the industry, people just need to get more familiar with the products like ones you are using? In that case, once again, if you are happy, why do you even bother on this page? This page was made to discuss options on how to improve relational databases to make them easier to use.. as it appears only arrogant geniuses can currently use relational databases... otherwise people like Fabian wouldn't be so angry. The very idea that people are yapping instead of discussing solutions to the problem shows who wants to make progress and who wants to just continue being the way they are.

PageAnchor: Finding-Good-Choppers

An anecdote: I'm currently dealing with a system that has lots of skinny over-defined tables. One can tell that they got their abstractions wrong either due to not seeing the whole picture or via changes that came later. One cannot tell what caused the failure because the original authors are long gone. Everybody complains that its hard to get data, not just me. It is one reason why they are looking to replace it in a few years.

You could easily create views to denormalize the database... is this harder because this is case of DbasGoneBad and the Dba doesn't allow you to create views?
Part of the problem is that there are not enough DBA's. The system is also overloaded such that views needed to get what we need would bring it to its knees. I'm working on a data warehouse which factors the info into fewer tables. And it does make use of things similar to AttributeTables to acheive this.
Funny I feel that AttributeTables are the ultimate skinny table, no more adding fields to any table, you just use the all powerful AttributeTables (AttributeTables are also, in my experience, bad for performace, because you have to kind of transpose them into 'normal tables').
Actually it's kind of a hybrid, but it would take too long to describe it here. And again, I am not against all skinny tables. "It depends". WaterbedTheory. --top

Perhaps if there were an army of database programmers around to help get stuff out and describe where everything is, it may be easier, but there are not. We are stuck with a big ball of tables. Does skinny tabling assume the army of experts will stay around?

I have seen badly designed databases with a few big tables, those big tables generally map really well with the reports that are currently being printed, but they become a true nightmare if you need to add new functionality to the system (you start having lots of integrity problems when inserting or updating data because of the typical redundancy found in badly designed databases with a few big tables)
I am not claiming there are not bad wide-table designs.

Wider tables are at least easier to study when trying to figure out what is where because you don't have to hop from table to table. I'll take a wide-table mess over a skinny-table mess any day.

But wide tables cause your screen and eyes to be damaged from scrolling trauma. A human can only focus on so much screenspace at a time. With more tables that are normalized, one can see what data is related to what - in more modules (tables).
Why is it that programmers have no problem creating applications with separate lists, arrays, objects - but as soon as when it comes to databases all of a sudden it must be one huge large global mudball? Think about tables as not just tables, but a method of organizing data. Even Excel spreadsheet users try and keep data within one page.. scrolling sideways isn't fun. They end up creating more workbooks and sheets that link to each other... and they are naturally trying to normalize their excel sheets without realizing it is time to move to a real database.

It depends, if you system is only for OLAP (better to have big tables) or OLTP (better to have a very normalized database)

Well, I recommend to stop thinking in tables. The table is just a view. OLAP can be done with normalized tables by simply viewing all the tables with the values plunked in place. Methinks that people are confusing the view with the model, again. If there arne't tools out there that make normalized tables easy to view in different ways, then once again that proves the entire point of this wiki pageL to improve databases (which includes the tools we use to view them). The idea that one should not normalize his database because it offers a better wide view is naive. This wide view can be represented with a normalized database. Claiming that it is better to have big tables should make the dbdebunk quote of the month.

Maybe there is a "proper way" to do skinny tables right, but this is clearly not one of them. It appears the problem is that its hard to create divisions (classifications) of things that fit the domain well and stay fitting the domain. It is a form of LimitsOfHierarchies. Skinny tables create (at least) a 3-level hierarchy:

  dataabase -> table -> sub-table

Domain classification hierarchies are difficult to get right, perhaps impossible if the domain is not really tree-shaped. True, it's not really a hierarchy because one can have cycles via use of foreign keys, but the problems of shoehorning a domain that you do not fully understand or will change into these three items appears to be an artificial endeavor. Wider tables make fewer grouping assumptions, and thus flex more. If you want specific narrow views, then make them, but don't gum up the originals to get it.

Sigh. More flexibility is available when normalization has been done. Being able to query what available cities there are in a state, versus not being able to, is an example. Apparently you do not understand normalization. Any thin table can be grouped with other tables by plunking the data into that other table through noramlization. This provides much more flexibility than an unnormalized wide table. I suggest you do some reading on normalization and how it offers more ability to (sanely) query what you are looking for (which is ultimately what a database is for, and hence more flexiblity when you can query tables that are related)

I believe in PutClassificationsIntoMetaData rather than hard-wiring them up front because almost nobody gets them right up-front, and even if you are the one genius who does get them right, nobody can really tell that up front without tons of onsite visits to prior installations, and even that assumes the list is has not filtered out the misses. People usually don't put their failures on a resume. Most of the time such an expensive background check is not worth it.

Normalizing a table helps so that if things change, you have options. You do not understand normalization based on your above comments. Getting it right in the first place is exactly a problem in wide tables.. because if you have old orphaned fields that don't map to anything, then you can't sanely manage your data since you didn't get it right. Normalization helps you in the future when you didn't get it perfectly right with your wide table.. you can add more fields to your thin table and they affect your other tables and offer you ways to sanely upgrade your database. A wide table is just an excel spreadsheet.

I am not dismissing the possibility that there are good choppers out there, but I know of no sure way to find them. And, all of the skinny-table installations I've encountered didn't find them (or all their documentation was burned such that we never knew why they sliced and diced the way they did, and so it appears somewhat random). BetterOnlyIfDoneRight syndrome?

Could you please refrain from using white trash talk, such as chopping? Normalizing a database, if you understand normalization, is not called chopping. People do not slice and dice things for random reasons. Please study normalization before making ludicrous comments like the above. Even ConstantTable is an example of what whoever here calls chopping.
[Indeed. The use of ad-hoc vernacular is often an attempt to sound "cool", like a free-wheeling industry insider, when in fact it comes across as a thinly-veiled attempt to disguise a lack of care, rigour, and understanding of the domain in question.] -- DV

Or Else

consider ActorsActionsAndArtifacts?

MarchZeroEight