Hidden Database Syndrome

As tools become more complex, they are essentially (directly or indirectly) creating databases to hold state information. Examples include:

If such tools come to grips with the fact that they are essentially creating databases, then working with them, debugging them, learning them, and extending them could be a lot simpler.

A language that exposed its internal structures could allow meta-programming without having to add new syntax features to provide them. The language could be simpler because "fancy stuff" is done by manipulating the database rather than with confusing syntax.
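As a rough illustration of that idea, here is a minimal sketch in Python (a language this page does not otherwise discuss). The class `Record` and the helper `add_getters` are hypothetical names invented for the example. It only shows that, when a class's namespace is exposed as ordinary data, generated methods can be added by writing entries into that namespace rather than by new syntax.

```python
class Record:
    """Plain class; no special syntax is used for the generated accessors."""

    def __init__(self, **fields):
        self._fields = dict(fields)


def add_getters(cls, names):
    # Treat the class namespace as data: add one entry per field name.
    for name in names:
        def getter(self, _name=name):  # default argument binds the current name
            return self._fields[_name]
        setattr(cls, "get_" + name, getter)


add_getters(Record, ["title", "year"])
r = Record(title="Hidden Database Syndrome", year=2012)

# The exposed namespace can also be inspected like any other mapping.
getters = sorted(k for k in vars(Record) if k.startswith("get_"))
print(r.get_title(), r.get_year(), getters)
```

Whether this counts as simpler than dedicated syntax is exactly the matter of taste argued on this page; the sketch only shows that the mechanism exists.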


It is not so much that the database is "hidden" (locked out), but that it is hard to work with, because the built-in tools are mostly meant to deal with textual syntax or the like instead of the "database" itself. If the designers realized they were making a complex database (or at least a complex/compound data structure), it might change their approach and encourage them to treat it more like a database than like a bunch of syntax and commands that alter the database.

Then again, some people seem perfectly comfortable dealing with such structures using behavioral syntax. I personally do not understand why, but I realize that people think vastly differently from each other. I myself would rather deal with complex constructs with a database browser and query languages, or something similar, instead of as syntax. It still might be the case that some people simply have never worked with good DB or data-structure browsers, and are thus too used to syntax-centric approaches to change. To me it is a guessing game. You write some Lisp code, for example, and that code is turned into an internal hierarchical database that changes over time. One has to mentally translate from the syntax to the current state of the internal database. I would rather cut out the middleman and deal with the structure itself.
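To make "deal with the structure itself" a little more concrete, here is a hedged sketch using Python's standard `ast` module in place of Lisp (a substitution made only for the example). The helper `tree_to_rows` is a hypothetical name. It flattens the parsed tree into (id, parent_id, node_type) rows, which is one possible table-like view of the structure that the syntax produces.

```python
import ast


def tree_to_rows(source):
    """Flatten a parsed syntax tree into (id, parent_id, node_type) rows."""
    rows = []

    def visit(node, parent_id):
        node_id = len(rows)  # ids are assigned in visiting order
        rows.append((node_id, parent_id, type(node).__name__))
        for child in ast.iter_child_nodes(node):
            visit(child, node_id)

    visit(ast.parse(source), None)
    return rows


rows = tree_to_rows("total = price * 2")
for row in rows:
    print(row)
```

The exact node types printed depend on the Python version, so no output is shown here. The point is only that the rows can be browsed or filtered directly, without mentally re-deriving them from the source text.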

I also think that people would realize that relational is superior to tree-based structures if they could compare them side by side for a while. Trees perhaps make for better textual syntax, but make for crappy databases in my opinion. -- top

[Let me just reflect on this... Anything which stores organized data (seems to be the implicit definition of a database) is a database, including things which use trees. But they shouldn't use trees, because trees make bad databases. So everything should be procedural?!? Maybe you need to take a step back and look at your assumptions. The relational model is not the be-all and end-all of data representation.]

Well, trees and databases are not necessarily mutually exclusive (SyntaxFollowsSemantics), and perhaps we should separate the concept of treating stuff as a database from the question of which kind of database it is, be it hierarchical, relational, or something else. I find trees fine for smaller things, such as individual expressions, but on a bigger scale they become arbitrary and messy. If people would get used to more of a database view into stuff, I think they would tend to shift away from trees because they are no longer bound to just a textual view. I feel tree-centric syntax is a side effect of being too closely bound to text as the medium.


A related issue is whether such databases should be a NavigationalDatabase or a RelationalDatabase. Some suggest that navigational is faster for smaller-scale databases. However, this may be simply because nobody has tuned a relational database for such use. (Or maybe nobody has invented a SufficientlySmartDatabase.)

[Well, clearly NavigationalDatabases are the best, since the best database known to exist is the human mind, which is a node based network.]

I hope that was meant in humor.

[Of course it was!]

Whew. Thanks for confirming... it's hard to be sure these days on wiki.

If it was relational, we would have cracked it by now. -- top

Actually, it is an interesting thing to think about with regard to NavigationalDatabase being "the best". Certainly one would think it was meant in humor, but let's take it a bit more seriously and have a discussion. Why is the brain so intelligent if it has such a crappy database? Maybe it isn't crappy. Maybe it is the RightToolForTheRightJob, in that our brain works better non-relationally, since it is a different beast that doesn't require so many facts (DatabaseIsaRepresenterOfFacts)? Or does our brain use facts? If so, why wouldn't it be better relational? Or is our brain a combination of relational and navigational? Since our brain is a product of evolution (undesigned), it may be a poor design (since it wasn't designed!). On the other hand, we are pretty smart, even with this crappy node-based system. Or is it all that crappy? Maybe it has benefits? Does our brain use pointers? A stack? A heap? Not a good analogy?


Now we're getting into the definition of a "database". Top seems to suggest that any sufficiently complex data structure is a "database", including symbol tables and call stacks. (How about a simple list or array or struct?)

Others have more stringent definitions of "database" - I tend to think that a database is something that provides some minimal persistence and/or transaction semantics (not necessarily ACID). Others go further, and exclude "toy" DBMSs from the community of databases (e.g. MicrosoftAccess isn't a real database).

Certainly call stacks don't require persistence or transaction semantics - nor are they likely to benefit from AdHocQueries. OTOH, we could certainly end all arguments with TopMind by agreeing to redefine everything as a database - hence when I program in C++, I'm doing TableOrientedProgramming because a VeeTable is - guess what - a database. (OK, it ain't relational, but pfft.)

Such a conclusion would certainly make Wiki a quieter place...

Perhaps we need another term. "Data structure" generally implies a single part of such internal structures, such as a single map or single list. We need a name for something that is an interlinked network of multiple data structures. "Attribute base"? Attribute management engine?

I think you guys are missing the point. It's not that any data structure is a database, it's that any data structure can be represented by a (relational) database model. That shouldn't be a surprise; the theory was developed specifically to that end. -- DanMuller
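A small sketch of that claim, assuming Python's standard `sqlite3` module as the relational engine (an assumption; no particular DBMS is named here). The nested tuple `tree`, the table `node`, and the helper `store` are hypothetical. A tree is stored as rows with a parent reference, and a recursive query then walks a subtree.

```python
import sqlite3

# Hypothetical nested structure standing in for a program's internal tree.
tree = ("root", [("a", [("a1", []), ("a2", [])]), ("b", [])])

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)


def store(node, parent_id=None):
    """Insert one node, then its children, each pointing at its parent row."""
    name, children = node
    cur = conn.execute(
        "INSERT INTO node (parent_id, name) VALUES (?, ?)", (parent_id, name)
    )
    for child in children:
        store(child, cur.lastrowid)


store(tree)

# Recursive common table expression: node 'a' plus everything beneath it.
subtree_of_a = [row[0] for row in conn.execute("""
    WITH RECURSIVE sub(id, name) AS (
        SELECT id, name FROM node WHERE name = 'a'
        UNION ALL
        SELECT node.id, node.name FROM node JOIN sub ON node.parent_id = sub.id
    )
    SELECT name FROM sub ORDER BY name
""")]
print(subtree_of_a)  # ['a', 'a1', 'a2']
```

This shows representability only; it says nothing about whether such a mapping is convenient or fast enough, which is the part still in dispute.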

But that is just a TuringComplete-like truism. We still need a name for the internal structure thingy.


Well, this isn't such an absurd idea. Most of the "hidden databases" in programs could be easily mapped to a relational database structure, modulo the "identity issue" of OO. Certainly a lot of metadata would fit. And if languages had convenient, built-in APIs for database access, this would open up some interesting introspection capabilities. The implementation of such things wouldn't necessarily be the same as for bulk application data, but a uniform interface for data manipulation could be a very welcome thing. (I'm downplaying the quibble about the term "database"; certainly multi-user databases are more interesting than single-user, and persistent is more interesting than transient. What's of more immediate interest in this context is the queryable interface to the data.)
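One way to picture a uniform query interface over transient and persistent data is sketched below, assuming Python's standard `sqlite3` and `tempfile` modules. The file `shared.db` and the tables `product` and `cart` are hypothetical. An on-disk database is attached to an in-memory one, and a single query joins the two.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shared.db")

    # Persistent side: a small on-disk table.
    disk = sqlite3.connect(path)
    disk.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, price REAL)")
    disk.executemany(
        "INSERT INTO product VALUES (?, ?)", [("A1", 2.5), ("B2", 10.0)]
    )
    disk.commit()
    disk.close()

    # Transient side: an in-memory database with the disk file attached.
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ? AS shared", (path,))
    conn.execute("CREATE TABLE cart (sku TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO cart VALUES (?, ?)", [("A1", 4), ("B2", 1)])

    # One query spans the ephemeral cart and the persistent product table.
    total = conn.execute(
        "SELECT SUM(cart.qty * product.price) "
        "FROM cart JOIN shared.product AS product ON product.sku = cart.sku"
    ).fetchone()[0]
    conn.close()

print(total)  # 20.0
```

The sketch demonstrates the uniformity of the interface, not its cost relative to hand-written data structures.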

Why do you say (or imply?) that AdHocQueries only make sense in the context of a RelationalDatabase? Haven't you ever formed the union or intersection of data in memory? This stuff certainly comes up occasionally. More for framework or middle-layer code than for business app UI code, probably, but it does come up.
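For instance, a minimal Python sketch of such in-memory set operations follows. The two sets are made-up data, imagined as the symbols a program defines and the symbols it references.

```python
# Hypothetical in-memory data: symbols defined and symbols referenced.
defined = {"parse", "emit", "optimize", "main"}
referenced = {"parse", "emit", "main", "log"}

unused = defined - referenced       # defined but never referenced
undefined = referenced - defined    # referenced but never defined
used = defined & referenced         # intersection
all_symbols = defined | referenced  # union

print(sorted(unused), sorted(undefined))  # ['optimize'] ['log']
```

These are ad hoc queries in all but name, even though no DBMS is involved.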

Who knows what it would cost you? In theory, it wouldn't have to impact parts of the program that don't use it. If you can find a way to map the data to your API without changing how it's currently stored, that is. (All pretty hypothetical, of course - details will vary wildly.)

BTW, it's worth noting (for those who don't already know this) that relational databases follow exactly this pattern. The metadata describing tables and fields, and sometimes other things like constraints and indexes, are all made available in tables. This is way handy for writing development tools like reporters, automated database-to-program mapping utilities, code generators, etc.
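As one concrete case, SQLite lists its tables in the `sqlite_master` table and describes columns through `PRAGMA table_info`; other products expose similar catalogs under other names. The sketch below assumes Python's standard `sqlite3` module, and the tables `customer` and `invoice` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

# The catalog is itself queried like a table.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]

# table_info rows are (cid, name, type, notnull, dflt_value, pk).
# Table names come from the catalog above, so interpolating them is safe here.
columns = {
    t: [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({t})")]
    for t in tables
}

for t in tables:
    print(t, columns[t])
```

A code generator or reporting tool could start from `columns` rather than from a hand-maintained description of the schema.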

Big whoop, databases are good at storing metadata, but that metadata still get's loaded into a program running in memory, which uses hashes, arrays and other such structures that can actually work at the speeds programs need to run. If you added all the overhead to make every structure available relationally, you'd just end up with another slow database. Programs aren't about data, they're about behavior, the data just parameterizes that behavior, but the data just isn't that interesting, it's just data, it's a glorified file, that's queryable.

"...but that metadata still get's [sic] loaded into a program running in memory, which uses hashes, arrays and other such structures that can actually work at the speeds programs need to run." So what? If you have an interface to these data structures that works fast but is similar to the interface you use for database data, you've got all the considerable power of a query interface or language, you don't have to remember different ways of manipulating data to achieve the same ends, it doesn't have to be slow, and you could presumably execute queries that involve both persistent shared data, memory-cached persistent shared data (e.g. slow-changing but frequently-used database meta-data), non-shared persistent data, and ephemeral local data.

The whole point is you can't have such an interface. Adding relational query capabilities adds far too much overhead. You think the language stack needs to be relationally complete, or that the symbol table does... you're living in a pipe dream, it'd bloat things to the point that nothing would work. You think all that flexibility just comes for free or what?

Relational access path... you can't have a part-time relational system. If it's there, it has to be the only path, or you could violate its constraints and fuck things up. And if you have to use it... it'll be too slow for system-level operations like the language stack and everything top's pushing. I don't lack imagination, I just think you're glossing over the difficulties involved in building such a system. I think it's a pipe dream.

If you think it's so easy, then build it, I wish you the best of luck.

That could be a very "big whoop" for convenience and simplicity of programming. Convenient database query APIs are a prerequisite for this approach - for instance, a simple key-based lookup should be as easy to do as a hash table lookup.
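A hedged sketch of what "as easy as a hash table lookup" might look like, assuming Python's standard `sqlite3` module. The wrapper class `TableMap` and the table `settings` are hypothetical, not an existing API. Key-based access uses ordinary indexing, while the same data stays open to queries.

```python
import sqlite3


class TableMap:
    """Hypothetical wrapper: dict-style access to a two-column table."""

    def __init__(self, conn, table):
        # The table name is interpolated, so it must come from trusted code.
        self.conn, self.table = conn, table
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} (k TEXT PRIMARY KEY, v TEXT)"
        )

    def __setitem__(self, key, value):
        self.conn.execute(
            f"INSERT OR REPLACE INTO {self.table} (k, v) VALUES (?, ?)",
            (key, value),
        )

    def __getitem__(self, key):
        row = self.conn.execute(
            f"SELECT v FROM {self.table} WHERE k = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]


conn = sqlite3.connect(":memory:")
settings = TableMap(conn, "settings")
settings["editor"] = "vi"
settings["editor"] = "emacs"   # overwrites, like a hash table
settings["shell"] = "zsh"
print(settings["editor"])      # emacs

# The same data remains available to ad hoc queries.
starts_with_e = conn.execute(
    "SELECT COUNT(*) FROM settings WHERE v LIKE 'e%'"
).fetchone()[0]
```

This addresses convenience only. Each lookup still goes through SQL parsing and execution, so it does not settle the speed objection raised above.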

[The "still gets loaded into memory" comment comes from somebody who apparently does not "get" databases. ]

No, you just don't get programming, and think that databases are actually fast enough to do all that stuff... well, they aren't. When they are, then great, it'd be nice to have those capabilities, but that's not in the near future; computers are still too slow for such overhead at such a low level to be implemented efficiently. When a database can keep up with the speed of chasing pointers... then you can talk. Until then, you're just wrong.

[RAM and Disk are immaterial. Databases can and do use cache. It is a weakness in OO that it cannot easily blur the distinction between RAM and disk. When using a database, you only create temporary, local "views" of information from the database, usually a single table (result set) or map. You don't reconstruct much of or most of the schema in RAM. That is a violation of OnceAndOnlyOnce. Too many OO programs create a big long tangled structure that mirrors the database more or less because OO'ers just don't like working with a database for some unknown reason, so they spend (waste) a lot of code translating to and from their favorite form. -- top]


Who cares which database implementation is used if the database is hidden? I would rather aim at producing software which requires no changes when I choose the database layer. I think that this very ancient argument about the right database is one of the continuing distractions which impede the progress of software improvement. Given that there are at least two well-supported opposing views to this database question, is one gonna be right one day in the future? The right database may be one of these opposing views' database, or it could be the other. More likely it will be a new view which includes all the argued preferences. So why argue about our particular views of databases when we could look for ways of specifying systems which make this argument redundant? -- PeterLynch

DatabaseVendorLock is often an exaggerated fear in my opinion. There is not enough standardization to easily swap DBMS, even among OODBMS, unless you use a very low common denominator, which, unlike vendor switching, will impact the here-and-now. IndirectionIsNotFree. Besides, databases are a high-level tool. Expecting to swap out a high-level tool is like expecting to be able to swap out OOP and replace it with FP.


See also: SyntaxFollowsSemantics, CodeAvoidance, AdvantagesOfExposingRunTimeEngine


Last edited February 7, 2012