Databases Are More Than Just Storage

I keep hearing developers imply that databases, and RDBMS in particular, are only or primarily about just "storage" or "persistence". This is not true. Under DatabaseDefinition, you can see eight or so other features of typical databases (queries, integrity enforcement, concurrency management, etc). True, one does not have to use all or most of such features, but it is often a shame not to.

I tend to view a database as a general-purpose "attribute manager". It packages, formalizes, and optimizes commonly-needed attribute management features into a unified package/interface. It is ReUse? of collection-oriented idioms. "Storage" is just one of multiple such idioms.

Without this "packaging", you get inconsistency and end up reinventing the wheel in application code the long, hard, difficult-to-scale way. OOP applications that merely use the DB as "storage" will tend to end up down such a path. Don't fight the database, embrace it. -- top

Because often, it makes the application vastly easier to write, and it's not reinventing a db to use other facilities provided in the language or library. Databases are like sledgehammers, one often needs something lighterweight, like a ball peen, or framing hammer. Databases aren't the only things that offer queries, concurrency, indexing, etc.

No, I'm thinking of any database that forces me to leave the host language. It's just as much trouble to use Access as Sql or Oracle. But since you chose XBase, where you give the example...

    do_x             // operate on current record
    do_x all         // operate on all records
    do_x for <expression>  // operate on records matching the expression
it's a wonder you don't love Smalltalk, since it allows exactly that and much more with it's collection protocol.

    aCol do: [:x | ].
    aCol 
        select: [:x | //expression]
        do:[:x | ].

It is true that one can use databases for only storage, but that is like using a car to only store gasoline.

(Indeed, if you're only using a database for "storage," you might as well follow the cheesy practice of using Excel as a (reallly bad) database. This happens in companies all the time more than you can imagine!)

Actually, RDBMS blur the distinction between disk and RAM (from app developer's standpoint), and that is one of their nice abstraction abilities. OO has not found a way to do this without either tying information to just one language, or breaking encapsulation purism.

If this were true, PlanNine from BellLabs could replace easily every RDBMS engine out there.

Of course, most object developers have no problem with being tied to a single language.

Some of us like to mix and match languages. In many shops it is a necessity because languages change all the time, and because they want different languages for different purposes. For example, a scriptish language for quick competitive turn-around and a strict-typing language for "delicate" operations such as payroll. You seem to be seeking a OneSizeFitsAll language or GodLanguage. That is unrealistic IMO. And even if it was possible, shops are not going to convert overnight, so out of practicality we cannot write software as if it is happening. Data has consistently outlasted languages anyhow.


Such developers should be asked why then they are not using simple file I/O and use just files to do the "storage". I mean, that would be DoTheSimplestThingThatCouldPossiblyWork (DTSTTCPW) and true to the spirit of YouArentGonnaNeedIt.

Using an existing RDBMS may be DTSTTCPW, rather than rolling one's own. That seems to be assuming that files are easier to use than databases. Maybe BigIron RDBMS are a pain, but I grew up with NimbleDatabases which in many ways were simpler than files. For example, you didn't have to worry about write order because you could search/index stuff in any order (random access). If you already have a RDBMS for other purposes, or if DB engines come with the product, then adding yet another table is minor effort, and handle more complex structures and operations than files.

The point I was making is that one should really ask those developers why are they not rolling their own file based storage. Propose a simple periodical dump to disk. Or propose a really big battery/UPS attached to RAM or flash RAM and an application that never quits. Surely this is the simplest storage possible. They will of course disagree with such a simplistic solution (and rightly so). Then point out that additional features they need are beyond "just storage" and are a part of "queries, integrity enforcement, concurrency management, etc". BTW: one "just storage" solution is ThePrevayler.

Files don't easily support random access and concurrency issues. When you need concurrency and random access, files are often not DTSTTCPW. One process might get stuck, not releasing it's file lock, choking the whole shebang. RDBMS offer automatic lock time-outs, all-or-nothing transactions, etc. to minimize these kinds of things. Also, perhaps it is not clear who "those developers" are and we may be misunderstanding each other here.


Why?

Please clarify

I am working from the assumption that the primary purpose of a database is to store data for subsequent retrieval. If I am wrong and the main purpose of a database is to do more than offer storage, then it looks like the database could at least be accused of CartHorseInversion. If, however, I am right, then functionality beyond data storage and retrieval is possibly a desirable adjunct for data-centric applications that need some processing before output to assist a human or another application to digest the output. If the database is being used solely to provide a persistence layer for a user-centric application, then using the database+ functionality just introduces unnecessary complexity in structure and probably tools and language choices. Which is why I asked why DatabasesAreMoreThanJustStorage.

Well, they could be only about "storage", but the "database philosophy" is that there are many "duties" that fit nicely together in or around the process of "state management". For lack of a better analogy at the moment, it would be like a hiring a "paper towel replacer" instead of a janitor for a medium or large building. Other restroom maintenance duties would be handled by miscellaneous persons or local departments. It makes more sense to roll them up together into "janitor". Ask yourself this: if "storage" is all that you need, why not just use the file system? It also provides "storage". -- top

I would if it were a TransactionalFileSystem with a high degree of searchability.

Sounds very nearly like a database in file clothing. (A file system is a kind of database, by the way.)

I find it interesting that the file system is actually a database, I have been labouring for years under the misapprehension that a database is an application that is built for the storage and retrieval of data, that depends on a file system to store itself and the data it is responsible for handling.

Some database systems use the existing file system, some bypass it and do their own disk management.

Obviously I'll check DatabaseDefinition to see if the state-of-the-art has moved on. Perhaps if the file system was actually a database rather than 'a kind of database' then we could remove the whole concept of databases (NoMoreDatabases) from our humdrum, quotidian, UserCenteredDesign and application building.

Actually it's the opposite. You would remove the file system and replace it with a database. After all, what are files but BLOBs with a primary key (filename/path)? This would allow any 'view' of the 'file system' instead of the pseudo-tree-with-shortcuts mess. And how about the non-standard NTFS attributes like title/comments/etc. Those should all be database fields and query-able. Then I can do "select * from file_system..." or updates or backups etc. The database cache would replace the file system cache and indexing. -- AnonymousDonor

Maybe you could enlighten us about other things we have overlooked that are actually databases in disguise? The application I wrote this morning could arguably be a database of sorts, but it would be a pretty spurious argument and a pretty lame database, because although using data storage and retrieval, curiously it is an application that does not offer these as its main purpose.

Databaseness may not be Boolean. It is possible to end up ReinventingTheDatabaseInApplication, for good or bad. Some or all of the features found in databases may end up being built from scratch in the application.

I think part of the problem is that many are (understandably) frightened by the BigIron view of databases. The setup and beurocracy is often so great that rolling-your-own often seems easier. Hopefully NimbleDatabase thinking will return such that it is not a Boolean choice between BigIron and roll-from-scratch. I miss having a nimble DB option. (If a need outgrows the nimble portion, you migrate it to the BigIron DB, hopefully with little or no code rewrite.) -- top


Dammit, If I see one more lump of code looping though a RecordSet to do something that should have been done in SQL ...

See: RightToolForTheJob?

The flip side is MisuseOfSql. I've seen it from people who ONLY know SQL. They do amazing stuff with a clunky language, but clunk is still clunk.


I think of databases as communications systems. Like a large whiteboard with multiple users posting and retrieving information. A modern database engine is required for any reasonable sized application, i.e., more than one user. The database insures that all data posted is complete, in short ACID compliant.

You might think of it that way, but it's simply not true, lots of system chug along just fine without that, while having many users. Object databases provide ACID as well.

The DB engine makes copies of the information so that if the whitboard gets accidentally erased it can be restored to a previous state (durability, recovery). The database engine must schedule the changes in the proper sequence (racing conditions), must [??] users ensure they do not block each other (deadlock), make sure all changes are in the correct format (constraints, types, domains), that 2 users do not interfere with each other's changes (independence) and if a user has a heart attack in the middle of a change, the DB engine ensures any incomplete writes are removed (atomic operations). All to ensure that the past can communicate with the future in a reliable manner. So if you wish to recreate all of that in code, good luck. In my universe, data loss is unacceptable.

Object databases provide all that as well, one doesn't need to code up ACID storage, and relational db's aren't the only game in town that provide it. Databases don't have to be more than just storage, if storage is all you need.


You might think of it that way, but it's simply not true, lots of system chug along just fine without that, while having many users. Object databases provide ACID as well.

Define object databases. Give examples of multi-user environments where you either do not have to use a reasonable database engine or do not have to build your own concurrency handling engine to avoid data corruption issues.

Object databases provide all that as well, one doesn't need to code up ACID storage, and relational db's aren't the only game in town that provide it. Databases don't have to be more than just storage, if storage is all you need.

Where exactly was the term RelationalDatabase used? Storage of what? Garbled garbage? What guarantee do I have that what I retrieve is the same as what I wrote?

Me thinks you need to reread the opening paragraph to this page, it's all about relational databases, this is a Top topic afterall.

From the opening I keep hearing developers imply that databases, and RDBMS in particular, are only or primarily about just "storage" or "persistence". The topic is broader than RDBMS, with RDBMS just being an example. I think you are the one misreading the topic. The thesis is "databases are for more than just simple storage", and I agree. My point being is that if you have a multi-user environment, then either 1) you need to ReinventTheWheel and provide much of the same functionality as a modern DB engine (which is error prone and a waste of time) -or- 2) use some sort of database engine which can ensure you do not have users corrupting each other's data, deals well with race conditions etc.

Many times using a database engine, yes even one of those EvilRDBMS, allows you to DoTheSimplestThingThatCouldPossiblyWork. Why ReinventTheWheel?

It is not about reinventing the wheel. Look, is SQLite reinventing the wheel? why do people use SQLite? I'm not the main contributor to this page, by my personal view is that we need something just like SQLite but even lighter, and much better than SQLite, and more relational. SQLite is not a proper database. But SQLite is useful, even so.

So what we need is another type of product, like SQLite, but proper relational, and even more integrated into the host language. SQLite still makes you mix your SQL strings with your host language, which can be ugly and tiring. Something like Pascal/R, RelProject,or TutorialDee looks close, but maybe the syntax can be improved and it can be ported to other languages like C++, PHP, Delphi. One of the problems is integrating it into existing FuBar languages that we have to deal with today. For example, the CeeLanguage is closed and finished, it's not like you can extend it (maybe with the pre processor).

I am waiting for you to respond to my request for examples.


See also: YagniAndDatabases, ProgrammingWithoutRamDiskDichotomy, SchemaDesignIsModeling, DatabaseIsRepresenterOfFacts


EditText of this page (last edited November 26, 2014) or FindPage with title or text search