When Are Databases Not Appropriate

Some people love databases (whether they are OO or relational), and some only use them if dictated by the requirements.

Databases don't necessarily have to be big and formal (NimbleDatabase). Perhaps a wrist-watch is not the right place, but other than tiny embedded stuff, where and why are they not appropriate?

Any problem that doesn't need any of:
- fast ad hoc queries as part of its solution.
- AtomicConsistentIsolatedDurable semantics, or other transaction semantics
- Long-term maintenance of persistent data that changes over time.

DatabaseDefinition lists about 8 features of databases. Queries are only one of those. Besides, almost any non-trivial application will probably eventually need at least some "queries" (depending on how you define it).

I don't intend to be glib here - you don't need a relational database if you don't need these 8 features. -- SteveConover

Of all of those 8 features, fast ad hoc queries are the one thing relational databases provide better than any other tool or set of tools that I know of. If you need the other features there are alternatives worth considering. If you need fast ad hoc queries you need a relational database. -- EricHodges

(I manage to do fast ad hoc queries, every day, on a massive heap of unstructured data using a tool known as "Google". Although Google makes use of indexes, I have trouble thinking of it (or the internet) as a relational database. -- DaveWhipp.)

Somewhere on this wiki is a discussion about text search and relational. If I come across it again, I will link to it. By the way, one cannot tell how google is implemented by looking at the interface alone. But I would note that search engines that offer query syntax such as "wordX or wordY but not wordZ" appear to barrow or echo relational algebra in many ways.

But they are better integrated if you get them through a database or database-like tool. You cannot just plug in multi-user contention handling like a box glued to the wall, for example. The incremental approach will be too hacky and inconsistent from implementation to implementation if you add some or all 8 of those features independently. It would be like slowly evolving assembler language to a higher-level language. YouAintGonnaNeedIt fails here. (Perhaps this belongs under YagniAndDatabases or BenefitsOfIntegration? or IntegrationConflictsWithYagni?. There is a key concept here that I am frankly having a hard time trying to articulate.)

Many of the other features can be provided by Web services, messaging systems, etc. It's fast ad hoc queries that demand relational databases.

I am skeptical that the others can be independently tacked on and do as well as a database. Databases are an integrated state management system, and the integration is effective. It is like building in central air conditioning and heating up front rather than tacking it onto each room as needed. Things like Web services are flat and static for the most part. You have to contact or understand the language of the other side to get new requirements from them. Databases are one-stop-shopping for typical and common state management needs. For example, how do you do joins from two different Web services? It is much more work on the developer's part than if it came from a database. (I think there is a similar discussion of this around somewhere.)

Web services are no more flat or static than RDBMSs. The client must understand the language of the server to use them, but the same is true for RDBMSs as well. Joins are part of ad hoc queries, which are the single feature best provided by RDBMSs. And even then many RDBMSs can't do joins with other RDBMSs.

But databases are usually at a larger scale than services. It is easier to deal with say two databases that may speak different SQL dielects than 50 little services that reinvent operations like set, get, delete, find, sort, filter, cross-reference, lock, save, backup, etc. with different names and patterns. Each invents their own little query language and DBMS of sorts. Regarding joins, with 2 systems a join set is more likely to be on the same database than with 50 services.

Databases and Web services come in all sizes. Comparing 2 of one and 50 of the other is unfair. And my objection was about joins across RDBMSs.

Part of this is about standardization. It is easier to consolidate databases if they speak a similar protocol. Each of your services may create vastly different protocols for doing rather similar things. It is poor InterfaceFactoring. And, if they did all do it the same, then why not consolidate rather than replicate so that you can do things like data/state backups all in one spot and make joins easier? You know what? You then have just created a database. All roads lead to Rome. Databases are simply OnceAndOnlyOnce applied to commonly found state-management patterns and properties.

YAGNI.

I reject YagNi here on the grounds of integration is harder under an incremental approach. Perhaps this belongs at YagniAndDatabases.

If integration is inevitable, then you are going to need it. In some domains, almost every piece of software will need to be integrated with every other piece of software. The nature of that integration varies, though. Some environments need messaging, some need databases, some need both. There are domains where integration is an unlikely requirement. There are others where integration is well understood and modularized to the point that adding database connectivity can be delayed until it is needed without additional cost.

I did not mean integration between applications as much as integration of typical state management operations (set, get, delete, find, sort, filter, lock, save, backup, etc), although sharing data is part of it. If you use a database or database protocol, then you kind of get a "bulk discount" for any further items that you need. YagNi is anti-bulk-discount in some ways.

I have no idea what you mean by "integration of typical state management operations". To me, typical state management operations take place in memory, in process, in thread. Only atypical state management crosses those boundaries, integrated via various methods.

Why is your view of it tied to specific hardware arrangments?

My view isn't tied to specific hardware arrangements. No matter how you arrange hardware, most state management operations take place in memory, in process and in thread.

Well, unless you are forced to deal with it at that level, then why bring it up? If the DB hides issues of RAM and disk and semaphores and process forks from a developer, isn't that a nice thing?

You brought it up. The database can do certain tasks well, but it can't replace the application being developed (otherwise we wouldn't be developing the application.)

Nobody that I know of suggested that the entire application or project be in the database's native language.

If the database can't replace the application, how can the database hide the issues you list above?

It makes it so that the application developer can care less about them. They are not 100 percent invisible, but more so than if you wrote them from scratch. And, more consistent from app-to-app and shop-to-shop.

State management operations in the application are 100% visible. Using a database doesn't mean we can care less about them.

Calling the interfaces, yes, but not the implementation.

Applications that are clients of databases don't have to care about implementation of state management? Am I understanding your claim correctly?

At least not the low level stuff.

You've lost me. Being a database client doesn't remove the burden of state management from an application.

It reduces it. I don't have to directly implement indexes, joins, semaphores, disk saving, etc. I see a fair number of OO applications that implement lists of pointers to get one-to-many and many-to-many relationships.

Those relationships are in the application, not in the database.

They are?

Yes. Any lists of pointers or references implemented in OO applications are in applications.

But they don't have to be in the application. You could let the database take care of that so that you don't have to impliment a list of pointers.

You're assuming there is a database and that a storing this data in a database makes cost and performance sense.

You can't move all local variables out of process, perhaps across a network, into an RDBMS. Performance still matters in some domains. Applications that are clients of databases still have to manage their own state.

I did not say "all". This is not an all-or-nothing issue.

You argue above that by becoming the client of a database then the developer will care less about typical state management operations because the database will hide them. As long as the application being developed has internal state, the developer has to care about typical state management operations. The only way I can see how using a database would change that is if the application became stateless and all state was managed in the database. I've never seen that. Even simple database clients have internal state that must be managed.

Again, it is a matter of degree. The portion of state management that is on the DBMS versus that which is done in the client will be different depending on how different actions and services are partitioned. For example, to perform a cross-reference ("join"), one could issue a query to tell the database to perform the join, or one could write a double loop like this at the client:

  loadList(listA,"A.data")
  loadList(listB,"B.data")
  for itemA = each item in listA
for itemB = each item in listB
  if itemB.x = itemA.x
    print "I found a match!"
  end if
end for
  end for

You think a join won't do the same thing?

I don't know because as an applications developer, I don't have to know. That is my point. It is reuse of existing implementations and reuse of existing protocols related to state management. That is the bottom line. It is analogous to using a higher-level language instead of assembler or bytecodes. Does YagNi dictate programming in bytecodes?

Reuse doesn't mean you don't have to know. It's important to know so you can pick the right place for the code to execute (except in those rare instances where performance and cost are not important.)

There are various ways to tune DBMS for performance improvement. Often it is by the DBA, but it could also be via "hint" clauses in queries that suggest implementation routes to the database system. (For example, to use a sequantial search instead of indexes.) Newer systems claim to be self-tuning by monitoring usage patterns.

So why do you reject YAGNI for a database above? You claimed it was because "integration of typical state management operations (set, get, delete, find, sort, filter, lock, save, backup, etc)" is "harder under an incremental approach." If this doesn't apply to state management within an application, why are we going to need a database?

Where does it not apply?

Within an application.

I am sorry, I am not sure what you mean.

RECAP: You rejected YAGNI above because you said integration was harder under an incremental approach. By "integration" you said you meant "integration of typical state management operations (set, get, delete, find, sort, filter, lock, save, backup, etc)". I argued that typical state management operations take place inside an application (the one being developed with or without an OO programming language). You clarified that using a database doesn't alleviate this type of integration of typical state management operations inside the application. So I'm confused why you reject YAGNI with respect to the additional features of a database. Applications must manage their own internal state regardless of whether they are database clients.

I am not sure where you got: "You clarified that using a database doesn't alleviate this type of integration of typical state management operations inside the application."

You clarified it when I said:

: Being a database client doesn't remove the burden of state management from an application.

and you replied:

: It reduces it.

In my experience, being a database client can reduce an application's internal state management slightly. I've never seen an application with no internal state to manage. If it has a database connection that is state it must manage.

I never said it was all-or-nothing.

Understood.

It makes state/attribute management more consistent and standardized, so that state/attribute management tools can focus on doing just that job well. In the large, a database is DivideAndConquer, specialization, and standardization of common patterns/features.

Due to internally-controlled garbage collection, they may not be appropriate where precise timing is important. But, this is true of many higher-level languages and tools. Their transactional nature may also result in having to use polling, which is inefficient at high turnaround rates. However, triggers could perhaps be used for a more direct notification system.

resource optimization?

This is based on a usenet discussion:

These are generally called "resource optimization problems" if I am not mistaking. These are things like shortest route through multiple cities and most packages per truck. I don't know if relational is inharently "bad" at those because I have not done enough to compare.

But one thing you should consider is that the programming for such is often done by programmers trained in C and Fortran, not RDBMS. Thus, their solutions will tend *not* to take advantage of RDBMS features, but be tilted toward imperative-only solutions.

Nor do I propose that the entire process be done in SQL. Solutions can be a healthy mix of SQL and imperative programming. The more experience you have with RDBMS, the more of the effort you can farm off to them. (SQL is not the ideal relational language, I would note.)

In short, just because *you* have not found a way to use RDBMS effectively with resource optimization problems does not mean it cannot be done. The bottleneck could perhaps be you, not relational.

--top

RelationalDatabases use an essentially "dumb" search algorithm to find solutions. If that's the best algorithm you've got, then the power of the RelationalCalculus is likely to be a win. If you have an orders-of-magnitude better algorithm, then using that algorithm may be the critical success factor for your application (think expedia-type flight routes etc.). Some such algorithms don't particularly benefit from the RelationalCalculus, I'm sure, while for others it may remain a very useful abstraction. See e.g. ConceptsTechniquesAndModelsOfComputerProgramming chapter 9.

My experience has been that often (for typical business applications) "dumb search" is the only algorithm you've got (or need) and that the RelationalDatabases model is a good fit.

Link regarding generic versus special-purpose databases: http://developers.slashdot.org/article.pl?sid=07/01/09/2337257

CategoryDatabase