Using Database Up Front Considered Harmful

KentBeck posted this on the XpMailingList today, which articulates what I think many of us using the YagNi argument have been trying to say on DbasGoneBad, WhenAreDatabasesNotAppropriate, InvasionOfTheRelationalWeenies, etc:

"You have the cost of design carry whether you anticipated correctly or not. To illustrate, suppose some day you will need a relational database. You correctly anticipate that, and put in the appropriate factories and a mapping layer to make sliding in an RDB easy. However, every time you change something, instead of having one object to change, you have a bunch. Instead of having one package, you have a bunch. Instead of having one class, you have a bunch. All of that stuff costs you *even if you are right*."

-SteveConover

But then he doesn't know to use relational databases properly. He uses databases as persistence stores - therefore factories, mapping layers and all the heavy stuff. So his reasoning is kind of valid only if you go through that route. If you don't fall in that trap, using databases upfront is particularly beneficial and adds considerable velocity to the project. The key issue that differentiates the 2 paths is mainly between using a form of ModelViewController or using my favorite design pattern: PutTheDamnDataOnTheDamnScreen?.

A practical example to back up my assertion: I built a whole damn customized accounting system in a solo effort in just under 3 motnhs using DelphiLanguage and PostgreSql, and also Excel for very complicated reports. It is non-trivial and it is still running a tens of millions of dollars business after more than 3 years without change. The amount of code I had to write was rather trivial compared to the alternative using objects, persistence, mapping layers, MVC and the usual suspects. Also users are able to further customize the Excel report very effectively, and they learn easily the schema and how they can link important live accounting data to their spreadsheets. You just cannot have that kind of stuff without a relational database. Many other people have built successful large projects in VisualBasic, DelphiLanguage, NetDynamics, Oracle Designer/Forms/Reports/Procedure Builder etc, PowerBuilder, SAP, even PHP which is yet another hugely popular instance of PutTheDamnDataOnTheDamnScreen?. Your average PHP hacker will beat your average Servlet/JSP MVC (struts, EJB etc) hacker hands down when it comes to project velocity.

A relational database, when used properly, will trim down the amount of code significantly in your typical business application. If instead the relational database is used as a persistence store behind the ObjectModel of an MVC-like architecture, it will indeed inflate the code significantly, but then it is not the RDB which is to blame. Unfortunately for many OO people MVC has become a kind of sacred cow, and they cannot think of alternative ways to build applications. --CostinCozianu

Your description sounds like it is assuming heavy OO design. Maybe that is the difference in this HolyWar: I reject OO for business modeling. Use mostly procedural and you no longer have "factories and a mapping layer" to futz with. Use the database in all its naked glory. You don't need to wrap state in classes if the DB does it for you. Yes, you may end up depending a bit on a specific vendor or OSS code set, but if you roll-your-own incrimentally (YagNi style), then you are doing the same thing. Just because you create it instead of buy it does not make your code any less dependent on it. Use open-source DB's if having code is important. Besides, YagNi does not care about distant vendor-lock-in.

With a NimbleDatabase there is little or no additional cost development-wise. Databases are a tool, not a tax. The most RADish tools I ever used were based on NimbleDatabases of one kind or another. If later you need a more powerful RDBMS, then the code that depended on the NimbleDatabase can often be converted with only minor changes.

If you truly followed YagNi, then you would use byte-code instead of a 3rd-generation language (like Java, Pascal, Python, etc.) But most don't do that because they know that higher-level languages make life easier and do not emerge organically, at least not in a timely fashion. The reason for using databases is similar to the reason even Yagners don't start out with bytecode programming. Databases are to state/data what programming languages are to behavior. You don't use a low-level behavior language to start, so why use low-level state management?

The analogy to high-level languages fails: I truly NEED a high-level language in order to be productive. Relational databases from day one are impediments to the productivity of most developers. -SteveConover

Oh really? I agree that the tool choices offered by the bigname vendors tends to be sparse, but they are still better than the alternative of a roll-your-own NavigationalDatabase in my opinion. Built-in collections like arrays, bags, dictionaries, quickly run out of steam. I have been forced to use them in the past, and I really missed databases.

Aren't these the datatypes you end up using to represent groups of things, even with a database underneath?

No. I suppose the relational result set might be in such a structure, but it is a "flat" predictable structure not subject to too much "decoration". If it gets complicated, that is usually a sign that the DB is not being used right.

If you'd rather deal with jdbc resultsets directly then you probably are cornering yourself into early use of a database. Another common approach is to abstract away from the bits and pieces of data you get from a resultset and instead deal with that data in the context of classes relevant to the domain. I'm not making any value judgements about either approach, however with the latter one I think it's clear you could hold off on using a database from day one (and possibly indefinitely if a db doesn't turn out to be necessary). -SteveConover

I am not sure what you mean by "abstract away" here. Why are result sets not abstract enough? Result sets tend to be task-specific. Global abstractions fall over IMO. EverythingIsRelative.

I find it much more elegant to use an object-relational mapper to inflate real domain objects with the rowset data and hand me back a collection of useful objects. This is pretty common these days, though I wouldn't take a strong stand in favor of it in all cases - there are certainly times when dealing with data in rowsets is more effective. -SteveConover

What are the benefits of such? The drawbacks are possibly more classes, conversion effort, and perhaps more places to change if new columns are later added.

Is this a troll or are you honestly wondering about the benefits of object/relational mapping? Sorry for the question if it isn't.

[He doesn't see any value in object oriented programming, so he sees no reason to map relational data to objects. I think he wants the rest of the world to go back to the bad old days before we had OO.]

Perhaps this is for another topic, but I would indeed like to see what the "bad old days" were like. The "bad old days" was often caused by lame languages and lack of decent databases, not the lack of OO IMO. And, please stop calling people "trolls". It is an inflammatory, accusational, and useless term.

[He didn't call you a troll. He asked if your statement was a troll. I was there in the bad old days. C was no more lame than C++ (other than no OO support) and databases were widely available. No one held a gun to my head and forced me to use C++. I used it because it provided better language support for object oriented techniques (which had already proven their worth in C).]

Example? I would like to see code for a typical biz application. CeeIsNotThePinnacleOfProcedural. It lacks named parameters, nested routines, decent string handling, native dictionary arrays, etc. The word "troll" is rude, regardless of what you are referring to.

[You didn't say "non pinnacle", you said "lame". C++ doesn't have named parameters, nested routines, decent string handling or native dictionary arrays, either, but the OO support made it a major improvement for many C programmers. The word "troll" isn't rude. He isn't saying you live under a bridge. He was asking if you were just trying to argue for the fun of it.]

I would probably rather use C++ than C also, because I think C stinks. C++ can get around some of the weaknesses of C using OO. But, OO is not the ideal solution to the weaknesses, just better than C. For example, one can use OOP to sort of simulate named parameters and dictionary arrays. IOW:

   C  <  C++  <  Good_language

As far as "trying to argue for fun", I keep trying to figure out the OO mind. I am curious. BTW, perhaps this section belongs in BenefitsOfOo or something.

If you would rather deal in straight jdbc result sets then abstracting away from the persistence mechanism would have little benefit, so in your case you might as well use a database.

The reason I suspected a troll is that I figured everyone had gone through the pain of straight SQL/Resultset programming and had the relatively pleasant experience of using an O/R mapper instead. It's really unimaginable to me to code an algorithm today like "do X to [12 bits of data]" vs "do X to employee". Or "insert [12 things] into employee" instead of "db.create(emp)". You seem to have fundamental problems with OO, so it sounds like what this discussion this is becoming is addressed elsewhere on this wiki. -SteveConover

I would like to see some specific code to illustrate your viewpoint on this. One can make a createEmployee() function or stored procedure if they want to. I am not sure what you are getting at.

How is this topic different than WhenAreDatabasesNotAppropriate?

it might not be, feel free to refactor

I would rather first find out why the originator spawned it off because I have been accused of "baised editing" in the past. It appears they knew about the other topic, so they must have a reason it would follow.

I thought KentBeck's post to the list offered some clarity that justified a slightly stronger position against introducing databases on day 1. If you think this belongs on WhenAreDatabasesNotAppropriate, go ahead and refactor, I don't have strong feelings either way. -SteveConover

Is it tied to something on the mailing list? I could not see a related article listed on the link.

As far as "trying to argue for fun", I keep trying to figure out the OO mind. I am curious. BTW, perhaps this section belongs in BenefitsOfOo or something.

You don't sound curious. You sound like you know it all already.

I don't know what gives you that impression.

[Because of the bemused, condescending anthropologist-examining-strange-savage-ritual tone. This same tone is infecting the wiki lately and frankly every time an interesting OO thread gets started these days it seems to descend into a rehash of OO vs relational, why OO, etc. I am all for being open-minded but at this point I wish I could construct a magic filter that IP-blocked the RelationalWeenies, and we could get back to signal instead of noise.]

Well, I do not set out to purposely write "condescending" content, at least not consciously . If you have problems with content making negative implications, then see if it can be fixed or cleaned (without deleting useful info). I am sorry you feel that way, but without specifics, it is hard to make corrections. It is like in a marriage where one partner claims, "your actions indicate that you don't love me any more", and then the other partner will say something like, "Just because I forget to hang up the towels does not mean I don't love you anymore."

Some people view a database as higher-level operating system. Does the argument make sense anymore? Certainly there are classes of systems that don't need an OS (embedded) just as there are classes of systems that don't need a database. But for those that do.... UsingOperatingSystemUpFrontConsideredHarmful?

The issue here is that the ObjectOrientedCulturalAssumption suggests you *have* to start with objects. Databases are a weird thing that can be tacked on later.

No, the issue is most OO programmers have programmed with resultset's, and given the choice between resultsets and objects, they prefer objects. Object add structure to the code that you simply can't get from resultsets, and in our minds, make for far more stable programs, there's a massive difference between object.property and record["property"], and it's called compiler assistance/intellisense/autocomplete etc.

No matter what resultset you can come up with, I can write a code generator that looks at it and gives me strongly typed objects and strongly typed collections from it rather than just resultset and rows. As long as the generated code doesn't need manual tweaking (ActiveCodeGeneration not PassiveCodeGeneration), then you can sit down and start working with real object with little effort and no hand written code. I still write SQL, still use the power of the database to do joins and searching and calculated views... but once written.. it get's hit with a code generator to make that data easy to process once in memory. Most OO programmers don't like resultsets!

{Code generation is usually a smell in my book. It generally indicates a violation of once-and-only-once. You are going from a higher-abstraction form, the input to the code generator, to a lower abstraction, the result. As far as result sets being evil, I have yet to see a specific example.}

Some quotes:

: "You have the cost of design carry whether you anticipated correctly or not. To illustrate, suppose some day you will need a relational database. You correctly anticipate that, and put in the appropriate factories and a mapping layer to make sliding in an RDB easy. However, every time you change something, instead of having one object to change, you have a bunch. Instead of having one package, you have a bunch. Instead of having one class, you have a bunch. All of that stuff costs you *even if you are right*."

Well there's quite a few things being used "up front" in this quote. Apparently, an object oriented programming language is being used up front. Would these design carry costs exist if we started JUST with the database? Why couldn't we?

: "The analogy to high-level languages fails: I truly NEED a high-level language in order to be productive. Relational databases from day one are impediments to the productivity of most developer"

Absolutely not. You're assuming skill-sets here. SQL, T-SQL and PL/SQL are high-level languages. I know plenty of C++ and Java developers that are happy pigs in a pen when using regualr SQL, PL/SQL or T-SQL, because you can do so much so quickly in those languages.

: "If you'd rather deal with jdbc resultsets directly then you probably are cornering yourself into early use of a database."

If you'd rather deal with JDBC, you are probably cornering yourself into early use of Java. (ok, ok, that was devil's advocate, forgive me).

But there is an element in truth there. There are many examples of systems where you think "ok, I need a servlet engine, and Java, and a mapping layer ...". But if you started with the database, you discovered you could just write a 15 stored procedures and wrap it with mod_plsql. Or a thin servlet. Which would be faster?

I am not much of a fan of StoredProcedures. It often creates "two change points" for many common changes. Direct SQL is generally preferred.

Example: putting together an RSS feed for an existing database took VERY little effort in PL/SQL... http://asktom.oracle.com/~sdillon/rss.html

"Existing database" being the operative phrase. This section has very little to do with the original point and sounds like it belongs on one of the language war pages.

"The issue here is that the ObjectOrientedCulturalAssumption suggests you *have* to start with objects."

This is very close to the heart of the matter - although the use of "ObjectOrientedCulturalAssumption" by the author seems perjorative.

Object-minded developers don't *have* to "start with objects", they prefer to - it's how they prefer to think about software systems. It's a way of thinking. I think in objects first. Other people I know and respect think in tables first.

The funny thing is, both camps arguably start from the same starting point: a logical ER model or conceptual object model, which to me are essentially equivalent. To validate the argument I've been asking the RelationalWeenies I know for recommendations on database design literature, and what I've read so far advises the database designer to interview people and come up with a logical ER model - which is the same damned thing a DomainModeler does. So unless that's not how practicing relational-minded developers approach development, then I don't see any "psychological mismatch".

The problem surfaced by Kent's quote is one of implementation. Given a conceptual model, the relational-minded developer translates to tables and is done (except for logic, which I'll get to in a minute). Meanwhile the object-minded developer makes classes reifying the abstractions in the model. When instances of those classes must be persisted in an RDBMS, there becomes a heavy design cost (of mappers, etc.) to be carried. Since the object-minded developer also has to invent a set (let's say the same set) of tables for persisting instance state, all that object and mapper stuff must look like overhead to the relational developer. And rightly so. A lot of overhead is involved in getting a simple app on the air in J2EE with relational persistence and popular open-source frameworks.

But what about logic and complexity, especially over time? Enterprise applications typically have more complex logic than simply putting data on the screen. One of the reasons object developers prefer to think in objects is because they believe objects give them a way to manage that complexity. With objects they can decompose and allocate the logic required of a system over the objects in the system, which they believe yields a system that is easy to maintain and extend. And rightly so. I have seen those principles work and pay dividends in practice. To be honest, I don't know how to decompose and allocate responsibility using only PL/SQL in a way that is maintainable and extensible as the logic required of a system grows in amount and complexity. I'd really like to know what patterns RelationalWeenies use to avoid a BigBallOfMud in PL/SQL.

To me the story has a couple of morals. First, it's not like you can only choose one dogma. Both objects and relational databases are cool and have unique strengths; use each where it makes sense. Second, if using a database (which is inevitable for enterprise applications) causes carrying of too much design cost, perhaps the "database" (including mapper etc.) is not nimble enough. I've been spoiled by GemStone, which in my mind is the nimblest of the NimbleDatabases. Third, just because it comes from Jakarta and has an O'Reilly book doesn't mean it's the coolest thing since sliced bread and overhead-free. But that doesn't mean a pattern (like MVC) is worthless, it means you're not doing the simplest thing that could possibly work.

--RandyStafford

It is often alleged that OO "helps manage complexity" better than database-centric approaches, but it has not been demonstrated thus far. The biggest contributor to complexity is the interlinking of many factors. One has to go outside of simple encapsulation to make the proper factors be involved at the right time or place. This complex web of instantiations of myriads of helping classes is not a pretty sight. The more of these interactions one factors into data attributes, the easier it is to query them from different angles to get a better handle on relationships.

Which is exactly the problem OO solves. It allow behavior and rules to be attached to the data to prevent invalid queries. SQL allows you to query anything... this is bad, the data has no structure, you may forget something. Objects give the data structure and remove the burden of enforcing rules.. they do it for you.

{OO does not handle cross-entity relationships very well. You can't encapsulate the whole world. Most useful systems involve complex relationships between entities and things. OO has no standard for managing those.}

It is easier to query a web of attribute-based relationships than query a big wad of classes.

No it isn't.

{Maybe your Class Query Language is using a database of some kind and you don't even know it.}

What is left over after such factoring is local task-specific IF conditions. In other words, global abstractions are factored into data and local ones left as task-specific IF statements.

Global... bad... global... bad... global.. bad

{So are global classes bad? Note that nobody here is suggesting global "raw" variables as a solution.}

It is sometimes said that OO better integrates data and behavior, and that this is lacking in attribute-based complexity management, but in reality one often needs a level or two of indirection between data and behavior anyhow, regardless of paradigm.

Data and behavior belong together, not separated. If I have an int... I just want the int to work... I don't want to carry around a string and call parseInt(aString) every time I want to add two int's together. Most languages come with a few built in types, int/string/double/date/byte etc... but those aren't sufficient to solve most problems, you need to extend the type system with types that are native to the problem being solved, that's all OO is, an extension to the type system.

Well, not every OO proponent shares that view of what OO is. (NobodyAgreesOnWhatOoIs.) Further, what works on a small scale (integers) may not extrapolate to more complex "things". But, these arguments probably belong under ThereAreNoTypes.

I don't want an int, I want a SocialSecurityNumber?, or an OrderNumber?, because those can be self validating types, int's are too generic, with types it would be illegal to add two SSN's, but if they were just int's it wouldn't. Same goes for other domain entities, we like to build up the basic bricks/objects to have the necessary pieces to build a program with. If a shopping cart is something common to my domain, then I want a shopping cart data type, I want to be able to pass aShoppingCart as an argument to procedures as a whole entity, not just a loose collection of task specific data. You can either program "in" a language as procedural programmers do... or you can program "on" a language as OO programmers do. It's a different mindset, but it's one which OO programmers feel is far superior to programming "in" a language. If OO isn't for you, then don't use it, but don't presume to knock it either, it is a valid approach whether you like it or not.

{Regarding shopping-cart example, see GenericBusinessFrameworkUnobtainable.}

Thus, the less-than-tight relationship between them found in data-centric approaches is often a gift rather than a flaw. I just find that the more complexity I can manage and monitor as data instead of code, the easier things are for me.