Dont Refactor Databases

Try to get them right the first time, it's not that difficult. The refactorings should be minor and very rare. It is a mistake to think that since refactoring is good for code the same must be true of databases.

Be prepared to refactor by addition, and maintain compatible logical schemas. Do that only when the realities of the domain being modeled have changed, not just because your classes suffered a refactoring and consequently the database also needs that.

This approach is supported by DatabaseIsRepresenterOfFacts. The structure of the business or the domain being modelled doesn't change very often nor it changes drastically, so the structure of the facts describing the business follows the same pattern. It is the failure of programmers to correctly do analysis and requirements engineering right the first time, that leads to solutions like RefactoringWithRelationalDatabases. More, it is the misinterpretation of OO principles and the bad advice in some OO books that lead to the problem of frequent database refactoring.

Because, if you look at the database as just "the place where I store my objects", and you refactor your class structure often, then you need to refactor the database schema often. While a better understanding of the principles of object oriented design, will lead to solutions where the structural aspects of the domain objects will not change as drastically and can be correctly identified upfront, rather than later through successive refactorings.

What really ought to be subject to refactoring and evolutionary improvements in OO design is the behavioural aspect, and that should not have influence on the database schema.

-- CostinCozianu

Is it possible that the main reason this holds true is that the interface we have to databases is not one that easily supports refactoring? I don't see any (practical) reason why a database that had an interface which made refactoring easy should not be refactored. The arguments for refactoring code apply just as easily to a database and although I absolutely agree with what you say in terms of a database being a representation of facts rather than a repository for objects I think there may be times when you want to change the way you represent those facts.

While refactoring code can be done with impunity (just replace a ".class", ".jar", ".dll", ".exe" with another), refactoring a database is necessarily painful, no-matter what facilities you want to have in a database, because databases contain (lots of) data. We're not talking here about the cases where only a development database exists and you can just drop and recreate a new one. Most of the database development has also production databases to worry about, and it is only natural that their DBAs are reluctant to change the structure.

Data structures (relational database schema, XML schemas etc), and code structures (classes, interfaces, functions, modules) are very different animals, I don't see how arguments for one thing automatically apply to another. For example code design problems are more complex than data modeling problems, and hence the difficulty to get it perfect the first time, and hence the justification for quick prototyping and successive refactoring, while with data modeling it's just a lot easier to get it right the first time and is a lot harder to refactor later.


A production database holds facts about a business. A programmer with refactoring in mind is not allowed to change those facts. This identifies a critical problem with database systems: We need to keep the existing facts and we need to safely add new facts that are not necessarily an exact structural match to the existing facts. Perhaps database structures need to be a little more flexible and extensible with out drastic refactoring and data migration.

-- DaveEaton


Every database schema I've ever worked with needed changes on a regular basis. And every database schema I've ever worked with needed more changes than that - changes we were reluctant to make due to the cost of migrating the data, changing the programs, and retesting the whole thing.

Yes, database books often talk about the "core business rules" that will never change. But this "holy grail" of software development turns out not to be true in many cases. The complete "business model" that managers use to run their business is comprised of a number of mental models - which may change from month to month, based on whatever trade magazine article the PointyHairedBoss read most recently.

This month, the business is focused on providing the best service possible to the large volume customers, because they bring in the most revenue. Next month, we switch to shipping the maximum product possible each day, to get the best quarterly numbers (often at the expense of large high-profit orders). The month after that, we sign on to some corporate initiative to standardize on a limited set of standardized configurations (in spite of "custom build-to-order" advantage we were pushing just last month).

Things change. Sometimes they change radically.

-- JeffGrigg


I think Costin's point is that the database schema should not be changed to reflect changes to the application's implementation. That is, if you refactor some classes whose data is held in the database, you shouldn't change the schema to match the new class structure.

I'll agree with that wholeheartedly. Trying to maintain a one-to-one correspondence between classes and database tables is very costly, and doesn't really gain you much. It's better to have a layer that takes care of object persistence, allowing classes and schemas to vary independently. Not only does this help with the class refactoring issue - it also helps with the incompatibilities between different relational database products. (Database vendors can't agree on many basic things, including the representation of dates, what sorts of triggers can be defined, what types of views are updatable, how to generate a unique key value for every row in a table, ...)

But the idea that you can get your data model right on the first try is not reasonable. Every system I've worked on has had a couple of major database re-designs, and that's not because mistakes were made in the beginning. The nature of the data being stored and the relationships between tables tend to change as a complex system evolves.

I've also tried the "maintain compatible logical schemas" approach, but that usually ends up being a lot of work that doesn't pay off. It's better to just bite the bullet and change all the code to match the new logical schema.

Changing database schemas is costly, so I'd agree with the rule "Don't change database schemas unless absolutely necessary." That is, don't change them just because they "smell" - change them only if they no longer support the necessary data model.

-- KrisJohnson


Wouldn't this be an appropriate place for BridgePattern and/or MementoPattern to decouple the in-memory object model from the persistent object model?

-- JonathanChoy


In my experiences, the schema always needed refactoring just like code. Maybe changed at a slower pace. I believe in schema evolution along with other software. So, I do not do BigUpfrontDesign?, but let the software dictate what we need out of our databases. Some of my thoughts and experiences can be found @ http://shodaka.net/pwiki/Wiki.jsp?page=TestingWithDatabases. -- RaghuHavaldar?

"IDEA" tool for schema evolution http://www.informatik.uni-bonn.de/~idea/SEA/sea.html


Primary Reasons to Have to Refactor Schemas (bad designs in most cases):


The structure of the business or the domain being modelled doesn't change very often nor it changes drastically, so the structure of the facts describing the business follows the same pattern.

Except when it does --- I cannot count, on both hands, the number of times I have been involved with teams that perform very diligent and cautious requirements analyses, only to have it all blown out of the water later. This is because the directors take on entirely new, well, directions as business needs evolve. It happens most with startups or small businesses, since they tend to be more exploratory, and thus don't have the full picture from the outset.

I agree with DatabaseIsRepresenterOfFacts, but this page seems to preach that the facts never change. In a business world that is intensely competitive, this simply doesn't convince.

--AnonymousDonor

Perhaps we need to look at examples. Two examples come to mind right now. The first is a change in the "quantity of relationships", such as changing from one-to-one into one-to-many, etc. For example, you start out with one address per customer such that you put address info into the Customer table. However, you find you later need a billing address, sales address, etc. for each customer. Thus, you need to split out the address info into a Contact table or the like (see ContactAndAddressModels).

A second example is the conversion of column-wise info into row-wise info. For example, it is easier to add new categories if you row-atize them:

  table: salesByStore 
  ------
  storeRef  // store ID as foreign key
  tires  // amount of sales
  engine_parts // amount of sales
  body_parts // etc.
  repairs
To make categories more flexible, this is often a useful refactoring:

 table: salesByStore
 ------
 storeRef
 categRef  // foreign key to sales_categs table
 amount

table: sales_categs ------ categID categTitle // "tires", "repairs", etc...

Now new categories can be added without changing the schema. YagNi would often dictate the first approach early in the project. The first approach does not violate any normalization rules. It is just less flexible if new categories keep popping up. (Another solution may be a DynamicRelational database).

-top


See also: DatabaseBestPractices


CategoryDatabase, CategoryRefactoring


EditText of this page (last edited March 8, 2007) or FindPage with title or text search