It seems that transforming data structures without taking a server out of production for a while is impossible. But maybe there are some good tricks for minimizing the intrusion? BillSeitz
I'm not sure I understand the question. Are you talking about migrating existing data? You have to do that anyway with every new release that changes the database schema. If you're refactoring within a development environment, I don't see the problem. There's only test data there and only one pair is using it. Am I missing something? --MartijnMeijering (answer moved here from CodeNormalization)
Maybe I'm misinterpreting what I've read around here, but it seems like "hey, with those tests in place you should be able to RefactorMercilessly (often, with little disruption to production)". Data migration is nasty, which might reduce my willingness to refactor if it's not absolutely necessary. Does that help? --BillSeitz
I have hardly any experience with it, but is migrating data really that nasty? -MartijnMeijering
Migration can be straightforward, or it can be unbelievably nasty and high risk. I'd say that frequent, complicated (fear-inducing) data migration will slow refactoring and indicates a brittle environment: a poor fit with the XpFrame. --KeithBraithwaite
Could you say some more about the forces that determine whether it is straightforward or risky? --MartijnMeijering
Well, this depends a lot on your database. Postgres, to pick one I'm familiar with, permits you to add columns to an existing table--but you don't get any RelationalIntegrity constraints, and you can't remove columns that already exist. Any kind of complicated refactoring tends to involve copying the data from the table into an auxiliary one, dropping the table, creating it anew, and putting the data back in. Even if you're very careful, you still have to deal with programs that expect the old layout and will break. As near as I can tell, the only way to do it if you cannot afford any downtime is to
There are the annoying, not-tool-supported technical issues of making database changes: unless you're adding NULLable (or defaulted) columns, you'll have to do a rename-copy-drop on the table.
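To make the rename-copy-drop concrete, here is a rough sketch in Postgres-flavoured SQL. The table and column names (orders, status) are invented for illustration; this shows only the shape of the operation, not a tested recipe.

 -- Easy case: a NULLable or defaulted column can be added in place.
 ALTER TABLE orders ADD COLUMN status varchar(20);

 -- Hard case: anything else falls back to rename, recreate, copy, drop,
 -- while client programs are locked out of the table.
 BEGIN;
 ALTER TABLE orders RENAME TO orders_old;
 CREATE TABLE orders (
     id       integer PRIMARY KEY,
     customer integer NOT NULL,
     status   varchar(20) NOT NULL DEFAULT 'open'
 );
 INSERT INTO orders (id, customer, status)
     SELECT id, customer, 'open' FROM orders_old;
 DROP TABLE orders_old;
 COMMIT;

Even when the swap itself goes cleanly, any code written against the old layout breaks at that moment, which is exactly the risk described above.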
In many organizations, relational database technology is considered to be a strategic corporate resource, requiring centralized control by specially trained database "experts." And, to a large extent, the design, tuning, and operation of relational database schemas require skills and experience beyond those of your average developer.
So, relational databases are often under the strict control of a centralized group; nothing (other than data) can change in a database without having them do it for you. Some companies take this to extremes, requiring centralized control of development/unit-testing databases with the same rigor as production databases. And a few insist that no database change can be made without a face-to-face meeting between members of the development team and members of the centralized DBA team (so they can "do" the database design for you). Needless to say, these policies increase the cost of change and are anti-XP.
Also, in many business system development projects, the developers must share a common database. So, if one pair were to change something in the database schema, it would immediately break code and stop the work of all other pairs attempting to work at the same time. (This violates the commonly understood requirement that each developer (or pair) must have their own isolated "play pen" in which to do development and unit testing. But this argument doesn't seem to hold much weight in many corporate development environments.)
Thus, in many development environments, changes to the database schema are made very costly.
I have some experience of this. All systems have release cycles of some sort. As you say, in a corporate environment this is a heavyweight operation, and rightly so, since there is significant risk. So what you need is two development systems. One is the QA environment, where release procedures can be rehearsed and a full regression suite can be run. Then you've got the development box, which can be as unstable as you like. In this environment nothing is sacred, subject only to loose consensus. We used group e-mailing to warn of impending downtime, and all operations were guaranteed either to complete fast or to be rolled back. I was the moderator of all database changes, which for me meant splitting the development box into about five clones that could be booked out for high-speed development. We got a full database build time, from disk upwards, down to about an hour, allowing interactive integration debugging with repeated combing. All SQL was written as stored procedures (very important!). I suppose the point is to get yourself an environment you have complete control over, and work it harder than it ever has been worked before :).
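For what it's worth, a "full database build from disk upwards" can be a single script run against a fresh clone. A toy version, assuming a reasonably modern Postgres and invented object names (app, customer, add_customer), might look like this:

 -- rebuild.sql: wipe and rebuild a development clone from scratch.
 DROP SCHEMA IF EXISTS app CASCADE;
 CREATE SCHEMA app;

 CREATE TABLE app.customer (
     id   serial PRIMARY KEY,
     name text NOT NULL
 );

 -- Client code calls stored procedures, never the tables directly.
 CREATE FUNCTION app.add_customer(p_name text) RETURNS integer AS $$
     INSERT INTO app.customer (name) VALUES (p_name) RETURNING id;
 $$ LANGUAGE sql;

Run it with psql -f rebuild.sql, follow it with a test-data load, and a clone becomes something you can throw away and rebuild rather than a precious, hand-tended resource.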
Forget vague terms like corporate environment or centralized control. The issue is simply that any database object that is accessible to client code is a public member of a public API. When you consider that, for most applications, the columns, tables, and stored procedures number in the thousands...
Now think of this database as a generic class library and consider how many API members you make public to any client application and how often you change those.
Yes, there are other difficulties in refactoring databases vs. typical OO libraries (and some tasks that are easier with database refactorings), but the overwhelming issue is the size of the public API. Dramatically reduce your public database elements (and with them the database's main benefit, massive reusability) and refactoring becomes much easier.
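One way to read that: if client code only ever sees views and stored procedures, the tables themselves stop being public API and can be refactored freely. A small sketch, again with invented names (customer_base, customer, rename_customer) and assuming Postgres:

 -- Private table: no client references it directly.
 CREATE TABLE customer_base (
     id         serial PRIMARY KEY,
     first_name text,
     last_name  text
 );

 -- Public API: a view and a procedure with stable signatures.
 CREATE VIEW customer AS
     SELECT id, first_name || ' ' || last_name AS full_name
       FROM customer_base;

 CREATE FUNCTION rename_customer(p_id integer, p_full_name text) RETURNS void AS $$
     UPDATE customer_base
        SET first_name = split_part(p_full_name, ' ', 1),
            last_name  = split_part(p_full_name, ' ', 2)
      WHERE id = p_id;
 $$ LANGUAGE sql;

Later splitting or renaming customer_base means rewriting only the view and the procedure body; the thousands of callers still see the same customer and rename_customer they always did.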