Refactoring Is Not Relational

Moved from RefactoringImprovingTheDesignOfExistingCode (a book).

Apologies for the misleading title -- I would welcome input from any WikiMaster who wishes to propose a better one. --StevenNewton

(Some dispute that we need to open a new topic for this. It has been interpreted by some as a possible move to "hide" criticism of the book. As one RelationalWeenie, I personally am not really bothered by the move, but I rewrote a message from another who may have been.)

As a RelationalWeenie (RW), I browsed through some of the examples in this book in the bookstore and was a bit bothered by some. The author seems to fall into the classic RW complaint of ReinventingTheDatabaseInApplication. One example physically split accounting transactions into separate "finished" and "unfinished" collections, moving records/objects from one side to the other as processed. IMO, it is better to set a status code attribute/column of some sort so that you can view all transactions by other criteria besides finished status without having to "glue" or link both collections back together. It is IS-A partitioning where it should be HAS-A. (Sorry, I don't have a page number.)
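A minimal sketch of the status-attribute approach described above, written in Java since that is the language of the book's examples. Every name here (Txn, StatusDemo, byStatus, byAccount) is hypothetical and not taken from the book. It only illustrates keeping one collection and filtering on any attribute, status included.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical transaction class; the names are illustrative, not from the book.
class Txn {
    enum Status { UNFINISHED, FINISHED }

    final int id;
    final String account;
    Status status = Status.UNFINISHED;  // HAS-A status, rather than membership in a "finished" list

    Txn(int id, String account) {
        this.id = id;
        this.account = account;
    }
}

class StatusDemo {
    // One collection; "finished" is just one of several possible filters.
    static List<Txn> byStatus(List<Txn> all, Txn.Status wanted) {
        return all.stream().filter(t -> t.status == wanted).collect(Collectors.toList());
    }

    static List<Txn> byAccount(List<Txn> all, String account) {
        return all.stream().filter(t -> t.account.equals(account)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Txn> all = new ArrayList<>();
        all.add(new Txn(1, "cash"));
        all.add(new Txn(2, "cash"));
        all.add(new Txn(3, "payroll"));

        all.get(0).status = Txn.Status.FINISHED;  // processing flips a field; nothing moves

        System.out.println(byStatus(all, Txn.Status.FINISHED).size());  // finished transactions
        System.out.println(byAccount(all, "cash").size());              // cash transactions, any status
    }
}
```

Viewing by account needs no "gluing" here because there was never a split. The cost is that every status-specific view is a scan or a query rather than a ready-made list.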

{Note that somebody may have changed this text from the original. It does not read like I remember it from a month or so back. Maybe my memory is faulty, but it just seems different.}

Which particular examples raised concern? Could you provide chapter or page numbers?

A discussion perhaps about the same topic came up in comp.object between TopMind and MartinFowler. Here is a link:

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=MPG.178e8f46d88c32db98a642%40news.earthlink.net&rnum=1&prev=/groups%3Fq%3Dgroup:comp.object%2Binsubject:Hard%2Binsubject:Partitioning

In case the link gets messed up, the title is "Hard Partitioning (was: recommend a good object oriented book?)" and the first message is time-stamped 2002-07-04 14:26:37 PST.

Here is a snippet which to me suggests that MartinFowler would indeed use two different "groups" for such a situation:

  > 
  > The test cases will then "suggest" that you pick one
  > partitioning dimension over others? (Or use
  > complex patterns)
  > 

  Yes, I'll use inheritance for the one that seems the most dominant, and 
  use state (or similar) for the others. (Hence the phrase, "inheritance 
  is card you can play only once".) I don't think there's a way to tell 
  other than from the actual requirements you are working with.

  Martin

The RW who started this topic needs to state which refactoring has the example that causes concern. A moderately close review of the book did not turn up any code that dealt with "finished" and "unfinished" accounting transactions in collections. There may be value in discussing the example in relational and OO implementations. Because there is no such example to be found in Fowler's book, the association with RefactoringImprovingTheDesignOfExistingCode should be removed. If no such example is forthcoming, this page needs the TrollFlag.

No need for name-calling. It may be a simple memory flop. If the page number cannot be found, then simply de-link the example from the book title.

[I don't know which example RelationalWeenie is talking about, but there are forces that favor IS-A over HAS-A. Most important is probably polymorphism. Viewing transactions may not be important (or even required) but asking them to do things that differ by finished/unfinished state may.]
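One way to picture "asking them to do things that differ by finished/unfinished state" is the sketch below. It is an assumption-laden illustration, not code from the book: the names Phase, Unfinished, Finished and Posting are invented, and it uses a state object rather than subclassing the transaction itself, so that an instance can change phase without being replaced.

```java
// Behaviour that differs by phase lives in two small classes (all names hypothetical).
interface Phase {
    long amend(long current, long requested);
    String label();
}

class Unfinished implements Phase {
    // An unfinished posting accepts the new amount.
    public long amend(long current, long requested) { return requested; }
    public String label() { return "unfinished"; }
}

class Finished implements Phase {
    // A finished posting refuses any change.
    public long amend(long current, long requested) {
        throw new IllegalStateException("finished postings cannot be amended");
    }
    public String label() { return "finished"; }
}

class Posting {
    private Phase phase = new Unfinished();
    private long cents;

    Posting(long cents) { this.cents = cents; }

    void finish() { phase = new Finished(); }

    // No if/else on status here: dynamic dispatch on the phase picks the behaviour.
    void amend(long requested) { cents = phase.amend(cents, requested); }

    long cents() { return cents; }
    String phaseLabel() { return phase.label(); }
}

class PhaseDemo {
    public static void main(String[] args) {
        Posting p = new Posting(1000);
        p.amend(1200);   // allowed while unfinished
        p.finish();
        System.out.println(p.phaseLabel() + " " + p.cents());
    }
}
```

Whether this is better than a status column plus a conditional is exactly the point under dispute on this page; the sketch only shows what the polymorphic version looks like.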

Perhaps if it could be done without replicating the data set just for the purpose of applying a polymorphic operation. However, this gets into HolyWar areas regarding shared data and ObjectRelationalPsychologicalMismatch. Hence, other RelationalWeenies may be bothered by this book. (See also PolymorphismLimits.)

[Oh, I get it. You think he was replicating the data. He was probably just moving a reference from one collection to another. Java collections don't contain the objects themselves, they contain references to objects.]
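The reference-versus-copy point can be checked with a few lines of Java. This is a sketch with made-up names (Invoice, ReferenceMoveDemo, move), not the book's example, which has not been located.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record type, used only for this illustration.
class Invoice {
    String memo;
    Invoice(String memo) { this.memo = memo; }
}

class ReferenceMoveDemo {
    // Moves the reference at index i from one list to the other and returns it.
    static Invoice move(List<Invoice> from, List<Invoice> to, int i) {
        Invoice moved = from.remove(i);  // takes the reference out of the list; the object is untouched
        to.add(moved);
        return moved;
    }

    public static void main(String[] args) {
        List<Invoice> unfinished = new ArrayList<>();
        List<Invoice> finished = new ArrayList<>();
        Invoice original = new Invoice("rent");
        unfinished.add(original);

        move(unfinished, finished, 0);

        // Same object on both sides: identity holds, and a change made through
        // one variable is visible through the list.
        original.memo = "rent (paid)";
        System.out.println(finished.get(0) == original);  // true
        System.out.println(finished.get(0).memo);         // rent (paid)
    }
}
```

Only one Invoice object ever exists; what moves between the lists is the reference to it.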

IOW, a kind-of code-centric index. I would rather let the database deal with those.

Yes, this book is actually not at all about relational-databases. People with extremely relational-centric points-of-view will have to work a little harder to get anything out of a book like this. Just like you might have to learn a little French to get the most out of a vacation in Paris. Life's hard that way. -- francis

This implies to me that you think RelationalWeenies are ill-informed about OOP. How about we chalk it up to a typical HolyWar about the ObjectRelationalPsychologicalMismatch and AgreeToDisagree. I don't see this leading anywhere new.

: No, it means that I think one particular RelationalWeenie might not have a very good grounding in OO. Most modern OO languages, for example, routinely copy references instead of the actual entities. Faster that way, and it's not that hard to conceptualize. (Like most items of convenience, of course, there are always a few wrinkles.) If you see Java code copying references and then think that refers to replicating data, it doesn't make you sound like you have a lot of experience with Java. -- francis
: I don't know how the author of the book actually did it (at this point), but whether you replicate data or roll-your-own indexes (pointer lists), it is still ReinventingTheDatabaseInApplication. The more you do with those "pointers" the more you have a code-centric roll-your-own NavigationalDatabase.
: Nobody rolls their own pointer lists in a decent OO language. They're implemented implicitly, and you usually don't have to be conscious of them. Again, something that should be apparent to most people who grok OO. -- francis
: The more they are "implemented implicitly" the more you are relying on a NavigationalDatabase-like contraption of some sort, whether you call it that or not.

If that were true, we'd have tons of trivial responses to ManyToManyChallenge, but apparently it ain't so. People are (re)implementing their custom-made pointer lists in all OO languages in existence so far. What is funny is when they reimplement them over and over again in the same project. They are reimplementing pointer lists in the same book and/or across OO books, and such is apparently the case with MartinFowler. By the way, Francis, a Ruby response to that challenge should be fun to write, and not exceedingly difficult.

[Or someone could translate the book from OOWeenie to RelationalWeenie. Replace the collections with tables and references with keys, etc.]

They are too different IMO. It would change the flavor of the book.

[I was kidding.]

IOW, a kind-of code-centric index. I would rather let the database deal with those.

I find it very hard to believe that you really want to send multiple SQL statements to another process and verify their results instead of moving a reference from one list to another. I think you're pulling our collective leg. -- EricHodges

What do you mean by "and verify"? It can be as simple as:

  update stuff set status='finished' where id=$id

By "and verify their results" I mean verify the results of the operations. In Java (the language used in the examples in "Refactoring") this means checking the value returned from Statement.executeUpdate().
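For concreteness, "verifying the result" of the one-line update might look like the sketch below. The JDBC calls (Connection.prepareStatement, PreparedStatement.setLong, PreparedStatement.executeUpdate) are standard java.sql; the table and column names come from the SQL quoted above, while the class StatusUpdate, the helper requireSingleRow and the choice to treat "not exactly one row" as an error are assumptions made for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class StatusUpdate {
    // Marks one row finished and checks how many rows the statement changed.
    // Obtaining the Connection is left to the caller.
    static void markFinished(Connection conn, long id) throws SQLException {
        String sql = "update stuff set status = 'finished' where id = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, id);
            int rows = ps.executeUpdate();  // row count reported by the driver
            requireSingleRow(rows, id);
        }
    }

    // Hypothetical policy: anything other than exactly one updated row is an error.
    static void requireSingleRow(int rows, long id) {
        if (rows != 1) {
            throw new IllegalStateException("expected 1 row for id " + id + ", got " + rows);
        }
    }
}
```

So the check is a handful of lines plus SQLException handling at the call site. Whether that counts as heavy next to moving a reference between two lists is the judgment being argued here.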

Further, other languages and apps can "see" the change. We don't have to dump-and-reload when we later need to share. I would have to see more details to know if we are "wasting processes". Perhaps we can do it as a group instead of one-at-a-time. This may save time if we are using any kind of "shared state" mechanism.

  update stuff set status='finished' where [something that applies to a group]

"when we later need to share"? That's a hefty assumption for some code used as an example of refactoring.

  update stuff set status='finished' where [something that applies to a group]

The same thing could be written in C++ as

  stuff.finish("something that applies to a group");

Yes, but Oracle or MySQL implemented it for me. Reuse. Are you going to make a new function/method for every query used? What if the [something that applies to group] is a complex expression? Are you going to build a parser in C? If it is native to the language, then how do other applications readily share and query the info? SharingDataIsImportant.

If the previous SQL was using something based on an index, the corresponding C++ would become

  stuffIndexCollection.finish();

Add triggers or exceptions to either one and you need to add a lot of error handling code around the statements provided above. So what's the big difference?

No need for triggers or exceptions has been identified yet.

[What does any of this have to do with the mysterious example from MartinFowler's book on Refactoring?]

Martin should talk to the database to change the status, not move objects or object pointers from list A to list B to change the status.

[What database? The book is about refactoring OO code. The examples are in Java. There is no database to talk to. Inside the implementation of databases, references are being moved. This book teaches people how to improve the quality of that sort of code.]

It appeared to be a typical business example, not systems software design. People who read books that show ReinventingTheDatabaseInApplication examples will end up doing the same at work, which is generally a bad practice IMO.

[Moving a reference from one list to another is not ReinventingTheDatabaseInApplication.]

Well, it is bad design if later some other app/tool wants to read/know the status. (I suspect this will turn into a YagniAndDatabases debate.)

Refactoring is not Relational

So what? I use both OO and Relational (and functional and procedural) techniques when I write code, and I switch between them as the problem dictates, and my tools allow. Seeing any one of them as "the one true way" is somewhat like seeing the FingerPointingAtTheMoon. One of my favorite things about XP is that we stop thinking about the problem in terms of technical purity and focus on measurable solutions. If duplicating references (and reimplementing indexes) is bad, it will be obvious in less than a week, especially if one aggressively refactors. That the book doesn't show things being extracted out to RDBMSs or any other external data storage certainly doesn't preclude me from doing so. Again, understanding the principles being taught there instead of judging them against one's preferred method allows one to see the moon. -- BillCaputo

Wouldn't that work both ways? One could start it in a database-centric way and then change it if that does not work out. The question then becomes whether the starting point is data-centric or OO-centric.

I would like to see a specific example. IMO TablesAndObjectsAreTooDifferent.

How are they different? To me an entity relationship diagram looks a lot like a class diagram. Objects merely add hierarchy and functionality to what is provided by relational tables.

To begin with, E/R is different (albeit not a whole lot) from the relational model. Incidentally, E/R also supports type hierarchies, and does so even better than most current OO languages. Going back to relational, objects merely "add functionality" while seriously subtracting functionality at the same time. See ManyToManyChallenge and EjbTernaryRelationshipExample, and this is just the tip of the iceberg. Objects essentially lack compositionality and declarative integrity enforcement (including but not limited to relationship management).

I disagree, but perhaps this belongs under TablesAndObjectsAreTooDifferent.