When One Leaf Dies Buya Whole Tree

This is a problem I've seen at least twice myself.

In an application with a complicated instance structure, and one element anywhere in the structure changes, it becomes necessary to completely re-process the whole stinking thing.

This happens in all sorts of systems with serial storage mechanisms (stream persistence, serial data files). If one object in the hierarchy changes, then the entire hierarchy has to be written out.

Recently, I was working on a financial package for managing research grant money managed by my employer and I noticed that the grant object which contained lists of salary objects, equipment objects, travel objects and such uses this approach. If a user changes a single value on any line item, every line item of every group is recalculated so that the grant total can be recalculated.

In general, I would certainly agree with RonJeffries. DoTheSimplestThingThatCouldPossiblyWork usually means accepting the fact that processing the entire instance hierarchy is the best answer (no matter how it might gall you) and we just have to live with it.

Case 1: In the case of the financial package that I'm currently working on - I just have to live with it. I must also second Ron's comment about the dangers of keeping total calculations and such and only adjusting them after every change. I really liked his tree structure suggestion. It's not always feasible, but if you have a large structure that has significant amounts of calculating going on it could save you lots of CPU time.

Case 2: A previous project I worked on had more strenuous properties.

  1. Data size could easily go into megabytes, even tens of megabytes.
  2. Some changes are made as often as dozens of times a second.
  3. After each change in implemented, the updated objects must be written out.

Clearly in this situation, a more powerful PersistenceMechanism was required and I eventually implemented one that used both serial and random access data files plus streaming and object request broker network persistence. The random access data file and object request broker used event listeners to detect changes to objects so that the changed objects (and only those objects) would be updated.

-- DonaldMcLean


Generally speaking if you DoTheSimplestThingThatCouldPossiblyWork, you recalc the whole tree. If this isn't too costly, stop there. In even the easy cases, the alternative of adjusting the tree is often quite tricky: in complex cases it's often out of the question. Like any optimization error, errors in such code can produce what seem to be intermittent bugs. Try not to go there.

Yes. MakeItWorkMakeItRightMakeItFast

One decent alternative is often to build a more complex (less serial) structure, so that only one branch of the tree has to be rebuilt. For example, in the expression:

 A + B + C + D + E + F + G + H

if D changes, you have to calculate everything. However, in the grouped expression:

 (A + B) + (C + D) + (E + F) + (G + H)

if D changes, you must recalculate C+D and the top-level sum, but you need not recalc the A+B, E+F, G+H subtrees. In general the cost is reduced to something like log N instead of N. It's usually pretty easy to come up with a canonical tree structure that lets you trim what needs recomputing. In my experience it's usually pretty hard to do adjustments without support from the data structure.

But again, be sure the cost is really troubling you before you go working on this. The computer time you save may not be worth the programmer time you burn. -- RonJeffries

But if the saved computer time allows you to CloseTheLoop (get more timely feedback on your changes), there may be other gains? Still, rumour has it that TestFirstProgramming gives these gains and others besides.

Do you mean feedback between the code and the programmer, or feedback between the user and the programmer? [Either or both, depending on context] My experience is that optimization problems in general don't often rear their ugly heads in test code anyway, so my slow unoptimized code isn't usually much slower to test/code/refactor etc.

Then again, for code/programmer interactions, the ProgrammerTest ought to be adequate (he said, bitterly). For user/programmer problems with an OffsiteCustomer?, the loop is wide open anyway. So you're right, making a new tree doesn't take too long.


There is something wrong with the premise of this page, or is there something wrong with using a tree structure to begin with? It seems to me if the leaf dies, you prune the branch, or graft in a new branch, not buy a whole tree! That would seem to fit "DoTheSimplestThingThatCouldPossiblyWork!" Replacing the tree is not simple, easy, efficient or a practical thing to do. It would qualify as being Extreme.


Bruce Wilcox, the author of the first relatively strong computer program to play the GameOfGo, wrote in "Reflections on Building Two Go Programs" (SIGART Newsletter, October 1985, No. 94) discussing his experience with incremental modification of data structures, and concludes that it's worth going to a lot of trouble to avoid it if possible, even in an application (updating knowledge after a single move makes a small change on a large board) where it looks like a natural way to approach the problem.

Wilcox writes "An excellent paper [B. Lampson, 'Hints for Computer System Design'], IEEE Software Vol. 1 Num. 1 January 1984 [online at http://research.microsoft.com/~lampson/33-Hints/WebPage.html], outlining principles to follow advised against doing things incrementally. Unfortunately the paper was published after I spent over five years learning that lesson. Since the lesson is only mentioned briefly in that paper, I've decided to emphasize the lesson with a more complete case history." He then goes on for about a page and a half describing how incremental update increased the complexity of various aspects of his first program, and comparing it to the "buy the whole tree" approach he used in the second program.

Hmm, that sounds like one to file under "DontLearnTheHardWay?".


A simple compromise is to keep a cache of recent changes, which is always queried, but applied back to the main data structure only at rare intervals.

Or even never, depending on the problem, as in the case of SCCS, which keeps the original file forever plus an increasing list of diffs against it, or similarly RCS, which keeps the most recent file as a base and applies diffs going backwards in time. Which is better depends on whether the oldest or newest version is most frequently requested.

The correct way to do this is well known in e.g. virtual memory and in database implementations.


LazyEvaluation, when available, is handy here. But a good rule of thumb is do the "brute-force" approach first, then optimize as needed. If you get the buy-a-whole-tree working, then you have something to compare your buy-just-a-leaf implementation to (they ought to have identical behavior, except for performance).


CategoryGardeningMetaphor


EditText of this page (last edited April 19, 2005) or FindPage with title or text search