Task Scheduling Using Zipfs Law

A task scheduling technique that applies ZipfsLaw (a PowerLaw) to obtain quick estimates. The technique requires two orderings of the tasks - one in order of importance, the other in order of difficulty.

You and your customer compile a set of tasks which your customer wants you to complete. Have your customer arrange the tasks in order of importance. Assign each task an importance order I, 1 = most important, 2 = second most important etc. You arrange the tasks in order of difficulty D, 1 = most difficult etc.

Compute a priority P = I / D for each task. Do the tasks with the highest priority (lowest P) first.
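
A minimal sketch, in Python and with made-up task names and ranks, of how the two rankings combine into a schedule:

 # I = the customer's importance rank (1 = most important),
 # D = your difficulty rank (1 = most difficult). Tasks are invented.
 tasks = [
     ("login screen",     1, 4),
     ("report generator", 2, 1),
     ("CSV export",       3, 3),
     ("help pages",       4, 2),
 ]
 # Priority P = I / D; the lower P is, the sooner the task should be done.
 for name, i, d in sorted(tasks, key=lambda t: t[1] / t[2]):
     print(f"P = {i / d:.2f}  {name}")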

The formula - importance over difficulty - leaves little doubt that the scheduling technique identifies the best jobs to do first - the best business-value-for-money.

The approach applies ZipfsLaw, a statistical law in which the 'order of size' usually varies inversely with the actual size of things. Applied here, a task's order of difficulty is taken to be in inverse proportion to its actual difficulty.

So instead of fully time-estimating each task (or even partially, using IdealProgrammingTime or GummiBearsOfComplexity), you just order the tasks and give them their relative order of difficulty ( = time = cost). It is usually easy to say which of two tasks will be the more difficult. If it isn't easy to tell, the difference probably isn't really important - use gut feel.

Your customer takes a similar approach to importance ( = business value).

Ask your customer to inform you when, by their own reckoning, they must go and invest in some other aspect of their business.

Occasionally (once an iteration?) revise the lists - remove the completed tasks and add any new tasks you've thought of. But note that, over time, many of the most important tasks will have been completed, so the priorities for the remaining most important tasks will be artificially high (low P). It might become appropriate to apply a more complex formula which takes this into account:

    P = (I + C)^E / D
The difficulty here is supplying appropriate values for C and E. C should be greater than 0, e.g. 5 or 10. It could be the number of tasks completed, but this becomes problematic when it dwarfs the number of currently scheduled tasks. E might be slightly greater than 1, e.g. 1.5 or 2.
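
A sketch of the adjusted formula, with assumed constants C = 5 and E = 1.5 (the text above only suggests ranges for them):

 # Adjusted priority P = (I + C)^E / D; C and E are assumed values.
 C = 5
 E = 1.5
 def priority(i, d):
     return (i + C) ** E / d
 # Again, do the remaining tasks in ascending order of P.
 remaining = [("audit log", 1, 2), ("theme support", 2, 3), ("tooltips", 3, 1)]
 for name, i, d in sorted(remaining, key=lambda t: priority(t[1], t[2])):
     print(f"P = {priority(i, d):.2f}  {name}")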


Why is an equation using arbitrary rules and numbers superior to any other approach? Why not just ask the question of the users, "If you could have just one new capability, what would it be?" Task scheduling is subjective and based on the needs of the users. Using an "objective" algorithm will only serve to distort the truth.

It's valuable when you have an equation that over time has empirically matched the list that you arrive at when you iteratively apply your question over all the potential features. The real value is that you can get the relative difficulty and the relative importance, then apply the equation much more quickly than you can iteratively apply the "what do you want next" rule. If you have the list of features, and you ask the customer, "which one do you want next?" the customer has to mentally juggle the importance and cost to tactically pick one from the list. Naturally, the trick is to figure out what equation to use, and I've found it's different by program team and customer. The one I've seen used most often is a power law like the above equation, but in only four variables.

The problem is that unless the values going in express a true numeric relationship (i.e. 1 is half of 2, one-third of 3, and one-fourth of 4), the equations are meaningless. One can iterate as much as possible, but one cannot apply arithmetic operations to non-arithmetic units. Additionally, iteration implies one has a master measurement method that the equation is being compared against. Before resorting to the equation as an alternate measurement, one needs to evaluate the costs and differences between the two measurement methods.

''When I first skimmed this, I thought the same way -- these difficulty values of "1", "2", "3" are totally arbitrary. But that misses the whole point of bringing up Zipfs Law http://en.wikipedia.org/wiki/Zipf%27s_law . Would you agree that there are one or two features that are "must have", while in the "possibly nifty to include, but not really important" category there must be hundreds of features? What the P = I/D equation is doing is *assuming* that the "value" of the 2nd most important thing is only 1/2 that of the most important thing; the "value" of the 3rd most important thing is only 1/3 of the most important thing, etc. It also assumes that the "2nd most difficult thing" would take half as long to implement as the most difficult thing; the "3rd most difficult thing" would take 1/3 as long to implement as the most difficult thing, etc. In other words, instead of calculating *exactly* how valuable a feature is, and *exactly* how long it would take to implement each feature (a waste of time; quicker to just go ahead and *do* it), we're getting a quick estimate by assuming Zipf's law is a good enough approximation for this particular list of features.''
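
A quick numeric sketch of that reading, with invented ranks: if value falls off as 1/I and cost falls off as 1/D, then sorting by ascending P = I/D gives the same order as sorting by descending value-per-cost, because (1/I) / (1/D) = D / I.

 # Check that ascending I/D order equals descending value-per-cost order
 # under the Zipf assumption. Ranks (name, I, D) are made up.
 tasks = [("A", 1, 3), ("B", 2, 1), ("C", 3, 2)]
 by_p = [t[0] for t in sorted(tasks, key=lambda t: t[1] / t[2])]
 by_value_per_cost = [t[0] for t in sorted(tasks, key=lambda t: t[2] / t[1], reverse=True)]
 print(by_p == by_value_per_cost)   # True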

Let's say you're right about the "value" dropping off for the less important features. I still am not buying your idea that "time to implement" drops off with less difficult features. Given a list of features, I usually start thinking "OK, if I do *that* feature first, I could re-use most of its code and knock off this other feature in a few minutes". So I put that ("easy") other feature at the bottom of my difficulty list, and if the customer ranks both features about the same, this formula tells me to do the "other" feature right away, delaying or perhaps never implementing the "difficult" feature. (Say I have "add a spelling checker" as the difficult feature, and "Let the user add words to the spelling checker" as the easy feature.). -- DavidCary, arguing with himself again.

Please note that this is discussing something different, i.e., task ordering within a single priority level. This is not discussing task ordering across different priority levels. Even within a given priority level, it is a pretty subjective judgment as to whether it is better to get the easy ones done first and defer the more difficult, or to get started with the most difficult. There are a lot of other factors to consider, such as team members' familiarity with the problem, tool set, customer, etc.


An alternate approach that has proven successful in some projects is to have the customer estimate the value of each story to the business, in dollars. Then divide each story's value by the number of XP story points estimated to complete it. Now you have "bang for the buck," in terms of dollars of value per point of effort. (This makes the measure really be "bucks for the effort," rather than "bang for the buck.") Do the stories with the largest "bang for the buck" value.
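
A sketch of that alternative; the story names, dollar values, and story points below are invented for illustration:

 # "Bang for the buck": customer-estimated dollar value divided by
 # developer-estimated XP story points; largest ratio first.
 stories = [
     ("online payments", 50000, 8),   # (name, value in dollars, story points)
     ("faster search",   12000, 3),
     ("dark mode",        2000, 2),
 ]
 for name, value, points in sorted(stories, key=lambda s: s[1] / s[2], reverse=True):
     print(f"${value / points:,.0f} per point  {name}")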

Redirect resources to other projects when their "bang for the buck" value is substantially more attractive.


re-phrased some confusing bits. -- DavidCary


CategoryMethodology CategoryProject CategoryMetrics CategoryModellingLawsAndPrinciples

