Whatever Gets Measured Gets Optimized

WhateverGetsMeasuredGetsOptimized.

Even if you care about the number of bugs in your system, creating a BigVisibleChart of the number of bugs may not be the right thing to do. People may very well concentrate on fixing all the little niggling bugs, and leave one great, glaring hairy monster. People will optimize what you measure, so be sure to measure the right thing.

In the example above, maybe what you really want to measure is the age of the oldest bug.

Nope. Then they would ignore recent critical bugs and work on ancient negligible ones.

If you measure LinesOfCode for productivity, at least a few people will start writing more lines of code.

Thus it would be better to measure something that attempts to factor out bad responses to the metric, such as CyclomaticComplexityMetric.
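
As a rough illustration, here is a minimal Python sketch that approximates such a metric by counting branch points in a parse tree; the function name and the choice of node types are invented for illustration, not a standard implementation:

  import ast

  # Approximate McCabe cyclomatic complexity: one straight-line path,
  # plus one for each branch point found in the parse tree.
  BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp)

  def approx_complexity(source):
      tree = ast.parse(source)
      branches = sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))
      return branches + 1

  print(approx_complexity("def f(x):\n    if x > 0:\n        return 1\n    return 0\n"))  # -> 2

Note that padding the code with extra straight-line statements does not move this number, which is exactly the property LinesOfCode lacks.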

Clearly this is about measuring things to guide people's behaviour. The lesson is simple: do not use a measure just because it is easy to collect and correlates with what you want; instead, measure as directly as possible, or at least try to estimate the bias your method introduces.

This is hard, and there are no easy answers.


The act of measuring itself leads to increased productivity due to the HawthorneEffect.

Yes, but that's a completely different effect from "gaming" the measurements. It's also temporary, and while I haven't seen them, I've heard that more recent studies suggest it's not as clear-cut as is often made out by those who blindly say "HawthorneEffect" at every mention of measurement.

What is clear is that measurement is complex and causes changes in human behaviour. What is being highlighted here is that ill-considered, ill-designed measurements may have a different effect from the one intended.

And MeasurementIsNotEmpathy?. Too many places where I've worked have jumped on the MeasurementBandwagon?, quickly identifying everything that can be measured, and then rationalizing metrics and outcomes based upon those measurements. Unfortunately this leaves management completely out of touch with the SociologyOfWork, and creates an UnfriendlyWorkEnvironment?.


The bug to fix is the one with the biggest teeth biting your butt.

I think WhateverGetsMeasuredGetsOptimized doesn't actually refer to bugs alone. Really it is stating the obvious: if something is to be optimised then the improvement has to be seen, so before and after need to be measured.

Let's not confuse WhateverGetsMeasuredGetsOptimized with ProfileBeforeOptimizing. Certainly, if something is to be optimised then it needs to be measured. However, what this page is about is measuring the right thing. If your measure of a Good Cheese Shop is how clean it is, then the cleanest/optimal Cheese Shop is one that is uncontaminated by cheese. So cleanliness is not the best measure of a good Cheese Shop.


It is a well-known effect that when something is being measured, people will optimize for the measure itself rather than for the goal it stands in for.

Example: You measure programmer productivity by LinesOfCode. How do you think a programmer will set all 1000 elements of an array to 0? Yes, they'll write a script to generate 1000 statements, and insert those. Voila! 30 seconds of work, 1000 lines of code, and one very productive programmer, as measured by the metric.

Hang on, why not declare the array to have 10000 elements? No harm in having a little extra space ...
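
To make the gaming concrete, here's a hypothetical version of that script in Python (the file name is invented):

  # Generate 1000 assignment statements instead of writing one loop:
  # 30 seconds of work, 1001 lines of "productivity".
  with open("init_array.py", "w") as out:
      out.write("values = [None] * 1000\n")
      for i in range(1000):
          out.write("values[%d] = 0\n" % i)

  # The honest equivalent the metric punishes is a single line:
  # values = [0] * 1000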

Further, a programmer who notices that two routines have a lot in common and abstracts them into a single routine with parameters will show negative productivity, even though the resulting code might be cleaner, clearer, easier to modify in the future, and easier to maintain.

BadProgrammer - no pizza!
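
A hypothetical before-and-after shows the arithmetic (the routines and the factors are invented for illustration):

  # Before: two near-duplicate routines, 10 lines - good for the LOC metric.
  def total_with_tax(prices):
      total = 0
      for p in prices:
          total += p * 1.08   # illustrative tax rate
      return total

  def total_with_discount(prices):
      total = 0
      for p in prices:
          total += p * 0.90   # illustrative discount
      return total

  # After: one parameterized routine, 5 lines. The diff is -5 lines,
  # so by LinesOfCode this cleanup counts as negative productivity.
  def total_scaled(prices, factor):
      total = 0
      for p in prices:
          total += p * factor
      return total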


A more insidious form: many source control tools now report the net source code line count on each commit. It is usually the case that the fewer lines required to implement a feature, the easier the patch will be to review, and the better the code will be to maintain in the long run, so the developer aims to keep the net difference small or negative, especially during refactoring. Unfortunately, most tools are language agnostic, and don't distinguish comments from code, resulting in pressure to not add commentary. It also may promote a terser coding style, cramming more functionality on each line, also leading to reduced readability. (Don't forget, the primary audience of code is the maintenance programmer who comes after you, not the compiler.) -- IanOsgood
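
For contrast, here is a small sketch of what a comment-aware net line count could look like (a hypothetical helper; "#"-style comments are assumed, and a real tool would need per-language rules):

  # Count net changed lines in a unified diff, ignoring comment-only
  # lines so that adding commentary is not penalized.
  def net_code_lines(diff_text):
      net = 0
      for line in diff_text.splitlines():
          if line.startswith(("+++", "---")):   # file headers, not changes
              continue
          if line.startswith(("+", "-")):
              body = line[1:].strip()
              if body and not body.startswith("#"):
                  net += 1 if line[0] == "+" else -1
      return net

  diff = "\n".join([
      "--- a/util.py",
      "+++ b/util.py",
      "+# A new comment: should not count.",
      "+x = compute()",
      "-x = compute_old()",
  ])
  print(net_code_lines(diff))  # -> 0: one code line added, one removed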

Also, productivity measured in LinesOfCode?! Does anyone actually do this? Only features are the assets, code is the liability! (SoftwareAsLiability) -- IanOsgood

"This is still used as a developer performance metric in many places"


The problem is not in fine-tuning your measurement - it's more about the mental model you have, which dictates how you intervene on the back of the measurement.

Let's say a metric is applied to a system made up of a collection of teams - bug reports per team.

I am sure we can all think of lots of cross-team reasons why teams might vary wildly in the number of bugs generated, other than simply attributing each team's number to the standalone efforts of that team. For instance, maybe team B is dealing with horrendously messy feeds from team A, so quality problems in team A are manifesting as bug reports in team B.

And in fact, systems-thinking management writers such as WilliamEdwardsDeming would say that's the norm: most of the variation between the teams on *any* metric is usually down to characteristics of the system of which all the teams are part, rather than reflecting per-team performance.

But if upper management instead sets a target for bug counts per team, they take a different view - they are implying that poorly scoring teams are working ineffectively and need to pull their socks up.

The end result of that kind of approach is typically gamed numbers, hidden bugs, and blame shifted between teams, rather than any real improvement in quality.

If instead the bug counts are looked at collectively in an attempt to improve all-system quality, then the measurement of bug counts is very helpful.

This is so often how measurement-based intervention goes wrong - people do not take system effects into account. That is, you measure a bunch of teams on some metric and then intervene on the assumption that each team's score can be identified directly with that team's quality. It's a seductive idea but almost always wrong - and when it's wrong, the intervention will tend not just to fail but to produce adverse effects.

-- MatthewMorris


I once worked at a telecom company that tried to base the pay of line workers on various metrics. The problem was that simple metrics were too crude to reflect good work, but tweaking them to make them "fairer" made them so complex that employees and managers were constantly haggling over the interpretation and accuracy of the "incentives rule book", and often on the phone with programmers and analysts about how the metric computations were implemented. The bottom line is that many factors are best left to subjective judgement. It may not always be fair, but for many types of work the cost of fairness is too high to justify: the data collection and processing behind such metrics is itself a big job. -t


Related to HeisenbergUncertaintyPrinciple, SovietShoeFactoryPrinciple


FebruaryZeroSix (this was a quick topic; it's small, but it matches the criteria)

CategoryStatistics

