Good Metrics Use Numbers

The best metrics usually produce a number, something that can be counted or weighed. For example, even something as complex as a jet plane can be measured as follows:

Fuel efficiency - Fuel used per mile of flight per pound load.
Durability - Number of flights before plane is retired. Maintenance schedule. Repair/failure rate.
Safety - Probability of dying/injury per passenger mile flown based on crash history.
Cost - Initial purchase price. Cost of maintenance and repair per year.

Even the most complex of devices usually have numerical metrics that can be applied to measure them. When a customer wants to decide on and purchase a plane, he/may create a formula such as:

 Decision_factor = (w1 * fuel) + (w2 * durability) + (w3 * safe) + (w4 * cost)

The "w" variables are the weights given to the factors.

Note that producing numbers is generally a prerequisite for a good metric, not a sufficient property by itself.

Of course, this is an oversimplification. For one thing, the above function need not be linear (the customer may well decide that a plane above a certain cost is unacceptable no matter how good it otherwise is); for another, determining the coefficients requires judgment.

{Agreed, it is an oversimplification. Note that even though there may be disagreement over the weights, weighting does act as a kind of documentation of perceived importance. Further, different party's weights can be used to provide alternative results for comparing. If two parties use different weights, but the conclusion is still the same, then that may result in happy agreement.}

Note that there is sometimes disagreement over the merit of a given metric and/or its weighting. Thus, it is best to have multiple metrics for a given item.

-- top

Of course, some metrics (despite being quantifiable) are difficult to measure (or the measurement techniques contain a lot of error). Other metrics may be easy to measure, but have poor correlation to the inherent parameter of merit that they intend to reflect (use of LinesOfCode as a metric of programmer productivity).

SovietShoeFactoryPrinciple

Being difficult to measure does not necessarily mean it is a factor to be ignored. One may have to say something like, "I agree it improves A, but I would like to see the numbers from factor B before I make a decision." If there is no time or budget to obtain B data, then one may have to make a decision based on what is available. that is just the nature of life. In short, the cost of obtaining a given metric may not necessarily relate to its importance (weighting factor).

[Folks, use of the term metrics implies the use of numbers, since metrics involves measuring. One uses numbers to measure. How else can you measure? How else can you report differences?]

I generally consider "metrics" to be about measuring something. There are qualitative ways to measure also. For example, the movie rating system (G, PG, PG-13, R, X).

Assigning a quality is not measurement. A PG rating isn't a measurement, it's not a discoverable property of the movie. It's an exogenous imposition or opinion. Is an R-rated movie worth two PGs?

Attributions without numbers that can be meaningfully used in calculations are not measurements. Whether they are metrics or not is a linguistic choice since, in this context, "metric" is a neologism. --MarcThibault

Remember, measuring is easy. It is in performing analysis based on the measurements that the problems arise.

Disagree that measuring is easy. Most often the difficult part is in establishing criteria for the quanta to be measured. Use the movie rating system as an example: a group of professionals put their heads together and (subjectively) weigh certain aspects of a film to come up with a single rank from an enumerated list. They may use some hard factors, such as any use of words on a list immediately put the film into PG-13, or any image of certain body parts immediately move the film into R, others into X.

The real difficulty lies in setting measurements for those factors that aren't so easy to define. The well-known example is that of determining the difference between "art" and "pornography." For most of us, it's awfully hard to define that difference -- but we sure as anything know it when we see it.

It can be both. As far as movie ratings, they are still useful for comparative analysis of opinions. For example, are Californians more sensitive to violence and less sensitive to cussing than Texans? The ratings given by different groups can be used to test such hypotheses.

Hmm. That's a good bit of analytical questioning. Without spending too much time diverging off on a rabbit track let's note that films are usually released to a national audience, not just in one state or another. The rating system has to work all over the country (AmericanCulturalAssumption). The ratings board therefore has to use certain hard and fast rules about language, violence, skin, iconic references, etc. This is because there is a kind of lowest common denominator they have to deal with, thus the set rules.

By that reasoning we can see that any flick showing an erect penis has to be considered pornography, without exception. Any film showing a woman being struck in the face gets an R rating, without fail. Any movie containing any of the seven naughty words gets a PG-13 or whatever rating, other content notwithstanding. And so it goes.

This same approach is useful for determining the numbers used in collecting metrics on "soft" criteria; that is, the relatively subjective matters that folks can't agree upon. By setting a floor, a ceiling, and in-between steps for performance or behavior of a system the development crew can determine if a product meets requirements or not and produce metrics to prove that. Also, those metrics are useful in determining which of the "soft" characteristics are causing the bulk of the problems. That way the dev team can work with customers, marketing, or whomever they need to hash out changes to the measurements or the criteria itself.

I was once in a heated debate with my boss. Yes, bickering with boss is poor form, but I'm human and it happened in the heat of a big project. Let's call the boss "Bob" here. In the middle of the conversation, Bob said, "Stop interrupting me! That's rude." I replied, "But you keep accusing me of different things and I don't get a chance to defend myself." Bob then said, "Do that on your turn". I replied, "But, you don't give me a turn." Bob replied, "Yes I do."

Then I said, "So far you've spoken 5 words for every 1 of my words, even with my interruptions." Bob was about the say something, but then paused for about 4 seconds. You could see him mentally calculating by the expression on his face. He then half-changed the subject, something like, "Go ahead; finish your point of view. I'm listening." I doubt Bob agreed with the 5/1 figure, perhaps thinking it more like 2/1, but giving a number gave Bob something more solid to compare that illustrated my point. The conversation was lopsided enough that the quantity difference was pretty obvious and undeniable. (I doubt that improved my standing with Bob, but it was satisfying in the short-term.)

Consider renaming to GoodMetricsProduceNumbers.

CategoryMetrics, CategoryScience