
Metrics

The misuse or misinterpretation of metrics is a common contributor to Internal Model Risk. Let's dive into a specific example: someone finds a useful new metric that helps in evaluating performance.

It might be:

  • Source Lines Of Code (SLOC): the number of lines of code each developer writes per day or per week (see the sketch after this list).
  • Function Points: the number of function points a person on the team completes each sprint.
  • Code Coverage: the number of lines of code exercised by unit tests.
  • Response Time: the time it takes to respond to an emergency call, say, or to go from a feature request to production.
  • Release Cadence: the number of releases a team performs per month, say.
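To make the first of these concrete, here is a minimal sketch (not from the original text) of how a SLOC-style metric might be collected. It assumes it is run inside a local git checkout with `git` on the PATH; the function name, the four-week window and the use of author email as the "developer" identifier are illustrative choices, not recommendations.

```python
# A rough sketch of collecting a SLOC-style metric: lines added per author,
# taken from `git log --numstat`. Assumes a local git checkout and `git` on
# the PATH; all names and the date window are illustrative only.
import subprocess
from collections import defaultdict

def added_lines_per_author(since="4 weeks ago"):
    out = subprocess.run(
        ["git", "log", f"--since={since}", "--numstat", "--pretty=format:@%ae"],
        capture_output=True, text=True, check=True,
    ).stdout

    totals = defaultdict(int)
    author = None
    for line in out.splitlines():
        if line.startswith("@"):
            author = line[1:]                     # commit header: author email
        elif line.strip() and author:
            added, _removed, _path = line.split("\t", 2)
            if added.isdigit():                   # binary files report "-"
                totals[author] += int(added)
    return dict(totals)

if __name__ == "__main__":
    for author, lines_added in sorted(added_lines_per_author().items()):
        print(f"{author}: {lines_added} lines added")
```

Even a metric this simple hides judgement calls (which branches count, whether generated or vendored code is included), and each of those choices is a place where the Map quietly diverges from the Territory.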

With some skill, they may be able to correlate this metric against some other more abstract measure of success. For example:

  • "quality is correlated with more releases"
  • "user-satisfaction is correlated with SLOC"
  • "revenue is correlated with response time"

Because the thing on the right is easier to measure than the thing on the left, it gets used as a proxy (or Map) for the thing they are really interested in (the Territory). At this point, it's easy to communicate the idea to the rest of the team, and the market value of the idea is high: it is a useful representation of reality, shown to be accurate at a particular point in time.
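As a hedged illustration of that correlation step (the numbers below are invented, purely to show the mechanics), a team might line up its easy-to-measure proxy against a periodic measurement of the thing it actually cares about:

```python
# Invented figures, purely to show the mechanics: releases per month (the
# easy-to-measure Map) against a periodic quality score (the Territory).
from statistics import correlation  # available from Python 3.10

releases_per_month = [2, 4, 5, 7, 8, 10]
quality_score = [55, 60, 64, 70, 71, 78]

r = correlation(releases_per_month, quality_score)  # Pearson's r
print(f"Pearson r = {r:.2f}")  # a strong r is what makes the proxy so tempting
```

A strong r here says nothing about which way, if either, the causation runs, which is exactly the trap described next.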

1. Metrics as a Proxy

But correlation doesn't imply causation. The cause might be different:

  • Quality and number of releases might both be down to the simplicity of the product.
  • User satisfaction and SLOC might both be down to the calibre of the developers.
  • Response time and revenue might both be down to clever team planning.

Metrics are seductive because they simplify reality and are easily communicated. But they inherently contain Internal Model Risk: by relying only on the metrics, you're not really seeing the reality.

The devil is in the detail.

2. Metrics Become Outdated

Just as market needs evolve over time, our behaviour evolves to incorporate new ideas. The more popular an idea is, the more people will modify their behaviour as a result of it, and the more the world will change.

In the case of metrics, this is the point where they stop being mere indicators and start being used as measures of performance, or as targets:

  • If a team is told to do lots of releases, they will perform lots of releases at the expense of something else.
  • If team members are promoted according to SLOC, they will make sure their code takes up as many lines as possible.
  • In the UK, ambulances were asked to wait before admitting patients to Emergency wards, in order that hospitals could meet their targets.

"Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes." - Goodhart's Law, Wikipedia

Some of this seems obvious: Of course SLOC is a terrible measure of performance! We're not that stupid anymore. The problem is, it's not so much the choice of metric, but the fact that all metrics merely approximate reality with a few numbers. The map is always simpler than the territory, therefore there can be no perfect metrics.
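To see why this bites even for "better" metrics, here is a toy model (entirely invented weights and numbers, offered only as a sketch of the argument): the metric captures one dimension of the work, and optimising it lets the unmeasured dimensions slip.

```python
# A toy model with invented weights: "true" quality depends on several things,
# but the metric only sees one of them (SLOC). Once the team optimises the
# metric, the unmeasured dimensions are free to slide.
def true_quality(readability, correctness, sloc):
    # The Territory: depends on factors the metric never sees.
    return 0.5 * readability + 0.5 * correctness - 0.001 * sloc

def metric(sloc):
    # The Map: all we actually measure.
    return sloc

before = dict(readability=8, correctness=9, sloc=2_000)
after = dict(readability=5, correctness=9, sloc=6_000)  # padded to hit a SLOC target

print("metric before/after:", metric(before["sloc"]), metric(after["sloc"]))
print("quality before/after:", true_quality(**before), true_quality(**after))
# The metric goes up while true quality goes down: Goodhart's Law in miniature.
```

Swap SLOC for any other single number and the shape is the same: whatever the metric leaves out is what tends to get sacrificed once the number becomes a target.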

Will the idea still be useful as the world adapts? Although the Hype Cycle model doesn't cover it, ideas and products all eventually have their day and decline in usefulness.

3. Ideas Take Time To Prove (or Disprove)

There are plenty of ideas which seem like a really good idea at the time but end up being terrible. It's only as we improve our internal model and realise the hidden risks that we stop using them. While SLOC is a minor offender, CFCs or Leaded Petrol are more significant examples. Hence, there is a "Period of Inoculation" where the population realise their mistake - there is "negative hype" as they work to phase out the offending idea until it's forgotten.

SLOC is not on its own a bad idea, but using it as a metric for developer productivity is.

"Measuring programming progress by lines of code is like measuring aircraft building progress by weight.” - Bill Gates