
Fill-The-Bucket

If it takes one hour to fill a 5l bucket...

The simplest type of estimation problem we might face in software development is simple extrapolation, as in the example above. If one developer takes one day to build a web-page for our site, how long will it take to build ten web-pages? Well, it could be that the answer is around ten days.

We'll name this "estimation by extrapolation" approach "Fill-The-Bucket". It occurs in maths tests everywhere: It takes an hour to paint one fence panel. There are 40 fence panels. How long will the whole fence take?

The key to Fill-The-Bucket style estimation is that:

  • The work can be measured in units.
  • Each unit is pretty much the same as another.
  • Each unit is independent of the others.

This is our starting point. It should be pretty clear that these assertions don't hold true for a lot of software development, but let's examine this style of estimating anyway.

Distribution

In reality, we should expect that different fence panels take slightly different amounts of time to paint. Perhaps one is in an awkward position compared to another, or the wood is of different quality, or the painter is more or less motivated.

Also, we shouldn't expect measurement in the real world ever to be exact: we are always going to see a distribution of times.

Where measurements cluster around a mean, we get the familiar Normal Distribution - the so-called "Bell Curve" - which turns up in distributions like height, weight, test scores and so on.

Mean and Variance

Normal distributions have two independent parameters: the mean (the position of the central peak) and the variance (a measure of the spread). You can play with both these parameters in the simulation above.

Because each panel is independent of the others, you can fairly easily add up normal distributions like this. If you have n fence panels to paint, with m as the mean time to paint each panel and v as the variance, then:

  • The mean over all n fence panels is n x m.
  • The new variance is n x v.
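
A minimal sketch of this, with per-panel figures that are purely illustrative assumptions:

```python
# Adding up n identical, independent normal distributions.
# The per-panel figures below are illustrative assumptions.
mean_per_panel = 1.0      # mean hours to paint one panel
variance_per_panel = 0.1  # spread around that mean (hours squared)
n = 40                    # number of panels

total_mean = n * mean_per_panel          # 40 hours
total_variance = n * variance_per_panel  # 4 hours squared
total_std_dev = total_variance ** 0.5    # 2 hours

print(f"whole fence: mean {total_mean}h, std dev {total_std_dev:.1f}h")
```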

Probability Density

The curves above are probability density functions: the height of the curve isn't itself a probability, but the area under it is.

When you paint any given fence panel (the first, red chart), you'd expect the time taken to be a single spot from under the curve, picked at random. Given that there is more area under the curve around the mean, we'd expect our fence-painting times to be clustered around the mean.

The second, blue chart extrapolates the single panel density to show how long the whole job will take. If the variance for a single panel is large and the number of panels painted is large, then the time to paint the whole fence could vary by hours.

The total area under a probability density curve is normalised to 1, so you can pick a particular interval, work out the area under the curve within it, and that is the probability of the result falling in that interval.
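
As a quick sketch of that calculation, reusing the assumed whole-fence figures (mean 40 hours, standard deviation 2 hours) from the snippet above:

```python
from statistics import NormalDist

# Whole-fence completion time, assumed normal with mean 40h, std dev 2h.
whole_fence = NormalDist(mu=40, sigma=2)

# Area under the curve between 38h and 42h = probability the whole job
# takes between 38 and 42 hours.
p = whole_fence.cdf(42) - whole_fence.cdf(38)
print(f"P(38h <= total <= 42h) = {p:.2f}")   # roughly 0.68
```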

Sampling Error

If you paint the first fence panel in 40 minutes, how sure can you be that this is a good estimate? What if you extrapolate from this single fence panel? Painting all 40 might then take around 27 hours (40 × 40 minutes). How confident are we of this estimate?

After the first fence panel, you just don't know. After you've painted two or three, you can start to figure out the sample variance $s^2$:

$$s^2 = \frac{\sum(x - \bar{x})^2}{n - 1}$$

The more samples we take, the more precise our estimate of the sample variance will be, and so the more confident we can be in our expected time to complete painting the fence.

In the above simulation, we are trying to fit a Normal Distribution, estimated from a number of samples.

You should be able to see that when you move from two to three samples, the variance will probably change a lot. However, moving from twenty samples to thirty hardly changes it at all.
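
Here's a small sketch of that effect: it applies the sample variance formula above to an assumed set of panel-painting times and shows how the estimate settles down as more panels are painted.

```python
import random

def sample_variance(xs):
    """s^2 = sum((x - mean)^2) / (n - 1), as in the formula above."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

# Illustrative per-panel times, assumed to be drawn from N(mean=1h, sd=0.32h).
random.seed(1)
times = [random.gauss(1.0, 0.32) for _ in range(30)]

print(sample_variance(times[:3]))   # from 3 panels - could be way off
print(sample_variance(times[:20]))  # from 20 panels - much closer to ~0.1
print(sample_variance(times[:30]))  # from 30 panels - barely moves
```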

This kind of measurement and estimating is the bread-and-butter of all kinds of Operational Control systems.

Big-O

Although software development tasks don't often fit into the Fill-The-Bucket domain, lots of things in data processing do. When talking about algorithms, we say fence-panel painting is $O(n)$. That is, the number of operations taken to complete the job is a linear function of n, the number of fence panels.

The same is true for lots of other algorithms: scanning a linked list or walking a tree is often $O(n)$.

Binary Chop

There are plenty of algorithms with other efficiencies, too. Let's say you use this algorithm to look up a word in a dictionary:

  1. Establish upper and lower bounds of the search space (i.e. first and last entry of the dictionary)
  2. Find a word about half-way between the two. Is the word you're looking for before or after this word, or exactly this word? If the latter, you're done, otherwise, revise either the upper or lower bound to this word and repeat.

This is the binary chop algorithm, in which the remaining search space halves each time you go round step 2. Therefore, doubling the length of the dictionary only increases the number of operations by one. So this algorithm takes $O(\log_2 n)$ time.
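
A minimal sketch of the binary chop, with a toy word list standing in for the dictionary:

```python
def binary_chop(sorted_words, target):
    """Each pass halves the remaining search space, so the lookup
    takes O(log2 n) comparisons."""
    low, high = 0, len(sorted_words) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_words[mid] == target:
            return mid                 # found it
        elif sorted_words[mid] < target:
            low = mid + 1              # target is after the midpoint
        else:
            high = mid - 1             # target is before the midpoint
    return -1                          # not present

print(binary_chop(["ant", "bee", "cat", "dog", "eel"], "dog"))  # 3
```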

So Fill-The-Bucket is still an appropriate way of estimating for these algorithms. If you can figure out how long it takes to do steps 1 and 2, and how many times you'll have to do them, you can make a good estimate of the total time. That is, even though the time won't be linear in n, extrapolation still works.

Estimating Risk

Let's say we have a problem in the Fill-The-Bucket domain. How can we use this to estimate risk?

Let's set up a simple scenario, which we've agreed by contract with a client:

  • The client will pay us £10,000 to process 500 client records.
  • The client wants the records completed in 20 days. We can agree extra time in advance, but this costs £300 per day from the contracted price.
  • If we miss our delivery date, we pay a penalty of £1,000 per day until the project is complete.
  • It takes 1-3 hours to process a client record, and we have 4 staff working 8 hours per day. Let's model this with a mean of 2 hours and a variance of 1 hour².

Let's ignore all other risks and just focus on these monetary ones. What is the best time to suggest to the client?
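
Before looking at the charts, here is a rough Monte Carlo sketch of the same question. The payout rules coded below - the price drops by £300 for each extra day agreed up front, and a £1,000 penalty applies for each day we finish after the agreed date - are one reading of the contract terms above, and the trial count is arbitrary.

```python
import math
import random

# Sum of 500 independent record times, each assumed N(mean=2h, var=1h^2),
# is itself normal: mean 1000h, variance 500. 4 staff x 8h = 32 hours/day.
HOURS_PER_DAY = 4 * 8
TOTAL_MEAN, TOTAL_SD = 500 * 2.0, math.sqrt(500 * 1.0)

def expected_return(estimate_days, trials=50_000):
    total = 0.0
    for _ in range(trials):
        actual_days = math.ceil(random.gauss(TOTAL_MEAN, TOTAL_SD) / HOURS_PER_DAY)
        price = 10_000 - 300 * max(0, estimate_days - 20)       # extra days agreed up front
        penalty = 1_000 * max(0, actual_days - estimate_days)   # days past the agreed date
        total += price - penalty
    return total / trials

for estimate in (20, 25, 30, 32, 35):
    print(estimate, round(expected_return(estimate)))
```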

Analysis

There are three charts above:

  • The top (red) chart shows the probability density for us completing the work. Our actual completion time is one point chosen at random from the area in red. So we're probably looking at around 32 days (500 records × 2 hours ÷ 32 staff-hours per day is just over 31 days, before allowing for spread).
  • The middle (blue) chart shows our return distribution. As you can see, it starts sliding down after 20 days, eventually ending up in negative territory. Leaving the estimate at 20 days gives us the highest possible payout of £10,000; increasing our estimate reduces this maximum.
  • The bottom (orange) chart multiplies these two together to give us a measure of financial risk. Without adjusting the estimate, we're more likely to lose than win.

Are you a gambler? If you can just make everyone work a couple of extra hours' overtime, you'll be much more likely to make the big bucks. But without cheating like this, it's probably best to give an estimate around 30 days or more.

Meta-Analysis

This is a really contrived example, but it actually represents most of how banks, insurance companies, investors and so on work out risk: simply multiplying the probability of something happening by what is lost when it does happen. But let's look at some criticisms of this:

  1. Aren't there other options? We might be able to work nights to get the project done, or hire more staff, or give bonuses for overtime or something. In fact, in Pressure we'll look at some of these factors.

  2. We've actually got a project here which degrades gracefully. The costs of taking longer are clearly sign-posted in advance. In reality, the costs of missing a date might be much more disastrous: not getting your game completed for Christmas, missing a regulatory deadline, not being ready for an important demo - these are all-or-nothing outcomes where it's a stark contrast between in-time and missing-the-bus.

  3. Software development generally isn't like this - as we will explore in the following sections, it isn't usually in the Fill-The-Bucket domain at all.

Failure Modes

The problem is, because this approach works well in insurance and operations and other places, there is a strong tendency for project managers to want to apply it to software development.

But there are lots of ways Fill-The-Bucket goes wrong, and this happens when you are estimating in scenarios that violate the original conditions:

  1. The work can be measured in units.
  2. Each unit is pretty much the same as another.
  3. Each unit is independent of the others.

In the financial crisis, we saw how estimates of risk failed because they violated point 3.

Let's have a look at what happens when we relax these constraints.