The simplest type of estimation problem we might face in software development is linear, as in the example above. If one developer takes one day to build a web-page for our site, how long will it take to build ten web-pages? Well, it could be that the answer is around ten days.
You can apply this estimation approach to any linear or linear-ish problem: It takes an hour to paint one fence panel. There are 40 fence panels. How long will the whole fence take?
The key to Fill-The-Bucket style estimation is that:
- The work can be measured in units.
- Each unit is pretty much the same as another.
- Each unit is independent of the others.
This is our starting-off point. It should be pretty clear that these assertions don’t hold true for a lot of software development, but let’s examine this style of estimating anyway.
In reality, we should expect that different fence panels take slightly different amounts of time to paint. Perhaps one is in an awkward position compared to another, or the wood is of different quality, or the painter is more or less motivated.
Also, we shouldn’t expect measurements in the real world ever to be exact; we are always going to see a distribution of times.
Where measurements cluster around the mean like this, they can often be modelled with a Gaussian (or Normal) distribution.
You can fairly easily add up independent normal distributions like this. If you have n fence panels to paint, each taking a mean time m with variance v, then:
- The mean over all n fence panels is n × m.
- The new variance is n × v (so the standard deviation grows only with √n).
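To check these two rules numerically, here is a small Monte Carlo sketch in Python. It assumes, hypothetically, that each panel takes 60 minutes on average with a standard deviation of 10 minutes (a variance of 100), and that panels are independent; the name `simulate_fence` is just for illustration.

```python
import random
import statistics


def simulate_fence(n, m, sd, trials=20_000, seed=1):
    """Simulate the total time to paint n independent fence panels, many times over.

    Each panel's time is drawn from a normal distribution with mean m and
    standard deviation sd (so its variance is v = sd ** 2).
    """
    rng = random.Random(seed)
    return [sum(rng.gauss(m, sd) for _ in range(n)) for _ in range(trials)]


n, m, sd = 40, 60.0, 10.0  # hypothetical: 40 panels, 60 minutes each, sd of 10 minutes
totals = simulate_fence(n, m, sd)
print(round(statistics.mean(totals)))      # close to n * m = 2400
print(round(statistics.variance(totals)))  # close to n * v = 40 * 100 = 4000
```

The simulated totals land close to 2,400 minutes with a variance close to 4,000, matching n × m and n × v.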
This is what is going on in the above graphs. Each curve is a probability density: the area under it between two times is the probability that the job takes somewhere between those times. When you paint any given fence panel (the first, red graph), you’d expect the time taken to be a single point picked at random from the area under the graph. Because more of that area lies around the mean, we’d expect our fence-painting times to cluster around the mean.
The second, blue graph extrapolates the single-panel distribution to show how long the whole job will take. If the variance for a single panel is large and the number of panels painted is large, then the time to paint the whole fence could vary by hours.
If you paint the first fence panel in 40 minutes, how sure can you be that this is a good mean? What if you extrapolate from this single fence panel? Painting all 40 might now take just under 27 hours (40 × 40 minutes), which is a good deal shorter than the original estimate of 40 hours. Is that fair?
After the first fence panel, you just don’t know. After you’ve painted two or three, you can start to figure out the sample variance, which looks like this:
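$$s^2 = \frac{1}{k-1}\sum_{i=1}^{k}\left(x_i - \bar{x}\right)^2$$
where $x_1, \dots, x_k$ are the times taken for the $k$ panels painted so far and $\bar{x}$ is their mean.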
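As a minimal sketch, the standard sample-variance calculation, s² = Σ(xᵢ − x̄)² / (k − 1) over the k panels painted so far, can be written directly in Python and cross-checked against the standard library's `statistics.variance`. The three panel times are made up for illustration.

```python
import statistics


def sample_variance(times):
    """Sample variance: sum of squared deviations from the mean, divided by k - 1."""
    k = len(times)
    if k < 2:
        raise ValueError("need at least two measurements")
    mean = sum(times) / k
    return sum((x - mean) ** 2 for x in times) / (k - 1)


panel_times = [40, 55, 70]  # hypothetical minutes for the first three panels
print(sample_variance(panel_times))      # 225.0
print(statistics.variance(panel_times))  # standard library gives the same value
```

Dividing by k − 1 rather than k is what makes this an unbiased estimate from a sample, and it is also why a single panel tells you nothing about the variance.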
The more samples we take, the more precise the sample variance will be, and so the more confident we can be in our expected time to finish painting the fence.
You should be able to see this in the simulation above: moving from two to three samples, the variance (the shape of the bell curve) will probably change a lot. Moving from twenty to thirty samples, however, it hardly changes at all.
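A rough Python sketch of the same effect, assuming panel times are normal with a hypothetical mean of 60 minutes and standard deviation of 10 minutes: repeat the "paint k panels and estimate the variance" experiment many times and measure how widely the estimates scatter. For normal data the standard deviation of the estimate is σ²·√(2/(k − 1)), so the scatter falls quickly at first and then only slowly.

```python
import random
import statistics


def spread_of_variance_estimates(k, sd=10.0, repeats=2_000, seed=7):
    """Standard deviation of the sample variance across many repeated experiments.

    Each experiment draws k hypothetical panel times from Normal(60, sd)
    and estimates the variance from just those k times.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(repeats):
        times = [rng.gauss(60.0, sd) for _ in range(k)]
        estimates.append(statistics.variance(times))
    return statistics.stdev(estimates)


for k in (2, 3, 20, 30):
    print(k, round(spread_of_variance_estimates(k), 1))
```

The theoretical values here are roughly 141, 100, 32 and 26: a big improvement from two to three samples, and very little from twenty to thirty.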
This kind of measurement and estimating is the bread-and-butter of all kinds of Operational Control systems.
Although software development tasks don’t often fit into the Fill-The-Bucket domain, lots of data-processing tasks do. When talking about algorithms, we would say fence-panel painting is O(n): the number of operations needed to complete the job is a linear function of n, the number of fence panels.
The same is true of lots of other algorithms: scanning a linked list or walking a tree, for example, are typically O(n).
Plenty of other algorithms have different complexities. Let’s say you use this algorithm to look up a word in a dictionary:
- Establish upper and lower bounds of the search space (i.e. the first and last entries of the dictionary).
- Find a word about halfway between the two. Is the word you’re looking for before or after this word, or is it exactly this word? If it is this word, you’re done. Otherwise, move either the upper or lower bound to this word and repeat.
This is the binary chop algorithm (more commonly called binary search), in which the remaining search space halves each time you go round step 2. Doubling the length of the dictionary therefore adds only one more pass round step 2, so this algorithm takes O(log₂ n) time.
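The steps above can be sketched in Python as follows. The names `binary_chop` and `make_dictionary` and the made-up word list are illustrative only (Python's standard `bisect` module already provides binary search over sorted lists). This version moves the bound just past the middle word, a common variant, and counts passes round step 2: looking for a word past the end of the dictionary takes ⌊log₂ n⌋ + 1 passes.

```python
def binary_chop(words, target):
    """Binary search over a sorted list.

    Returns (index, passes) if target is found, otherwise (None, passes),
    where passes counts the trips round step 2.
    """
    lower, upper = 0, len(words) - 1  # step 1: bounds of the search space
    passes = 0
    while lower <= upper:
        passes += 1
        middle = (lower + upper) // 2  # step 2: a word about halfway between
        if words[middle] == target:
            return middle, passes
        if target < words[middle]:
            upper = middle - 1  # target comes before the middle word
        else:
            lower = middle + 1  # target comes after the middle word
    return None, passes


def make_dictionary(size):
    """A hypothetical sorted 'dictionary' of zero-padded words."""
    return [f"word{i:05d}" for i in range(size)]


for size in (1024, 2048):
    print(size, binary_chop(make_dictionary(size), "zzz"))
# 1024 (None, 11)
# 2048 (None, 12)
```

Doubling the dictionary adds exactly one pass, so a Fill-The-Bucket estimate here is the number of passes multiplied by the time each pass takes.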
So Fill-The-Bucket is still an appropriate way of estimating for these algorithms. If you can figure out how long it takes to do steps 1 & 2, and how many times it’ll have to do them, you can make a good estimate of the total time.
Estimating With Risk
Let’s say we have a problem in the Fill-The-Bucket domain. How can we use this kind of estimate to reason about risk?
Let’s set up a simple scenario, which we’ve agreed by contract with a client:
- The client will pay us £10,000 to process 500 client records.
- The client wants the records completed in 20 days. We can agree extra time in advance, but each extra day reduces the contracted price by £300.
- If we miss our delivery date, we pay a penalty of £1,000 per day until the project is complete.
- It takes 1-3 hours to process a client record, and we have 4 staff working 8 hours per day. Let’s model this with a mean of 2 hours and standard deviation of 1 hour.
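Using the same n × m and n × v rules, we can sketch where the completion time should sit. This assumes the records are shared across the four staff with no overhead, so the team gets through 4 × 8 = 32 hours of work per day; the variable names are illustrative.

```python
import math

RECORDS = 500
MEAN_HOURS, SD_HOURS = 2.0, 1.0  # per record, from the scenario
STAFF, HOURS_PER_DAY = 4, 8

team_hours_per_day = STAFF * HOURS_PER_DAY  # 32 hours of work per day
# Mean total effort is n x m hours; convert to days of team time.
mean_days = RECORDS * MEAN_HOURS / team_hours_per_day
# Total variance is n x v, so the standard deviation is sqrt(n x v) hours.
sd_days = math.sqrt(RECORDS * SD_HOURS ** 2) / team_hours_per_day
print(mean_days, round(sd_days, 2))  # 31.25 0.7
```

So the work should take a little over 31 days, give or take less than a day, well past the 20 days the client wants.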
Let’s ignore all other risks and just focus on these monetary ones. What is the best time to suggest to the client?
There are three graphs above:
- The top (red) graph shows the probability density for us completing the work. Our actual completion time is one point chosen randomly from the area in red. So, we’re probably looking at around 32 days.
- The middle (blue) graph shows our return. As you can see, it starts sliding down after 20 days, eventually ending up in negative territory. Leaving the estimate at 20 days gives us the highest possible payout of £10,000; increasing our estimate reduces this maximum.
- The bottom (orange) graph multiplies these two together to give us a measure of monetary risk. Without doing anything else, we’re more likely to lose than win.
Are you a gambler? If you can just make everyone work a couple of extra hours’ overtime, you’ll be much more likely to make the big bucks. But without cheating like this, it’s probably best to give an estimate of around 30 days or more.
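The trade-off in these graphs can be sketched as a Monte Carlo simulation: draw many possible completion times, work out the return for each candidate estimate, and average the results (which is the same as weighting each return by its probability). This is a minimal sketch under assumptions of our own: the total effort for 500 records is drawn directly from a normal distribution with mean 500 × 2 = 1,000 hours and variance 500 × 1 = 500 (the sum-of-normals rule), the team works 32 hours per day, and any part-day late is charged as a full £1,000 day.

```python
import math
import random

BASE_PRICE = 10_000  # £ for 500 records, as contracted
CONTRACT_DAYS = 20
EXTRA_DAY_COST = 300  # £ off the price per extra day agreed in advance
LATE_PENALTY = 1_000  # £ per day late
TEAM_HOURS_PER_DAY = 4 * 8


def expected_return(estimate_days, finish_days):
    """Average return across simulated completion times for one quoted estimate."""
    price = BASE_PRICE - EXTRA_DAY_COST * max(0, estimate_days - CONTRACT_DAYS)
    total = 0.0
    for actual in finish_days:
        # Assumption: any part of a day late is charged as a full day.
        days_late = max(0, math.ceil(actual - estimate_days))
        total += price - LATE_PENALTY * days_late
    return total / len(finish_days)


rng = random.Random(42)
# Assumption: total effort ~ Normal(mean 500 * 2, variance 500 * 1) hours.
finish_days = [rng.gauss(500 * 2, math.sqrt(500 * 1)) / TEAM_HOURS_PER_DAY
               for _ in range(20_000)]

results = {est: expected_return(est, finish_days) for est in range(20, 41)}
best = max(results, key=results.get)
print("best estimate:", best, "days, expected return: £", round(results[best]))
```

Under these assumptions the best quote should land at 32 days, just above the mean completion time, while quoting 20 days has a negative expected return.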
This is a really contrived example, but it captures most of how banks work out risk: simply multiplying the probability of something happening by what is lost when it does happen. There are, however, some criticisms of this approach:
First, aren’t there other options? We might be able to work nights to get the project done, hire more staff, or pay bonuses for overtime. We’ll come back to some of these factors in Pressure.
Second, we’ve actually got a project here that degrades gracefully: the costs of taking longer are clearly sign-posted in advance. In reality, the costs of missing a date might be far more disastrous. Not getting your game finished for Christmas, missing a regulatory deadline, or not being ready for an important demo are all-or-nothing outcomes, with a stark contrast between being on time and missing the bus.
Third, software development generally isn’t like this: as we will explore in the following sections, it is usually not in the Fill-The-Bucket domain at all.
The problem is that, because this approach works so well in banking, operations and elsewhere, project managers have a strong tendency to want to apply it to software anyway.
A Better Understanding Of Risk
Risk “feeds back” into the estimation process in some unusual ways. Let’s change the model slightly.
- The client will pay us £10,000 to process 500 client records.
- The client wants the records completed in 20 days. And that’s it.
- If we hit the delivery date, great. Otherwise, if we deliver within 25 days, there’s a massive argument and a lot of annoyance, but we get paid eventually anyway.
- It takes 1-3 hours to process a client record, and we have 3 staff working 8 hours per day. Let’s model this with a mean of 2 hours and standard deviation of 1 hour.
Suddenly, the choice is no longer a sliding scale: we don’t have control of the estimate anymore. Either we accept the risk of the work, or we don’t. Which should we do? What does it depend on, now?
In one study, researchers split developers into three groups (A, B and Control) and asked each developer for an individual estimate of how long a piece of software would take to build. They were all given the same specification. However:
- Group A was given the hint: “I admit I have no experience with software, but I guess it will take about two months to finish”.
- Group B was given the same hint, except with 20 months.
- The Control group was given no hint.
How long would members of each group estimate the work to take? The results were startling. On average:
- Group A estimated 5.1 months.
- The Control Group estimated 7.8 months.
- Group B estimated 15.4 months.
The anchor mattered more than experience, the formality of the estimation method, or anything else.
What is the reason for this? Somehow, the expectation perverts the estimate. Why would developers be influenced by this expectation so much? Here are some possible reasons:
- They want the work
- They believe their own estimates to be worse than average
- They don’t want to upset the client
Even in a Fill-The-Bucket domain, estimates can be easily corrupted by outside influences. Effectively, the estimate itself is a risk-management tool.
In Estimates we said that the main (good) reason for estimating is:
“To allow for the creation of events. As we saw in Deadline Risk, if we can put a date on something, we can mitigate lots of Coordination Risk. Having a release date for a product allows whole teams of people to coordinate their activities in ways that hugely reduce the need for Communication. “Attack at dawn” allows disparate army units to avoid the Coordination Risk inherent in “attack on my signal”. This is a good reason for estimating because by using events you are mitigating Coordination Risk. This is often called a hard deadline.” – Estimates, Risk First
But here, we’ve seen that the long-term benefits of good estimates are sacrificed for the short-term gain of a contract won or a client impressed.
Estimating as a technique, then, is already suspect, even within the Fill-The-Bucket domain. However, as all developers are painfully aware, building software is not like Fill-The-Bucket.
Let’s have a look at how things get a lot worse.