“How quickly can you get this done?” (estimating workload)
Epistemic status: based partly upon what I learnt as a certified ScrumMaster and mainly from practical experience running a software development team. Some of the later ideas come from Gower Handbook of Project Management, Judgement in Managerial Decision Making, and How to Measure Anything.
When asked how long something will take, you have hidden ambiguity in the question and multiple sources of error. Here we will attempt to clarify both and start to address some of these issues.
What is the question?
Let’s start with the ambiguity of the question. Let’s ignore ambiguity over the definition of ‘finished’ and any errors in your estimation for now. Let’s also assume that you already have a prioritised list of projects/work-packages, some of which will be in progress and some of which won’t have been started yet (I will take the terminology from Scrum and call this the ‘backlog’, but it can apply to anything, not just projects as we traditionally think of them). Here are a few things that I think people actually mean when they ask “how long will this [project] take?”:
1. If you started this now and worked on nothing else (during your usual work hours), when would this task be completed?
2. If you started this project now and worked on no other projects (during your usual work hours), but still had to deal with the usual non-development work (such as meetings, firefighting and emails), when would this task be completed?
3. If you worked on this project as the absolute highest priority, how many hours would you have spent on it by the time it is finished?
4. Given that there are other projects that are in progress or of higher priority than this one, and that some of your time is spent on non-development work, when will this project be finished?
5. As with 4, but recognising that you are likely to add other work to the backlog, some of which will be higher priority than the project under discussion.
There is an important subtlety to the phrasing of these questions which greatly changes their answers. This is not a problem if the questioner and the estimator agree on the definition but if you wanted an answer to variant 5 but got the answer to variant 1 then there is a problem.
Let’s go through an example to show how these variants differ and which is the best version in different scenarios.
Example
Suppose a customer has requested a feature to be added to some software, and that you are responsible for the team that develops that software.
To work out where you should add it to the backlog (i.e. its priority compared to other things your team could work on) you want to know the “size” of the work. In this case you do not care whether your team has lots of meetings on at the moment or about the priority of other projects, so variant 3 is the best version. Let’s say that you have a perfectly accurate and precise predicting machine that says this is 2 hours of work (I want to revisit this simplification in another post). But it’s important to give that number along with the understanding that you cannot actually get the work done in that time. Your team will have non-development work to do, they will have meetings, and they will need breaks; if you know they have to deal with “firefighting” of emergencies then this needs to be factored in too. You go back to your predicting machine and ask it what fraction of its time your team can currently focus on development work; it replies with 50%.
Unfortunately the answer to variant 2 is not “in 4 hours’ time” (2h / 50%), because there will be times when you cannot progress on the project because you are waiting on something (most commonly other people). Let’s say that for this project you need sign-off from the marketing department roughly half-way through the work (other common reasons might be waiting for access, for information, for the results of other people’s work, etc.). So we go back to our predicting machine and it says “the reply from marketing will take 3h, there will be no other delays”. This gives us our answer to variant 2: “the work can be done in 8 hours, but 3 of those hours we will be waiting on others, and 2 of those will be on unrelated non-development work”. Note that by being specific when answering these questions you reduce confusion.
This still doesn’t help our customer, because we are unlikely to start the work right away. We will probably finish off the work we are on, possibly implement other higher-priority work, and then start. We go back to our prediction machine, which says “the projects that are in progress or higher priority than this one (from the backlog) will be complete in 4 days’ time”. We add that to the 8h for the project in question and get the answer to variant 4: “assuming the backlog doesn’t change, the project will be complete in 5 days”.
But we know that isn’t a realistic assumption: just as with this project, we regularly add new projects to the backlog, and that shifts the priorities and thus the timescales for each item to be complete. We go back to our prediction machine and it says “before you get started on the project in question you will add a further 6 days of development work that is higher priority than the project in question”. So you have to add this extra 6 days to the 5 days, take into account any weekends and holidays, and give the final answer to variant 5, the one the customer actually cares about: “it will be 11 working days before you can use this feature”.
So we have gone from a piece of work that will only take 2-hours to one that will not be ready for 11 working days. This is the scale of the ambiguity.
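The chain of answers above can be laid out as simple arithmetic. The sketch below is illustrative only: it takes the figures quoted in the example as inputs, and the 8-hour working day and all variable names are assumptions rather than anything stated in the example.

```python
# Figures quoted in the example above.
effort_hours = 2.0      # variant 3: pure development effort
focus_fraction = 0.5    # share of working time available for development
variant2_hours = 8.0    # variant 2 as quoted, including waiting and non-development time
queue_days = 4.0        # work already in progress or higher priority in the backlog
new_work_days = 6.0     # higher-priority work expected to be added before we start

# Assumption (not from the example): one working day is 8 hours.
HOURS_PER_DAY = 8.0

# Naive figure that ignores waiting on other people: 2h / 50%.
naive_hours = effort_hours / focus_fraction

# Variant 4: clear the existing queue, then do the project.
variant4_days = queue_days + variant2_hours / HOURS_PER_DAY

# Variant 5: also allow for work that will jump the queue.
variant5_days = variant4_days + new_work_days

print(f"effort only (variant 3): {effort_hours:g} h")
print(f"naive elapsed time: {naive_hours:g} h")
print(f"fixed backlog (variant 4): {variant4_days:g} working days")
print(f"growing backlog (variant 5): {variant5_days:g} working days")
```

Keeping each input as a separate named quantity makes it easier to say which variant of the question a given number answers.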
These figures are not unrealistic either—though I would be working hard as a team lead to increase the 50% development work figure and reduce or eliminate the 3h wait for marketing sign-off.
What is ‘done’/‘complete’?
Let’s add one piece of complexity back in. What do you mean when you say done? Sticking with the example of a piece of software, here are a few possible parts to what you might mean by done, but there are more, and many of those have their own ambiguity:
basic functionality works on the machine of one developer: “it’s working for me”
thoroughly tested
thoroughly documented
an automatic test suite written that covers every line changed
deployed so a (/every) customer can use the feature
Again, there might be a doubling or more of workload between “working on my machine” and “deployed and documented, ready for customers to use”.
The only point I will make on this is to recognise the ambiguity and address it upfront. But unlike the last section the definitions will depend a lot on the area you are working in.
Conclusion and Recommendations
I will leave the advice on how to improve estimation for another time (please let me know if you are interested in this) but there are a few points to take away. Firstly, “how long will this take?” or “when can you get this done by?” is a tricky question even if you are a well-calibrated (accurate) and precise estimator. It’s best to give your answer not just in hours but as a longer sentence: “it’s only 2h work but we won’t be done for 11 days”, or more verbosely “the work can be done in 8 hours, but 3 of those hours we will be waiting on others, and 2 of those will be on unrelated non-development work. We probably won’t start it for another 10 working days though”.
The standard process is scope->effort->schedule. Estimate the scope of the feature or fix required (usually by defining requirements, writing test cases, listing impacted components etc.), correct for underestimating based on past experience, evaluate the effort required, again, based on similar past efforts by the same team/person. Then and only then you can figure out the duty cycle for this project, and estimate accordingly. Then double it, because even the best people suck at estimating. Then give the range as your answer if someone presses you on it. “This will be between 2 and 4 weeks, given these assumptions. I will provide updated estimates 1 week into the project.”
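One possible reading of that recipe, as a rough sketch: the function name, parameter names and the example numbers are all hypothetical, and treating "double it" as the upper end of the quoted range is an interpretation rather than something the comment spells out.

```python
def schedule_range(raw_effort_days, underestimate_factor, duty_cycle, days_per_week=5):
    """Return a (low, high) calendar estimate in weeks.

    raw_effort_days: effort estimated from the scope, in focused days.
    underestimate_factor: correction from past experience (1.25 = usually 25% over).
    duty_cycle: fraction of working time actually spent on this project (0 < x <= 1).
    """
    if not 0 < duty_cycle <= 1:
        raise ValueError("duty_cycle must be in (0, 1]")
    corrected_days = raw_effort_days * underestimate_factor
    calendar_days = corrected_days / duty_cycle
    low_weeks = calendar_days / days_per_week
    # "Then double it": use the doubled figure as the top of the range.
    return low_weeks, 2 * low_weeks


# Made-up inputs chosen for illustration only.
low, high = schedule_range(raw_effort_days=4, underestimate_factor=1.25, duty_cycle=0.5)
print(f"This will be between {low:g} and {high:g} weeks, given these assumptions.")
```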
I’ve given up on estimating software development tasks well. Yes, you can do interval estimates, as How to Measure Anything recommends. Yes, you can track your estimates and improve them over time. But it’s slow and few project management applications support it. (OmniPlan is the only one that works on Mac and gives you Monte Carlo simulations based on your interval estimates. But getting information on how well your estimates matched reality is still hard.)
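For what it is worth, the bookkeeping for checking interval estimates against reality can be small if the numbers are recorded somewhere. A minimal sketch, assuming each finished task was logged as a hypothetical (low, high, actual) triple in hours; with 90% intervals, a well-calibrated estimator would expect roughly 90% of actuals to land inside.

```python
def interval_hit_rate(records):
    """Fraction of tasks whose actual time fell inside the estimated interval.

    records: iterable of (low, high, actual) tuples in the same unit.
    """
    records = list(records)
    if not records:
        raise ValueError("need at least one completed task")
    hits = sum(1 for low, high, actual in records if low <= actual <= high)
    return hits / len(records)


# Made-up history: (low estimate, high estimate, actual), all in hours.
history = [(2, 5, 4), (1, 3, 6), (8, 16, 12), (4, 8, 7), (3, 6, 3)]
rate = interval_hit_rate(history)
print(f"{rate:.0%} of actuals fell inside their estimated interval")
```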
So I’ve settled on the 80/20 solution, Evidence Based Scheduling. It’s implemented in FogBugz, which forecasts milestone completion using a Monte Carlo simulation based on your past estimates and how long the work actually took. This means that you make quick and dirty estimates, and out comes a probability distribution over completion dates that automatically takes into account how good you are at estimating.
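The general idea can be sketched in a few lines. This is not FogBugz's implementation, only a minimal illustration of the technique under simple assumptions: each remaining estimate is scaled by an actual/estimate ratio drawn at random from past tasks, and the function names and sample data are made up.

```python
import random


def simulate_totals(estimates_hours, past_ratios, trials=10_000, seed=0):
    """Monte Carlo over remaining work; returns sorted simulated totals in hours.

    past_ratios: actual / estimate for previously completed tasks.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same forecast
    totals = []
    for _ in range(trials):
        total = sum(est * rng.choice(past_ratios) for est in estimates_hours)
        totals.append(total)
    totals.sort()
    return totals


def percentile(sorted_values, p):
    """Simple index-based percentile of an already sorted list, p in [0, 100]."""
    index = min(len(sorted_values) - 1, int(len(sorted_values) * p / 100))
    return sorted_values[index]


# Made-up history: this estimator usually runs a little over, occasionally a lot.
past_ratios = [1.0, 1.2, 0.9, 2.0, 1.5]
remaining = [4, 8, 2, 6]  # quick and dirty estimates, in hours

totals = simulate_totals(remaining, past_ratios)
p50 = percentile(totals, 50)
p90 = percentile(totals, 90)
print(f"50% chance of finishing within {p50:.1f} h, 90% within {p90:.1f} h")
```

Turning hours into calendar dates would additionally need the working-time assumptions discussed in the post.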
FogBugz changed its pricing recently, but it should still be free for up to two users. You might have to ask the sales team.
All that said, if you have actionable guidance on how to estimate, how to get milestone completion forecasts based on the estimates, then how to judge and improve your estimation accuracy – if you have that and all in a convenient way, I’d be happy to know and adopt it.
Thanks, I’ll try to write up that post in the next couple of weeks.
In my old software dev team we got very good at estimating the time it would take to complete a single work-package (item on the backlog), but those were at most a couple of days long. What we were not very good at was the estimation of longer-term progress. In that case we were in a startup, and I think that was unknowable due to the speed at which we would change plans based upon feedback.
Did you get very good at estimating because you had tracked the time on similar pieces of work before? I.e. were you doing reference class forecasting? If yes, that’s a good reminder for me. I’m familiar with the concept, but it has slipped from my mind recently.
Also, how much effort would the estimating itself take? For example, how many seconds or minutes would you be thinking about a three-hour work item?
Yes, we tracked time, but only in an aggregate way. Our list of work-tasks had a very rough estimate (XS, S, M, L, XL, each being about twice the size of the previous, with XL being just more than we could complete in a 2-week period). When we came to plan our 2 weeks of work we estimated hours using ‘planning poker’ (which is a bit like the Delphi method: blind estimates by each member of the team, followed by a brief discussion of the reasons for the differences, followed by one more round of blind estimates, after which I [as team lead] had the final decision). At the end of the 2 weeks we would talk about each item; this sometimes involved a discussion of the amount of work relative to the estimate (either the initial size, e.g. ‘S’, or the hours, e.g. 4). In our discussions of the tasks people would regularly refer back to previous tasks as a reference. We would always talk about our productivity (i.e. the size of the tasks we completed, where XS=1, S=2, M=4 …), but this was a balancing act, as it would be easy to mess up incentives here.
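As a small illustration of that productivity tally: the values for L and XL below are extrapolated from "each being about twice the size of the previous" rather than stated outright, and the list of completed tasks is made up.

```python
# Size-to-points mapping; L and XL are extrapolated by doubling.
SIZE_POINTS = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16}


def completed_points(completed_sizes):
    """Total points for the tasks finished in one 2-week period."""
    return sum(SIZE_POINTS[size] for size in completed_sizes)


# Made-up example of one period's completed tasks.
completed = ["S", "M", "XS", "S", "L"]
points = completed_points(completed)
print(f"Completed {points} points this period")
```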
We spent 4h every 2 weeks planning tasks, but that might involve a small discussion/argument over what should be part of each task, not just the estimation. We would also spend 2h at the end of the 2 weeks reflecting on 1) what we had made and how it impacts the development roadmap and 2) things to increase MPH (motivation, productivity, happiness; I was far too pleased with myself for coming up with that acronym :P ).
Individually, I thought a lot about how to increase development speed and accuracy in estimates. But that was at least a third of my role in the latter stages, the rest being split with planning the development roadmap and doing actual development.
For a 3h task, most of the time we would spend ~2min listening to one person describe what it is, then under 1min for everyone to show their card with their estimate of the number of hours. Often that was in enough agreement that we wouldn’t do anything extra. We did have a few where one person guessed 3h and another guessed 20h; that often resulted in a 10min discussion, as there was clearly a disagreement on how to do that task ‘properly’.
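The "enough agreement" check was a judgement call in the room, but a rough rule of thumb could look like the sketch below. The function name and the 2x threshold are assumptions for illustration, not something we actually used.

```python
def needs_discussion(estimates_hours, max_ratio=2.0):
    """True if the blind estimates are far enough apart to be worth talking through.

    max_ratio: largest acceptable ratio of highest to lowest estimate (assumed value).
    """
    if not estimates_hours or min(estimates_hours) <= 0:
        raise ValueError("need at least one positive estimate")
    return max(estimates_hours) / min(estimates_hours) > max_ratio


print(needs_discussion([3, 4, 3]))   # close agreement
print(needs_discussion([3, 20, 4]))  # someone has a very different task in mind
```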
Thanks! I especially like how differences of understanding were exposed when estimates diverged.
This is interesting.
Errata:
Thanks