Right. I think I agree with everything you wrote here, but here it is again in my own words:
In communicating with people, the goal isn’t to ask a hypothetically “best” question and wonder why people don’t understand or don’t respond in the “correct” way. The goal is to be understood, to share information, and to reach consensus, settle a negotiation, or otherwise accomplish some task.
This means that in real communication with real people, you often need to ask different questions to different people to arrive at the same information, or phrase some statement differently for it to be understood. There shouldn’t be any surprise or paradox here. When I am discussing an engineering problem with engineers, I phrase it in the terminology that engineers will understand. When I need to communicate that same problem to upper management, I do not use the same terminology that I use with my engineers.
Likewise, there’s a difference when I’m communicating with some engineering intern or new grad right out of college, vs a senior engineer with a decade of experience. I tailor my speech for my audience.
In particular, if I asked Kenoubi this question (“what’s the worst case for how long this thesis could take you?”), and Kenoubi replied “It never finishes”, then I would immediately follow up with the question, “Ok, considering cases when it does finish, what does the worst case look like?” And if that got the reply “the day before it is required to be due”, I would then start poking at “What would cause that to occur?”
I start with the first question because it works for, I don’t know, 95% of people I’ve ever interacted with in my life? In my mind, it’s rational to start with a question that almost always elicits the information I care about, even if there’s some small subset of the population that will force me to choose my words as if they’re being interpreted by a Monkey’s paw.
It depends on what you mean by “didn’t work”. The study described is published in a paper only 16 pages long. We can just read it: http://web.mit.edu/curhan/www/docs/Articles/biases/67_J_Personality_and_Social_Psychology_366,_1994.pdf
First, consider the question of, “are these predictions totally useless?” This is an important question because I stand by my claim that the answer of “never” is actually totally useless due to how trivial it is.
Yep. Matches my experience.
We know that only 11% of students met their optimistic targets, and only 30% of students met their “best guess” targets. What about the pessimistic target? It turns out, 50% of the students did finish by that target. That’s not just a quirk, because it’s actually related to the distribution itself.
In other words, asking people for a best guess or an optimistic prediction results in a biased prediction that is almost always earlier than the real delivery date. On the other hand, while the pessimistic question is not more accurate (it has the same absolute error margins), it is unbiased. The study says that people asked for a pessimistic estimate were equally likely to over-estimate their completion time as they were to under-estimate it. If you don’t think a question that gives you a distribution centered on the right answer is useful, I’m not sure what to tell you.
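To make the difference between "biased" and "inaccurate" concrete, here is a minimal sketch. The numbers are invented for illustration and are not the study's data; the function names are my own.

```python
from statistics import mean

def signed_bias(estimates, actuals):
    """Mean of (estimate - actual); negative means estimates ran early."""
    return mean(e - a for e, a in zip(estimates, actuals))

def mean_abs_error(estimates, actuals):
    """Mean of |estimate - actual|; ignores the direction of the miss."""
    return mean(abs(e - a) for e, a in zip(estimates, actuals))

# Hypothetical days-to-completion for four students (made up for illustration).
actual      = [50, 60, 40, 70]
best_guess  = [40, 50, 30, 60]  # every guess is 10 days too early
pessimistic = [60, 50, 50, 60]  # also 10 days off, but in both directions

print(signed_bias(best_guess, actual), mean_abs_error(best_guess, actual))
print(signed_bias(pessimistic, actual), mean_abs_error(pessimistic, actual))
```

Both sets of estimates miss by the same amount on average, but only the first one misses in a consistent direction. That consistent direction is what "biased" means here.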
The paper actually did a number of experiments. That was just the first.
In the third experiment, the study tried to understand what people are thinking about when estimating.
This seems relevant considering that the idea of premortems or “worst case” questioning is to elicit impediments, and the project managers / engineering leads doing that questioning are intending to hear about impediments and will continue their questioning until they’ve been satisfied that the group is actually discussing that.
In the fourth experiment, the study tries to understand why people don’t think about their past experiences. It found that just prompting people to consider past experiences was insufficient; they needed additional prompting to make their past experience “relevant” to their current task.
How does this compare to the first experiment?
It’s common in engineering to perform group estimates. Does the study look at that? Yep, the fifth and last experiment asks individuals to estimate the performance of others.
So observers are more pessimistic. Actually, observers are so pessimistic that you have to average their estimates with the optimistic self-estimates to get an unbiased estimate.
At the end of the day, there are certain things that are known about scheduling / prediction.
In general, individuals are as wrong as they are right for any given estimate.
In general, people are overly optimistic.
But estimates generally correlate well with actual duration: if an individual estimates that one task will take longer than another, it most likely will! This is why software estimation is sometimes done not in units of time at all, but in a concept called “points”.
The larger and more nebulously scoped the task, the worse any estimates will be in absolute error.
The length of time a task can take follows a distribution with a very long right tail: a task that takes way longer than expected can take an arbitrary amount of time, but the fastest time to complete a task is limited.
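As a rough illustration of that last point, here is a small sketch that samples from a right-skewed distribution. The log-normal shape and its parameters are my assumption for the example, not something taken from the paper.

```python
import random
from statistics import mean, median

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumption: actual duration, as a multiple of the estimate, is log-normal.
ratios = [rng.lognormvariate(0.0, 0.75) for _ in range(10_000)]

print("fastest:", round(min(ratios), 2))
print("median: ", round(median(ratios), 2))
print("mean:   ", round(mean(ratios), 2))
print("slowest:", round(max(ratios), 2))
```

The fastest outcome is bounded below by zero, while the slowest sits far from the median, and that long tail pulls the mean above the median.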
The best way to actually schedule or predict a project is to break it down into as many small component tasks as possible, identify dependencies between those tasks, produce most likely, optimistic, and pessimistic estimates for each task, and then run a simulation over the chain of dependencies to see what the expected project completion looks like. Use a Gantt chart. This is a boring answer because it’s the “learn project management” answer, and people will hate on it because (gestures vaguely at all of the projects that overrun their schedule). There are many interesting reasons for why that happens and why I don’t think it’s a massive failure of rationality, but I’m not sure this comment is a good place to go into detail on that. The quick answer is that comical overrun of a schedule has less to do with an inability to create correct schedules from an engineering / evidence-based perspective, and much more to do with a bureaucratic or organizational refusal to accept an evidence-based schedule when a totally false but politically palatable “optimistic” schedule is preferred.
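For anyone who hasn't seen the simulation step before, here is a minimal sketch of one way to do it. The task list, the numbers, and the choice of a triangular distribution are all hypothetical, picked for illustration; real tools often use other distributions.

```python
import random

# Hypothetical tasks: name -> (optimistic, most_likely, pessimistic, dependencies).
# Durations are in days. Tasks are listed so that dependencies come first.
TASKS = {
    "design": (2, 4, 10, []),
    "build":  (5, 8, 20, ["design"]),
    "docs":   (1, 2, 5,  ["design"]),
    "test":   (2, 3, 9,  ["build"]),
    "ship":   (1, 1, 3,  ["test", "docs"]),
}

def simulate_once(tasks, rng):
    """Sample one duration per task and return the project finish time."""
    finish = {}
    for name, (low, mode, high, deps) in tasks.items():
        # A task starts only when all of its dependencies have finished.
        start = max((finish[d] for d in deps), default=0.0)
        finish[name] = start + rng.triangular(low, high, mode)
    return max(finish.values())

def simulate(tasks, runs=10_000, seed=0):
    """Return sorted project finish times over many simulated runs."""
    rng = random.Random(seed)
    return sorted(simulate_once(tasks, rng) for _ in range(runs))

def percentile(sorted_values, p):
    """Nearest-rank style percentile on an already sorted list."""
    index = min(len(sorted_values) - 1, int(p * len(sorted_values)))
    return sorted_values[index]

results = simulate(TASKS)
print("median finish (days):", round(percentile(results, 0.5), 1))
print("90th percentile (days):", round(percentile(results, 0.9), 1))
```

The output is a distribution of finish dates rather than a single date, so you can quote a median alongside something like a 90th percentile instead of committing to the optimistic number.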