[Link] How to see into the future (Financial Times)
How to see into the future, by Tim Harford
The article may be gated. (I have a subscription through my school.)
It is mainly about two things: the differing approaches to forecasting taken by Irving Fisher, John Maynard Keynes, and Roger Babson; and Philip Tetlock’s Good Judgment Project.
Key paragraph:
So what is the secret of looking into the future? Initial results from the Good Judgment Project suggest the following approaches. First, some basic training in probabilistic reasoning helps to produce better forecasts. Second, teams of good forecasters produce better results than good forecasters working alone. Third, actively open-minded people prosper as forecasters.
But the Good Judgment Project also hints at why so many experts are such terrible forecasters. It’s not so much that they lack training, teamwork and open-mindedness – although some of these qualities are in shorter supply than others. It’s that most forecasters aren’t actually seriously and single-mindedly trying to see into the future. If they were, they’d keep score and try to improve their predictions based on past errors. They don’t.
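(For concreteness, “keeping score” need not be elaborate. A standard way of scoring probabilistic forecasts is the Brier score; the sketch below is only an illustration of that kind of bookkeeping, not anything described in the article, and the prediction list and 70% confidence cut-off are made up.)

```python
# Minimal score-keeping for probabilistic forecasts (illustrative only).
# Each record is (stated probability that the event happens, whether it happened).
predictions = [
    (0.9, True),
    (0.7, False),
    (0.6, True),
    (0.2, False),
]

# Brier score: mean squared error between stated probability and outcome.
# 0.0 is perfect; always saying 50% scores 0.25.
brier = sum((p - (1.0 if happened else 0.0)) ** 2
            for p, happened in predictions) / len(predictions)
print(f"Brier score: {brier:.3f}")

# Crude calibration check: of the things given >=70% confidence,
# what fraction actually happened?
confident = [(p, h) for p, h in predictions if p >= 0.7]
if confident:
    hit_rate = sum(h for _, h in confident) / len(confident)
    print(f"Stated >=70% confidence, observed frequency: {hit_rate:.2f}")
```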
So… all hail gwern with 1700+ well-calibrated predictions?
Of course, they’re trying to sell their forecasting services.
Would an “expert” want to keep data that could potentially refute his claim to expertise?
If there were a general belief that experts who don’t keep score are bad at forecasting, he might keep data to signal that he follows best practices.
It’s just a matter of spreading the knowledge that people who don’t keep score don’t make good predictions.
Entirely true.
General beliefs are generally ignorant nonsense, particularly with regard to mathematical abstractions on aggregates that people have no concrete experience dealing with themselves.
Fun fact from Ian Hacking, via Wikipedia:
This was quite an eye-opener to me when I first saw it. We take empirical testing and verification for granted, but if you were simply unaware of them, what else would you go by to determine the “truth” of something? The scientific method is obviously not in the genes, but I bet you that obedience to authority is, even where “truth” is concerned.
Experts still generally thrive on authority, not empirically demonstrated competence.
Your average bald monkey is just not that clever.
Even if general beliefs are not informed, imagine a world in which MBA programs taught students that good experts make testable predictions. Give it a decade and making testable predictions becomes the new management fashion.
So we need to engineer systems that give them that experience.
I am not a historian of science, and this may be a just-so story, but my understanding is that one reason Galileo’s telescope was so important in the history of science is that it (and other scientific instruments) made it possible to challenge a theory without necessarily having to challenge the authority of that theory’s creator. If people share the same senses, of approximately the same quality (except, obviously, people who are blind, deaf, etc.), then the deciding factor in why some people come up with good theories and others don’t is their intellect (and thus their authority, since, by the halo effect, people who have more authority are generally thought to be more intelligent), plus perhaps access to some rare esoteric knowledge or revelations (the concept of cumulative progress, “standing on the shoulders of giants”, is not necessarily helpful for challenging an established theory).
So when one tries, for example, to challenge an idea of Aristotle’s, unless the falsehood of this idea can be demonstrated easily and cheaply, all the listeners (who are forced to take an outside view) can do is judge which of the two of you was more likely to have made a faulty inference or a faulty observation, i.e. compare intellect and qualifications (and therefore authority), and the followers of Aristotle can point to his large and impressive body of work as well as to his being highly respected by all other authority figures. On the other hand, if one has a telescope (or any other scientific instrument that extends one’s senses), the assumption that everyone has equal senses is broken, and it is no longer necessary to fight over “who is more intelligent and wise”; one can simply point out that one has a telescope, and that this is why one’s discovery (which contradicts a respected position) might nevertheless be correct. One could even make a polite deference to the authority (“X was an extraordinary genius; just imagine what he could have done if he had all the equipment we have today, possibly much more exciting things than we are currently able to”) and still claim to be more correct than him.
When more and more arguments are won by pointing to these “extended senses”, we gradually see authority over observations shift from the eminence of theory creators to the quality of lab equipment.
It is important to note that innovations in methodology (e.g. calculating probabilities) seem more similar to “tools/algorithms” than to “intellect”, since the whole point of having and following a methodology at all is to avoid having to be a genius to make a discovery.
However, at any given moment, in any given area, most researchers still use basically the same equipment, thereby restoring the approximate equality of everyone’s senses. So even today, when scientists obtain different results using similar (or at least comparable in quality) equipment, people start making claims about who has (and who lacks) the relevant qualifications. At the same time, we see many theories in astronomy and astrophysics being overturned whenever a new, larger and better telescope becomes available.
I admit, this was mostly about people who take the outside view, and about experts in the sciences and/or those who are actually interested in making correct predictions about the world. Many people who are commonly called experts aren’t actually trying and have different goals instead.
I don’t think it was a case of “I have a telescope, ergo I am correct”; I think it was more a case of “Here, look into this thing and see for yourself”.
I was mostly trying to talk about an “outside view”, i.e. whom should a layman (who is not necessarily able to immediately replicate an experiment himself/herself) believe?
Suppose an acclaimed professor (in earlier times, a famous natural philosopher) and a grad student (or the equivalent in earlier times) are trying to figure something out, and their respective experiments produce differing results. Suppose their equipment was of the same quality. Whom should a layperson bet on being correct before further research becomes available? Would even the grad student himself/herself be confident in his/her result? Now suppose the grad student had access to significantly better and more modern tools (such as a telescope in the early 1600s or an MRI scanner in the 1970s). The situation changes completely. If the difference in the quality of the lab equipment were sufficiently large (e.g. CERN vs an average high school lab), nobody would even bother to do a replication. (By the way, given equipment of the same quality (e.g. only senses), if the difference in authority were sufficiently large, would the situation be analogous? I’m not sure I can answer this question.)
A more mundane situation: suppose a child claims there is some weird object in the sky that they saw with their naked eye. Others would ask: why hadn’t others (whose eyes are even better) seen it before? Why hadn’t others (who are potentially more intelligent) identified it? Now suppose the child has a telescope. Even if others would not bother to look at the sky themselves, they would be much more likely to believe that the child could actually have seen something real.
In no way am I trying to downplay the importance of replications, especially cheap replications such as allowing everybody to look through your telescope (which, in addition to replicating that particular observation, serves a more general purpose: people have to believe that you really do possess the “extended sense” rather than just making it up, as many self-proclaimed psychics do). The ability to replicate cheap experiments is crucial, as is the fact that (in the ideal world, if not necessarily the real one) there are people who have the means to replicate difficult and expensive experiments, and the willingness (and/or incentives) to actually do so and honestly point out whatever discrepancies they may find.
It seems necessary to point out that this is probably just a “just-so story”; an actual historian of science could make a much better-informed comment on whether the process I described was of any importance at all.
Anyway, this conversation seems to have strayed a bit off topic and now barely touches the Financial Times article.
Does anyone here read Thomas Frey’s work?
Here is an application for consideration. I’m not a software developer, but I get to specify the requirements for software that a team develops. (I’d be the “business owner” or “product owner”, depending on the lingo.) The agile+scrum approach to software development notionally assigns points to each “story” (meaning, approximately, a task that a software user wants to accomplish). The team assigns the points ahead of time, so it is a forecast of how much effort will be required. Notionally, these can be used for forecasting (a rough sketch of that use follows this comment). The problem I have encountered is that the software developers don’t really see the forecasting benefit, so they don’t embrace it fully. For example, in my experience, they don’t (1) focus much in their “retrospectives” (internal meetings after finishing software) on why forecasts were wrong, or (2) assign points to every single story that is written, which would allow others to use their knowledge. They are satisfied if their ability to forecast is good enough to tell them how much work to take on in the current “sprint” (a set period, usually 2 or 3 weeks, during which they work on the stories pulled into that sprint), which requires only an approximation of the work that might go into the current sprint.
Some teams tend to use it as a performance measure, so that they feel better if they increase the number of points produced per sprint over time (which they call “velocity”). Making increased velocity into a goal means that the developers have an incentive for point inflation. I think that use makes the approach less valuable for forecasting.
I believe there are a number of software developers in these forums. What is the inside perspective?
Max L.
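(To make the forecasting use concrete: if a team did keep score, points-based forecasting would amount to something like the sketch below, i.e. tracking completed points per sprint and projecting the remaining backlog against that history. The sprint history and backlog figures are made up purely for illustration.)

```python
# Rough sketch of points-based forecasting (all numbers are hypothetical).
# Velocity = points completed per sprint; use its recent average and spread
# to project how many sprints a remaining backlog will take.
from statistics import mean, stdev

completed_points_per_sprint = [21, 18, 25, 19, 23, 17]  # hypothetical history
backlog_points = 180                                    # hypothetical remaining work

avg_velocity = mean(completed_points_per_sprint)
spread = stdev(completed_points_per_sprint)

# Optimistic / expected / pessimistic sprint counts, using +/- one standard
# deviation of velocity as a crude range.
for label, velocity in [("optimistic", avg_velocity + spread),
                        ("expected", avg_velocity),
                        ("pessimistic", avg_velocity - spread)]:
    sprints = backlog_points / velocity
    print(f"{label}: ~{sprints:.1f} sprints")
```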
My inside perspective is that story points aren’t an estimate of how much time a developer expects to spend on a story (we have separate time estimates), but an effort to quantify the composite difficulty of the story into some sort of “size” for non-developers (or devs not familiar with the project). As such it’s more of an exercise in expectation management than forecasting.
This is something I have tried explaining multiple times, but I can’t really say that I understand the point. It’s harder, so it takes longer, right? My response is that it is a combination of the time to complete and the probability that the estimate is wrong and it takes a lot longer. But it seems to me that it would be better to decompose those aspects (a rough sketch of the arithmetic follows this comment). The benefit of putting it in one number is that it is easier to use to manage expectations. It’s like giving an estimate that is higher than your true estimate based on risk. Frequently, you end up with spare time that you can use to offset the reputational impact of totally missing sometimes. From a manager’s perspective, it looks a bit like padding the estimates systematically, to offset all the biases in the system towards only hearing the earliest possible time.
Max L
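(The decomposition suggested above can be made explicit with a couple of lines of arithmetic; the task figures below are invented purely for illustration.)

```python
# Decomposing a single padded estimate into (typical time, overrun risk).
# All numbers are hypothetical.
typical_days = 3     # time if nothing surprising happens
overrun_days = 10    # time if the risky part goes wrong
p_overrun = 0.2      # probability the estimate is badly wrong

expected_days = (1 - p_overrun) * typical_days + p_overrun * overrun_days
print(f"Expected time: {expected_days:.1f} days")  # 4.4 days

# A single padded number collapses typical_days, overrun_days and p_overrun
# into one figure, which is convenient for expectation management but throws
# away the information needed to check the forecast afterwards.
```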
My inside perspective from using this system is that, at least the way we use it, it is not useful for forecasting. In each sprint approximately 40-50% or so of the tasks actually get finished. Most of the rest we carry over, and occasionally a few will be deprioritized and removed.
The points values are not used very systematically. Some items aren’t even assigned a point value. For those that are, the values do generally tend to correspond to amounts of effort of ‘a day’, ‘a week’, or ‘a month’, but not with very much precision.
We certainly don’t focus on whether or not our points predictions were accurate to the amount of time actually taken during retrospectives. I think that in the past 12 months, all but one or two of our retrospectives have ended up being cancelled, because they aren’t seen as very important.
Probably we are not utilizing this system to its fullest capacity, but our group isn’t dysfunctional or anything. The system seems to work pretty well as a management tool for us (but is not so useful for forecasting!).