This is the first post in a little series I’m slowly writing on how I see forecasting, particularly conditional forecasting; what it’s good for; and whether we should expect people to agree if they just talk to each other enough.
Views are my own. I work at the Forecasting Research Institute (FRI), I forecast with the Samotsvety group, and to the extent that I have formal training in this stuff, it’s mostly from studying and collaborating with Leonard Smith, a chaos specialist.
My current plan is:
Forecasting: the way I think about it [this post]
The promise of conditional forecasting / cruxing for parameterizing our models of the world
What we’re looking at and what we’re paying attention to (Or: why we shouldn’t expect people to agree today (Or: there is no “true” probability))
What do I do when I forecast? Let’s say I’m forecasting an arbitrary bad outcome U that we’re going to resolve in/by 2100 (e.g. AI-related catastrophe). I ask myself:
What are all the worlds I can imagine for 2100?
What’s my P(U) in each of these possible worlds?
Take the value for P(U) that has half the probability mass below and half above – that may not be the modal world – it’s the median world, and it’s where my expected log score is maximized
Imagining all the worlds is impossible, so I wind up decomposing the probability mass function into a few types of worlds and thinking about how being in each world would affect P(U) — i.e. for worlds A, B, C, … I have P(U|A), P(U|B), P(U|C) etc (Fig. 2). And I have ideas about how likely we are to wind up in each of A, B, C, etc. Here, B is my “modal world” and my “expectation” world is somewhere between C and D on the P(U) scale.
If you want to get really fancy, you can factor in uncertainty about U in each of these worlds, treat them all as distributions (some are pointier, some are more uncertain), and think about your all-things-considered P(U) as a mixture distribution over all of your worlds. This can always be distilled into a point estimate by taking the mixture's center of mass (dotted line in Fig. 1). You can use tools like Squiggle for this.
Side-note: I think some people just think about the modal world B by default. It’s probably the first world you think of. It’s the world you most think will come to pass. But you don’t maximize your log score by forecasting P(U|B) when you’re asked for P(U).
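A minimal sketch of the decomposition above, with made-up numbers. The world names, weights, and within-world probabilities are all hypothetical, chosen only so that B is the modal world and the expectation lands between C and D on the P(U) scale. It computes the all-things-considered P(U) as the sum of P(world) * P(U|world), then compares the expected log score of reporting that number against reporting P(U|B).

```python
import math

# Hypothetical worlds: name -> (P(world), P(U | world)). All numbers are invented.
WORLDS = {
    "A": (0.10, 0.01),
    "B": (0.40, 0.05),  # modal world: largest P(world)
    "C": (0.25, 0.15),
    "D": (0.15, 0.40),
    "E": (0.10, 0.80),
}


def overall_probability(worlds):
    """All-things-considered P(U): sum of P(world) * P(U | world)."""
    return sum(p_world * p_u for p_world, p_u in worlds.values())


def modal_world(worlds):
    """Name of the world carrying the most probability mass."""
    return max(worlds, key=lambda name: worlds[name][0])


def expected_log_score(report, worlds):
    """Expected log score of reporting `report`, judged by my own beliefs about U."""
    p = overall_probability(worlds)
    return p * math.log(report) + (1 - p) * math.log(1 - report)


p_all = overall_probability(WORLDS)
p_modal = WORLDS[modal_world(WORLDS)][1]

print(f"P(U) = {p_all:.4f}")
print(f"P(U | modal world) = {p_modal:.2f}")
print(f"expected log score, reporting P(U): {expected_log_score(p_all, WORLDS):.4f}")
print(f"expected log score, reporting P(U | modal world): {expected_log_score(p_modal, WORLDS):.4f}")
```

Under these assumed numbers, reporting the probability-weighted average scores better in expectation than reporting the modal world's P(U). That follows from the log score being a proper scoring rule, not from anything special about the numbers.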
In our projects at FRI, we’ve conditioned on things of the form “[x happens] by [year].” Let’s say [x happens] is a certain policy being implemented. Understandably, our study participants have factored in what “this policy” being implemented may imply about the world in [year]. Maybe you think it highly unlikely that this policy would be implemented if we were living in World B, so conditioning on it makes you think we’re probably in World F, where Russia has nuked the UK and there are dragons. Conditioning on any given thing changes the shape of your curve. Now it might look something like this:
This is a problem if what we want to know is how the policy would causally affect the ultimate outcome that we care about. Can we say whether this policy would be good or bad (measured by its impact on P(U))? Not really. But if you ask a forecaster to “hold all else equal” and try to isolate just the effect of the policy, I’d argue that they’re hardly forecasting anymore. Any forecast generated that way can’t be scored. Worlds A, B, C, etc. could manifest, whereas the world where nothing happens except that this policy is implemented isn’t realizable. In fact, Adam Dorr has written about this as the ceteris paribus fallacy: when you’re forecasting, it’s a mistake to imagine “single-variable futures” (h/t Michał Dubrawski, without whom I probably wouldn’t have read Dorr).
If only we had a way to capture how much of my forecast owes to “evidential” considerations like P(B|policy) and how much is more like causal reasoning! We need better ways for people to articulate their models of the world and what they’re weighing in their forecasts. Dan Schwarz has written about that need here. I have some thoughts I’ll share in my next post.
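To make the evidential/causal distinction concrete, here is a toy sketch with invented numbers. It is one possible bookkeeping, not a proposal for how to elicit these quantities, and every input is hypothetical: `prior` is P(world), `p_policy` is P(policy|world), `p_u` is P(U|world) before conditioning, and `p_u_policy` is P(U|world, policy). Bayes' rule gives P(world|policy), and the shift from P(U) to P(U|policy) then splits exactly into a part due to reweighting the worlds and a part due to the policy changing P(U) inside each world.

```python
# Hypothetical inputs for three worlds. All numbers are invented for illustration.
WORLDS = {
    "B": {"prior": 0.60, "p_policy": 0.05, "p_u": 0.05, "p_u_policy": 0.04},
    "C": {"prior": 0.30, "p_policy": 0.20, "p_u": 0.15, "p_u_policy": 0.10},
    "F": {"prior": 0.10, "p_policy": 0.60, "p_u": 0.80, "p_u_policy": 0.75},
}


def posterior_given_policy(worlds):
    """P(world | policy) via Bayes' rule."""
    p_policy = sum(w["prior"] * w["p_policy"] for w in worlds.values())
    return {name: w["prior"] * w["p_policy"] / p_policy for name, w in worlds.items()}


def decompose_shift(worlds):
    """Split P(U | policy) - P(U) into a reweighting part and a within-world part."""
    post = posterior_given_policy(worlds)
    baseline = sum(w["prior"] * w["p_u"] for w in worlds.values())
    conditional = sum(post[n] * w["p_u_policy"] for n, w in worlds.items())
    # "Evidential": the world weights move, within-world P(U) held at its old value.
    evidential = sum((post[n] - w["prior"]) * w["p_u"] for n, w in worlds.items())
    # "Within-world": the weights stay at the posterior, within-world P(U) moves.
    within_world = sum(post[n] * (w["p_u_policy"] - w["p_u"]) for n, w in worlds.items())
    return baseline, conditional, evidential, within_world


baseline, conditional, evidential, within_world = decompose_shift(WORLDS)
print(f"P(U) = {baseline:.3f}, P(U | policy) = {conditional:.3f}")
print(f"shift from reweighting worlds: {evidential:+.3f}")
print(f"shift from within-world effect: {within_world:+.3f}")
```

With these assumed inputs the policy lowers P(U) inside every world, yet P(U|policy) comes out higher than P(U), because conditioning on the policy moves probability mass toward World F.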
Keen to hear how different this is from how you, dear reader, think about forecasting.
Good points well made. I’m not sure what you mean by “my expected log score is maximized” (and would like to know), but in any case it’s probably your average world rather than your median world that does it?
Figure 1 is clumsy, sorry. In the case of a smooth probability distribution of infinite worlds, I think the median and the average world are the same? But in practice, yes, it’s an expected value calculation, summing P(world) * P(U|world) for all the worlds you’ve thought about.
In Fig. 1, is the vertical axis P(world)?
Good q, yes, that’s the vertical axis in all the figures.