Nice write-up!
A few thoughts re: Scott Alexander & Rob Wiblin on prediction.
Scott wrote that “On February 20th, Tetlock’s superforecasters predicted only a 3% chance that there would be 200,000+ coronavirus cases a month later (there were).” I just want to note that while this was indeed a badly failed prediction, in a sense the supers were wrong by just two days: WHO-counted cases only passed 200,000 on March 18th, two days before the question closed.
One interesting pre-coronavirus probabilistic forecast of global pandemic odds is this: from 2016 through Jan 1st 2020, Metaculus users made forecasts about whether there would be a large pandemic (≥100M infections or ≥10M deaths in a 12-month period) by 2026. For most of the question’s history, the median forecast was 10%–25%, and the special Metaculus aggregated forecast was around 35%. At first this sounded high to me, but then someone pointed out that 4 pandemics from the previous 100 years qualified (I didn’t double-check this), suggesting a naive base rate of roughly 40% per decade. So the median and aggregated forecasts on Metaculus were actually lower than the naive base rate (maybe by accident, or maybe because forecasters adjusted downward given today’s better surveillance and mitigation tools), but I’m guessing they were still higher than the probabilities most policymakers and journalists would have given, if they were in the habit of making quantified, falsifiable forecasts. Moreover, using the Tetlockian strategy of just predicting the naive base rate with minimal adjustment would’ve yielded an even more impressive in-advance prediction of the coronavirus pandemic.
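For concreteness, here’s a minimal sketch of that base-rate arithmetic, taking the 4-pandemics-in-100-years count above at face value (I haven’t verified it); the Poisson variant is just one illustrative way to hedge the naive estimate:

```python
import math

# Assumed historical count (quoted above, unverified):
# 4 qualifying pandemics in the previous 100 years.
pandemics = 4
years = 100
rate_per_decade = pandemics / years * 10            # 0.4 expected events per decade

# Naive base rate: treat the expected count per decade as a probability directly.
naive_per_decade = rate_per_decade                  # 40%

# Poisson hedge: probability of at least one qualifying pandemic in a decade,
# assuming events arrive independently at a constant rate.
poisson_per_decade = 1 - math.exp(-rate_per_decade)  # ~33%

print(f"Naive base rate per decade:  {naive_per_decade:.0%}")
print(f"Poisson 'at least one' rate: {poisson_per_decade:.0%}")
# Either way, the ballpark sits above the 10-25% Metaculus median
# and close to the ~35% aggregated forecast.
```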
More generally, the research on probabilistic forecasting makes me suspect that prediction polls/markets with highly selected participants (e.g. via GJI or Hypermind), or perhaps even those without highly selected participants (e.g. via GJO or Metaculus), could achieve pretty good calibration (though not necessarily resolution) on high-stakes questions (e.g. about low-probability global risks) with 2–10 year time horizons, though this has not yet been checked.
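If someone did want to run that check once enough long-horizon questions had resolved, a minimal calibration check might look like the sketch below (the forecast/outcome arrays are made-up placeholders, not real data from any of these platforms):

```python
import numpy as np

def calibration_table(probs, outcomes, bins=(0.0, 0.05, 0.15, 0.35, 0.65, 1.0)):
    """Compare forecast probabilities to observed frequencies within coarse bins.

    Good calibration = forecasted and observed frequencies roughly match in each bin.
    Resolution is a separate property: a forecaster who always predicts the base rate
    can be well calibrated while having no resolution at all.
    """
    probs, outcomes = np.asarray(probs, float), np.asarray(outcomes, float)
    rows = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (probs >= lo) & (probs <= hi) if hi == 1.0 else (probs >= lo) & (probs < hi)
        if mask.any():
            rows.append((f"{lo:.2f}-{hi:.2f}", int(mask.sum()),
                         probs[mask].mean(), outcomes[mask].mean()))
    return rows

# Placeholder data: forecast probabilities and 0/1 resolutions for resolved questions.
forecasts = [0.03, 0.10, 0.25, 0.40, 0.70, 0.05, 0.90, 0.15]
results   = [0,    0,    0,    1,    1,    0,    1,    0]

for bin_label, n, mean_p, freq in calibration_table(forecasts, results):
    print(f"bin {bin_label}: n={n}, mean forecast={mean_p:.2f}, observed freq={freq:.2f}")
```

The obvious difficulty is that low-probability questions with 2–10 year horizons resolve slowly (and mostly negatively), so the bins stay sparse for a long time, which is presumably part of why this hasn’t been checked yet.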