The Promises and Pitfalls of Long-Term Forecasting

Disclaimer: We will be speaking at the Manifest Conference in Berkeley, CA (Sept. 22-24) about long-term forecasting, its promises, and its pitfalls. Below is an excerpt from last week’s edition of our newsletter, Predictions, detailing some of what we plan to speak about at the event. This week’s edition can be found here.

If you are reading this newsletter, you probably have a certain amount of intellectual buy-in on the concept of quantified forecasting. However, much of the social sciences remains skeptical, to say the least, about the practice.

Take the domain of international relations, where Philip Tetlock got his start and which serves as the basis for much of the quantified forecasting research. Many scholars have questioned whether we can predict international relations at all, an objection best captured by Robert Jervis’ System Effects, published in 1997.

System Effects

The essence of the book (which should probably get its own dedicated post at some point) is that international politics is a system whose elements are interconnected, such that a change in one part of the system produces changes in other parts, and that the system as a whole has properties and behaviors different from those of its parts (in other words, the whole is greater than the sum of its parts).

As a result of system effects, in the international system (and any complex, interconnected system, for that matter) we have:

Delayed and indirect outcomes

Emergent characteristics: relationships between elements depend on their relationships to other elements

Non-integrable function: you cannot understand the whole through its parts

Unintended outcomes

Nonlinearities: unexpected breaks from what history would suggest

Feedback loops

Regulation being difficult

Does Jervis believe these system effects doom prediction? Not entirely, especially in his 1997 book. By his 2012 revisit, however, Jervis takes a slightly more negative tone, writing:

“...My approach has an ambiguous stance towards prediction” and that “being realistic about the limits of our ability to know how we can reach desired ends can make us freer to act on our ideals. When it is not possible to see around the bend, to use Jones-Rooy and Page’s phrase, perhaps it is better not to try.”

How has the forecasting space responded to these claims? In the same journal issue as Jervis’ 2012 update, Philip Tetlock, along with Horowitz and Hermann, argues that:

System effects do not preclude pockets of predictability

Understanding system effects can help expand those pockets

We should focus on questions that are in the “Goldilocks zone” of difficulty (between 10% and 90% probability ex ante)

System Shock

Last month, the third point seemingly became irrelevant as the Forecasting Research Institute published some early results from their long-run forecasting tournament titled Forecasting Existential Risks: Evidence from a Long-Run Forecasting Tournament.

Per the paper’s abstract, “the Existential Risk Persuasion Tournament (XPT) aimed to produce high-quality forecasts of the risks facing humanity over the next century by incentivizing thoughtful forecasts, explanations, persuasion, and updating from 169 forecasters over a multi-stage tournament.”

This new research is relevant for three main reasons:

The risk areas forecasted in the tournament (nuclear weapons, artificial intelligence, climate change, biorisks) are all topics of significant public interest, and any insights into these risks are noteworthy.

The tournament combined the mental might of both “superforecasters” and “experts,” giving us a unique look into how these two groups of forecasters compare with respect to accuracy.

This tournament is one of the first attempts at applying the short-range forecasting methodologies pioneered by Philip Tetlock to long-range forecasting questions.

Before reviewing the results from the report, there are two interesting choices made by the research team in the experimental setup that are worth mentioning.

First, the report states that 42% of the experts included in the tournament were members of the Effective Altruism community (defined as having attended an EA meetup).

Second, as long-range forecasts naturally will not resolve for decades, the tournament implemented something called intersubjective forecasts in order to measure long-range forecast accuracy, defined as “predictions of the views of other participants.”

Now to the results! If you couldn’t tell from the title of this section, we were not very impressed by, or optimistic about, the early findings from the tournament.

The report states that its purpose is to document “variation in probabilistic beliefs and explanatory rationales on high-stakes issues,” but unfortunately it provides very little conclusive evidence of the accuracy of those beliefs.

Most of the findings from the tournament felt administrative, e.g. “facilitating productive adversarial collaborations” or how to “retain the talent of busy professionals in a demanding multi-month marathon.”

The report seems to put the cart before the horse, stating a goal in its Next Steps section to “make these forecasts more relevant to policymakers.” Conclusive evidence on accuracy is a prerequisite to providing policy recommendations based on long-term forecasts.

Initial thoughts

There is much to say and still think about when it comes to long-term forecasting and this report, which we plan to do over the next month as we get ready to give a talk on The Pitfalls and Promises of Long-Term Forecasting at the Manifest 2023 Conference.

The accuracy of human forecasting has long been a contested topic within social sciences such as international relations. That has begun to change as research from IARPA, Philip Tetlock, and others has demonstrated the viability of short-term forecasting, as documented in the books Expert Political Judgment and Superforecasting.
Today academics are continuing to push the boundaries of human forecasting research, and a new area of research has begun to gain traction: long-term forecasting. In this talk, Clay and Andrew from GeoVane will explore questions of whether or not humans truly have the tools, cognitive processes, and even capability to make accurate, repeatable, long-term predictions. Despite the clear benefits it would deliver if feasible, we predict that the answer will be no, and that the risks of wasted intellectual capital warrant serious discussion.

With that said, here are some of the initial thoughts that will animate our thinking moving forward. The main one is that these findings, in conjunction with past readings, have led us to question the merits of long-range forecasting writ large, and we find ourselves increasingly aligned with the conclusions in Karl Popper’s The Poverty of Historicism.

Ultimately, given the early results of the tournament, calling long-range forecasts “forecasts” may itself be a misnomer. The long-range forecasts provided by the tournament participants read more like opinion polls, that is, measures of belief, than like prescriptive predictions about future outcomes.

Short-range forecasting already involves compounding conditional outcomes to generate a probability. This exercise comes with the risk of any individual conditional outcome changing, thereby affecting the final probability. Long-range forecasting increases the number of these conditional outcomes exponentially, to the point where it feels futile to even attempt to control those innumerable variables.
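To make the compounding point concrete, here is a minimal sketch in Python. The step probabilities and chain lengths are purely illustrative assumptions of ours, not figures from the tournament, and the function names are hypothetical. The sketch treats a forecast as a chain of conditional outcomes that must all occur, then shows how the same small misjudgment on each link distorts a long chain far more than a short one.

```python
from math import prod


def chain_probability(step_probs):
    """Probability that every step in a chain of conditional outcomes occurs.

    Each entry is assumed to be P(step_i | all earlier steps occurred),
    so the joint probability is simply the product.
    """
    return prod(step_probs)


def error_ratio(n_steps, p_true, p_est):
    """How far off the chain estimate is (as a ratio) when every link
    is misjudged as p_est instead of p_true."""
    estimated = chain_probability([p_est] * n_steps)
    actual = chain_probability([p_true] * n_steps)
    return estimated / actual


# Illustrative numbers only (assumptions, not data).
short_chain = chain_probability([0.8, 0.7, 0.9])  # three conditional steps
long_chain = chain_probability([0.8] * 20)        # twenty conditional steps

print(f"short chain: {short_chain:.3f}")
print(f"long chain:  {long_chain:.4f}")

# Misjudging each link as 0.85 when it is really 0.80:
print(f"3-step overestimate:  {error_ratio(3, 0.80, 0.85):.2f}x")   # about 1.2x
print(f"20-step overestimate: {error_ratio(20, 0.80, 0.85):.2f}x")  # about 3.4x
```

Under these assumptions, a five-point error per link leaves a three-step forecast roughly 20% too high but leaves a twenty-step forecast more than three times too high, which is the intuition behind our worry about long-range questions.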

But this assumes that approaching long-range forecasting is even similar to approaching short-range forecasting. And whether or not that is correct is not clear either. In short-range forecasting, it is common to begin with a base rate – the frequency with which a similar event has taken place in the past. For long-range forecasting on topics like existential risks, oftentimes base rates do not exist. How do you approach a forecast where the event in question has never occurred before?

It seems that to get around this issue, the long-range forecasting tournament used intersubjective forecasts as a proxy for long-range forecasting accuracy. And while the research team behind the event views this methodology as adequate, we are not entirely convinced.

Now, as we mentioned, the results in this report are early and, as the report states, results will continue to filter in over the next few decades. If good research on this form of prognostication can only happen over decades, then even if these forecasts prove insightful, the logistics of advancing the research become impractical.

Instead of trying to create prescriptive long-term forecasts, we should use belief measurements like those provided by this report as data points with which to create frameworks to forecast from. These long-range forecasts (maybe better termed “predictions”) can provide high-level context for short-range forecasts, which are evidently more accurate and, more importantly, more actionable.

If we want forecasts and forecasting as a field to be more widely embraced by the policy community, we must focus on the most accurate, actionable areas of our field, and those are short-range forecasts. Tournaments like these, with a plethora of rules and intersubjective forecasting assignments, will lead to burnout for forecasters and will also select against some of the minds most relevant to these existential risk discussions.