Running Effective Structured Forecasting Sessions

(Part 4 of our Forecasting Infrastructure series)

TL;DR: We’re sharing our structured forecasting session format.

Intro

In our quest to build forecasting infrastructure, we’ve re-encountered a classic open problem in group rationality: how do you run a meeting? More specifically, how can you run a meeting that elicits, aggregates, and distills individual human judgements in a principled manner, such that the whole is greater than the sum of its parts? We wanted to find a meeting format that would help a community of forecasters collectively make better predictions than they would on their own.

It’s a question that’s been studied since at least the 1950s, when the RAND Corporation invented the Delphi Method. The Delphi Method is a structured meeting template wherein a group of experts each estimate an answer to a question, share their reasoning, reply to others’ estimates, and then estimate again. It encourages updating in the face of new evidence, and the format has a strong track record of producing well-calibrated predictions.

However, for our needs, it wasn’t quite the right fit. In particular, we didn’t have domain experts; we had expert forecasters. The quintessential superforecaster is good at forecasting but might not have refined models of the domain (in this case, AI). We wanted to create a format that would help forecasters rapidly orient to the question. To that end, we made a few changes:

- We added a much greater emphasis on understanding and building models, collaboratively, on the question.

- We included time for active research into the topic.

For the past eight months we’ve been holding forecasting sessions most Sundays. I think our approach is refined enough and general enough that the template can be used by others to good effect.

Format

The general approach is to take 4-12 forecasters through a series of key steps:

  • Understand the question and your key “technical” uncertainties (e.g., what does this word mean?)

  • Individually make a forecast

  • Discuss as a group the initial forecasts

  • Collaboratively analyze the question through several lenses:

    • Outside view

    • Inside view

    • Key uncertainties

    • Scenario planning

    • What would change my mind?

  • Discuss as a group

  • Make new individual forecasts

  • Share and compare updates
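For the final “share and compare updates” step, it can help to put the two rounds of forecasts side by side. Here is a minimal sketch in Python of one way to do that. It is not part of our actual tooling: the function name, the use of the median as the group summary, and the forecaster names and probabilities are all assumptions made up for illustration.

```python
from statistics import median


def summarize_updates(initial, updated):
    """Compare first-round and second-round forecasts (probabilities in [0, 1]).

    `initial` and `updated` map a forecaster's name to their probability.
    Only forecasters who appear in both rounds are compared.
    """
    shared = sorted(set(initial) & set(updated))
    first = [initial[name] for name in shared]
    second = [updated[name] for name in shared]
    return {
        # Median as the group summary is an assumption, not a prescribed rule.
        "median_initial": median(first),
        "median_updated": median(second),
        # Spread (max minus min) is a rough proxy for disagreement.
        "spread_initial": round(max(first) - min(first), 4),
        "spread_updated": round(max(second) - min(second), 4),
        # Signed change per forecaster between the two rounds.
        "updates": {n: round(updated[n] - initial[n], 4) for n in shared},
    }


# Hypothetical numbers, purely for illustration.
initial = {"alice": 0.20, "bob": 0.60, "carol": 0.35}
updated = {"alice": 0.30, "bob": 0.45, "carol": 0.35}

summary = summarize_updates(initial, updated)
for key, value in summary.items():
    print(key, value)
```

If the spread shrinks sharply while the median barely moves, that might be healthy convergence or it might be anchoring on the group, so a summary like this is a prompt for discussion rather than a verdict.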

A facilitator leads the session, largely performing a logistical role such as keeping track of time and recording key comments in the document.

Most of the session is spent collaboratively but silently writing in a Google doc. This enables “multiplexing,” where multiple people work on different points at the same time. Contrast this with a standard online meeting, where everyone’s attention is directed to one person, potentially wasting a lot of brainpower. We also make heavy use of meta tags. Using brackets to indicate an [info-request] or support for a point [+1] helps direct attention and signal group consensus.
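If you export the session document as plain text, the bracket tags are easy to tally afterwards. The following is a rough sketch, assuming tags are written as short bracketed strings like [+1] and [info-request] on the same line as the point they refer to. The function names and the sample notes are hypothetical, invented for this example.

```python
import re
from collections import Counter

# Matches short bracketed tags such as [+1] or [info-request].
TAG_PATTERN = re.compile(r"\[([^\[\]\n]+)\]")


def count_tags(doc_text):
    """Count how often each bracketed tag appears in the document text."""
    return Counter(tag.strip().lower() for tag in TAG_PATTERN.findall(doc_text))


def lines_with_tag(doc_text, tag):
    """Return the lines that carry a given tag, e.g. all open info requests."""
    wanted = tag.strip().lower()
    return [
        line.strip()
        for line in doc_text.splitlines()
        if wanted in (t.strip().lower() for t in TAG_PATTERN.findall(line))
    ]


# Hypothetical session notes, made up for illustration.
sample = """Outside view: hardware trends rarely stall for long [+1] [+1]
What exactly counts as a doubling here? [info-request]
Budgets could plateau before the resolution date [+1]"""

tag_counts = count_tags(sample)
open_requests = lines_with_tag(sample, "info-request")

print(tag_counts)      # '+1' appears 3 times, 'info-request' once
print(open_requests)   # the single line asking what counts as a doubling
```

A facilitator could run something like this near the end of a section to see which points drew the most support and which information requests are still unanswered.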

Key “goals” of the format:

  • Create an environment in which there is an interactive back-and-forth on the question.

  • Approach the question from different angles. For example, create an outside view model of the question and then an inside view model of the question. The different approaches complement one another and can reveal blind spots.

  • Generate sub-questions that can be answered or forecast.

  • Encourage flexibility and switching between breadth-first and depth-first approaches to investigating a topic.

For a taste, here’s a transcript from our AI forecasting session on a set of questions assessing the likelihood that, by a certain year, there will be a 2-year interval in which the AI-compute trend did not double. You can see from the transcript how the format elicits models and beliefs, and allows back-and-forth between participants in a way that identifies the key uncertainties driving the forecast.

Thoughts

I feel confident saying the session format helps in clarifying and understanding the forecasting question. Every time I’ve participated, I’ve come away with a much deeper understanding of the contours of the question. Much like Fermi estimation, spending 10-30 minutes collaboratively poking at a forecasting question is surprisingly effective.

On the other hand, I’m not entirely confident that we’re not increasing the risk of groupthink. The original Delphi method kept every participant anonymous. We haven’t tried that, but I’d like to.

Even though much of the session happens in a Google Doc, I’d recommend using a video chat platform. For the past few months our sessions have been conducted over Discord, so that we could easily have multiple “voice channels” if session participants want to break off and discuss one on one. That feature has been nice, but I fear that with voice-only chat we’ve lost an ineffable quality. There’s a way in which seeing everyone’s face, even if you’re working silently, helps sharpen and focus attention on the question at hand: you know you’re in this together.

Finally, on a meta level, I’m surprised at how useful purposefully designing a custom meeting format was. I bet there are a lot of easy gains to be made through more custom tailoring of conversations (more experimentation like Robin Hanson’s EquaTalk idea would be neat).

Notes

  • We’ve also experimented with “operationalization” workshops, where the main goal is to take an unformed intent and turn it into a well-operationalized question, and “speed-forecasting” workshops, where we take several questions and rapidly share models and forecasts.

  • The IDEA method was also an inspiration for our forecasting sessions.