Improvement for pundit prediction comparisons
[EDIT: SimonM pointed out a possibly-fatal flaw with this plan: it would probably discourage more pundits from joining the prediction-making club at all, and adding to that club is a higher priority than comparing the members more accurately.]
Stop me if you’ve heard this one. (Seriously, I may not be the first to have written this kind of idea here. Let me know if not.)
We’ve got several pundits making yearly predictions now, which is fantastic progress for the field. However, if they’re not answering the same questions, you can’t effectively judge their performance against one another.
I suggest that this winter we run two rounds: one for proposing questions and one for making predictions.
December 1: deadline for pundits to propose prediction questions.
December: Metaculus formalizes questions (where possible) and opens markets.
January 1: deadline for pundits to register their predictions (they don’t have to bet) on any markets they choose.
At the end of the next year, we can judge pundits against each other on the intersection of their answered questions. (We can also check whether the pundit beat the Metaculus prices at the time they entered their predictions.)
This won’t guarantee a total or even partial ordering on pundits if they choose to answer different sets of questions, but for any pair of pundits whose question sets overlap, the victor will be clear (after choosing a scoring rule). We can treat the result as a round-robin tournament among the pundits, or better yet, do data analysis on subdomains (who beat whom in predicting US politics, etc.) where clearer winners may emerge.
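To make the pairwise comparison concrete, here is a minimal sketch in Python. It uses the Brier score as the example scoring rule (the post deliberately leaves the choice of rule open, so that is an assumption), and all pundit names, question IDs, and probabilities below are hypothetical:

```python
def brier(prob: float, outcome: bool) -> float:
    """Brier score for one binary question: squared error of the forecast."""
    return (prob - (1.0 if outcome else 0.0)) ** 2

def mean_brier(predictions: dict, outcomes: dict, questions) -> float:
    """Average Brier score over a set of questions (lower is better)."""
    return sum(brier(predictions[q], outcomes[q]) for q in questions) / len(questions)

def pairwise_victor(preds_a: dict, preds_b: dict, outcomes: dict):
    """Compare two pundits only on questions both answered; lower score wins."""
    shared = set(preds_a) & set(preds_b) & set(outcomes)
    if not shared:
        return None  # disjoint question sets: no comparison possible
    score_a = mean_brier(preds_a, outcomes, shared)
    score_b = mean_brier(preds_b, outcomes, shared)
    if score_a == score_b:
        return "tie"
    return "A" if score_a < score_b else "B"

# Hypothetical example: the pundits overlap on q1 and q2 only.
alice = {"q1": 0.9, "q2": 0.2, "q3": 0.6}
bob = {"q1": 0.6, "q2": 0.4}
outcomes = {"q1": True, "q2": False, "q3": True}
print(pairwise_victor(alice, bob, outcomes))  # → "A" (Alice is closer on both shared questions)
```

Running every pairing through `pairwise_victor` produces the round-robin table; `None` results mark the pairs that simply can't be compared.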
Additional possible features:
We could make a secret place for pundits to register their predictions, to be revealed on New Year’s, so that others can’t piggyback off of them. The pundits can of course piggyback off of the Metaculus price.
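One way the sealed registration could work, sketched here as a simple hash commitment (this mechanism is my assumption, not part of the proposal): each pundit publishes only a hash of their predictions plus a random salt before the deadline, then reveals the plaintext on New Year’s, and anyone can verify the reveal against the earlier hash.

```python
import hashlib
import secrets

def commit(predictions: str, salt: str) -> str:
    """Hash the predictions together with a random salt to hide them."""
    return hashlib.sha256((salt + predictions).encode()).hexdigest()

def verify(predictions: str, salt: str, commitment: str) -> bool:
    """Check that a revealed (predictions, salt) pair matches the commitment."""
    return commit(predictions, salt) == commitment

# Hypothetical usage: before the deadline, publish only the commitment.
salt = secrets.token_hex(16)  # the random salt prevents guessing common predictions
sealed = commit("q1: 0.9, q2: 0.2", salt)

# On reveal day, publish the predictions and salt; anyone can check:
print(verify("q1: 0.9, q2: 0.2", salt, sealed))  # → True
print(verify("q1: 0.6, q2: 0.2", salt, sealed))  # → False: altered predictions fail
```

Without the salt, someone could brute-force a commitment by hashing likely prediction sets, so the salt is what actually keeps the sealed predictions secret until reveal.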
We can let pundits enter some predictions as official and others as unofficial. They’ll only be judged by their official ones; their unofficial ones are for practicing the prediction skill, or cases where their intuition feels uncalibrated.
Thanks to ciphergoth for developing this idea with me!