Some ideas:
IIRC you wrote a previous post about Goodhart which included the idea that keeping evaluation methods secret can help avert Goodhart. The same idea seems relevant here: if the exact resolution method is not known to forecasters, it’s more worthwhile for them to put effort into better forecasts on the intended topic, rather than forecasting the unintended corner cases.
For cases like the COVID numbers, partial resolution seems like it would be useful: you can’t get the exact numbers, but you can get relatively firm lower and upper bounds. A prediction market could partially pay out bets; in forecasting more generally it should be possible to partially score. The numerical estimates might be improved over time, so e.g. you get an initial partial payout when the first statistics become available, and refined estimates gradually close the confidence intervals.
For offloading resolution / meta-resolution, one helpful mechanism might be to pay experts based on agreement with the majority of experts. This could be used in cases where there is low trust but low risk of collusion, so that the Schelling point for those involved is to give the common-sense judgement.
Partial resolution could also help with getting some partial signal on long-term forecasts.
In particular, if we know that a forecasting target grows monotonically over time (like “date at which X happens” or “cumulative number of X before a specified date”), we can split P(outcome=T) into P(outcome>lower bound)*P(outcome=T|outcome>lower bound). If we use log scoring, we then get log(P(outcome>lower bound)) as an upper bound on the score (a small worked example follows below).
If forecasts came in the form of more detailed models, it should be possible to use a similar approach to calculate bounds based on conditioning on more complicated events as well.
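To make the decomposition concrete, here’s a minimal sketch in Python; the forecast distribution, bin values, and the eventual outcome are all made up for illustration, not taken from any real question:

```python
# Minimal sketch (hypothetical numbers): partial scoring bound for a
# monotonically increasing target under log scoring.
import math

# Forecaster's distribution over the final cumulative count (discrete bins).
forecast = {100: 0.05, 500: 0.20, 1000: 0.40, 5000: 0.25, 10000: 0.10}

def log_score(dist, outcome):
    """Log score the forecast receives if `outcome` resolves true."""
    return math.log(dist[outcome])

def partial_score_bound(dist, lower_bound):
    """Upper bound on the eventual log score once we know outcome > lower_bound.

    log P(outcome = T) = log P(outcome > L) + log P(outcome = T | outcome > L),
    and the second term is <= 0, so log P(outcome > L) bounds the final score.
    """
    return math.log(sum(p for x, p in dist.items() if x > lower_bound))

# Interim data tells us the cumulative count is already above 500.
bound = partial_score_bound(forecast, 500)
final = log_score(forecast, 5000)   # suppose it eventually resolves at 5000
assert final <= bound
print(f"partial upper bound: {bound:.3f}, final score: {final:.3f}")
```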
“partial resolution seems like it would be useful”
I hadn’t thought of this originally, but Nuno added the category of “Resolve with a Probability,” which does this. The idea of iterated closing of a question as the bounds improve is neat, but probably technically challenging. (GJ Inc. kind of does this when they close answer options that are already certain to be wrong, such as total ranges below the current number of COVID cases.) I’d also worry it creates complexity that makes it much less clear to forecasters how things will work.
“one helpful mechanism might be to pay experts based on agreement with the majority of experts”
Yes—this has been proposed under the same set of ideas as “meta-forecasts have also been proposed as a way to resolve very long term questions,” though I guess it has clearer implications for otherwise ambiguous short term questions. I should probably include it. The key problem in my mind, which isn’t necessarily fatal, is that it makes incentive compatibility into a fairly complex game-theoretic issue, with collusion and similar issues being possible.
“keeping evaluation methods secret can help avert Goodhart”
Yes, I’ve definitely speculated along those lines. But for the post, I was worried that once I started talking about this as a Goodhart issue, I would need to explain far more and get very side-tracked; it’s something I will address more in the next post in any case.
Here’s how I imagine it working.
Suppose a prediction market includes a numerically valued proposition: for example, we forecast COVID numbers not by putting probabilities on different ranges, but by letting people buy and sell contracts which pay out in proportion to the eventual COVID numbers. The market price of such a contract becomes our projection. (Or, you know, some equivalent mechanism for non-cash markets.)
Then, when we get partial information about COVID numbers, we create a partial payout: if we’re confident COVID numbers for a given period were at least 1K, we can have sellers of the contract pay out 1K’s worth to buyers. As the lower bound gets better, they pay more.
Of course, the mathematical work deciding when we can be “confident” of a given lower bound can be challenging, and the forecasters have to guess how this will be handled.
And a big problem with this method is that it will low-ball the number in question, since the confidence interval will never close up to a single number, and forecasters only have to worry about the lower end of the confidence interval.
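For concreteness, here’s a minimal sketch of how the incremental settlement might work per unit of the contract; the class name, the payout rate, and the numbers are all hypothetical, just to show the bookkeeping:

```python
# Minimal sketch (hypothetical design): a contract that pays out in proportion
# to a numeric outcome, settled incrementally as the confident lower bound rises.

class ProportionalContract:
    """One unit pays the holder payout_per_unit * final_number at resolution.

    Partial settlement: whenever the resolution source publishes a new
    confident lower bound, the seller pays the buyer the increment now,
    instead of waiting for the question to close.
    """

    def __init__(self, payout_per_unit=0.001):
        self.payout_per_unit = payout_per_unit  # e.g. $0.001 per reported case
        self.settled_lower_bound = 0            # portion already paid out

    def partial_settle(self, confident_lower_bound):
        """Return the amount the seller owes the buyer for this update."""
        increment = max(0, confident_lower_bound - self.settled_lower_bound)
        self.settled_lower_bound += increment
        return increment * self.payout_per_unit

contract = ProportionalContract()
print(contract.partial_settle(1_000))   # first statistics: at least 1K cases -> 1.0
print(contract.partial_settle(4_000))   # refined estimate: at least 4K -> a further 3.0
print(contract.partial_settle(3_500))   # the bound can't go backwards -> 0.0
```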
I think we agree on this: iterated closing is an interesting idea, but I’m not sure it solves a problem. It doesn’t help with ambiguity, since in ambiguous cases we can’t find firm bounds either. And earlier payouts are nice, but by the time we can make a partial payout, it is either tiny (because the remaining range is still large) or not much before closing. (Partial payouts also create nasty problems with incentive compatibility, which I’m unsure can be worked out cleanly.)