Thanks for this.
Re extremizing, the recent (excellent) AI Impacts overview of good forecasting practices notes that “more recent data suggests that the successes of the extremizing algorithm during the forecasting tournament were a fluke.”
That’s a great point. I’m uncertain whether the analyses account for the cited issue: a priori, we would expect slight extremizing to hurt accuracy on average, but in any moderately sized sample (like the forecasting tournament) it is likely to help. It also relates to a point I made about why proper scoring rules are not incentive compatible in tournaments, in a tweetstorm here: https://twitter.com/davidmanheim/status/1080460223284948994
Interestingly, a similar dynamic may happen in tournaments, and could be part of how info-cascades occur. By putting my predictions a bit to the extreme of the current predictions, I can slightly outscore everyone else in expectation and minimize my risk of doing very poorly. It’s almost the equivalent of bidding a dollar more than the current high bid on The Price Is Right: you don’t need to be close, you just need to beat the other players’ scores to win. But if I report my best strategic answer instead of my true guess, it seems that this could cascade if others are unaware I am doing so.
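To make The Price Is Right analogy concrete, here is a toy sketch (my own made-up setup: one decisive question, a crowd forecast of 90% that happens to equal the true probability, and a prize only for the single best Brier score; none of this reflects GJP’s actual rules or prize structure):

```python
import random

# Toy winner-take-all illustration (assumed setup, not GJP's scoring rules):
# everyone else forecasts 90%, I forecast 95%, and only the single lowest
# Brier score on this one question wins anything.

def brier(p, outcome):
    """Quadratic (Brier) score for forecast p of a binary event; lower is better."""
    return (p - outcome) ** 2

crowd, me, p_true = 0.90, 0.95, 0.90

random.seed(0)
trials = 100_000
wins = 0
for _ in range(trials):
    outcome = 1 if random.random() < p_true else 0
    if brier(me, outcome) < brier(crowd, outcome):
        wins += 1

print(wins / trials)  # ~0.9: I beat the crowd whenever the event happens,
                      # and the big scoring hit in the other ~10% of cases
                      # barely matters when only the winner is rewarded.
```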
[meta] Not sure why the link to the overview isn’t working. Here’s how the comment looks before I submit it:
https://imgur.com/MF5Z2X4
(The same problem is affecting this comment.)
In any case, the URL is:
https://aiimpacts.org/evidence-on-good-forecasting-practices-from-the-good-judgment-project-an-accompanying-blog-post/
It’s because I am a bad developer and I broke some formatting stuff (again). Will be fixed within the hour.
Edit: Fixed now
Thanks, Oli!
Do you have a link to this data?
As I replied to Pablo below, “...it’s an argument from first principles. Basically, if you extremize guesses from 90% to 95%, and 90% is a correct estimate, 9/10 times you do better due to extremizing.”
I only read the AI Impacts article that includes that quote, not the data to which the quote alludes. Maybe ask the author?
You don’t need the data; it’s an argument from first principles. Basically, if you extremize guesses from 90% to 95%, and 90% is a correct estimate, 9/10 times you do better due to extremizing.
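To spell out that arithmetic under the Brier score (lower is better); this is just my own check of the 90%/95% example, not anything taken from the GJP data:

```python
# Check of the 90% -> 95% example under the Brier score (lower is better).

def brier(p, outcome):
    """Quadratic (Brier) score for forecast p of a binary event (1 or 0)."""
    return (p - outcome) ** 2

p_true, honest, extremized = 0.90, 0.90, 0.95

print(brier(honest, 1), brier(extremized, 1))  # 0.0100 vs 0.0025: extremizing does better (happens 90% of the time)
print(brier(honest, 0), brier(extremized, 0))  # 0.8100 vs 0.9025: extremizing does worse (happens 10% of the time)

exp_honest = p_true * brier(honest, 1) + (1 - p_true) * brier(honest, 0)
exp_extrem = p_true * brier(extremized, 1) + (1 - p_true) * brier(extremized, 0)
print(exp_honest, exp_extrem)                  # 0.0900 vs 0.0925
```

So extremizing a correct 90% to 95% scores better 9 times out of 10, while the occasional miss is costly enough that the expected Brier score is slightly worse (0.0925 vs. 0.0900), which is why a short run of questions can make extremizing look like a win.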
One should be able to think quantitatively about that, e.g. how many questions do you need to ask before you find out whether your extremization hurt you? I’m surprised by the suggestion that GJP didn’t have enough questions, unless their extremizations were frequently in the >90% range.
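For instance, sticking with the 90%→95% example (my own back-of-envelope numbers, not GJP’s actual forecasts or question counts), one can ask how many questions it takes for that extremization to show up as clearly worse on average:

```python
# Back-of-envelope: how many questions before a 90% -> 95% extremization of a
# well-calibrated 90% forecast reliably shows up as worse in average Brier score?
# (Toy numbers for illustration only.)

from math import sqrt

p, q = 0.90, 0.95                 # honest (and true) probability vs extremized forecast
# Per-question Brier advantage of extremizing (honest score minus extremized score;
# positive means extremizing did better, since lower Brier is better).
hit  = (1 - p)**2 - (1 - q)**2    # event happens (prob 0.9):        +0.0075
miss = p**2 - q**2                # event doesn't happen (prob 0.1): -0.0925

mean = p * hit + (1 - p) * miss                        # -0.0025: honest is better in expectation
sd   = sqrt(p * hit**2 + (1 - p) * miss**2 - mean**2)  # ~0.03 per question

n = (2 * sd / abs(mean)) ** 2     # questions needed for the mean gap to reach ~2 standard errors
print(mean, sd, n)                # ≈ -0.0025, 0.03, ~576
```

With these toy numbers the expected gap is about 0.0025 per question against a per-question standard deviation of about 0.03, so it takes on the order of several hundred questions of this kind before the difference clearly exceeds sampling noise.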
Each season there were too few questions for this to show up as anything more than a minor effect, and the “misses” were excused as getting a genuinely unlikely event wrong. It’s hard to say, post hoc, whether the ~1% consensus opinion about a “freak event” was accurate but a huge surprise occurred (and yes, this happened at least twice), or whether the consensus was simply overconfident.
(I also think that the inability to specify estimates <0.5% or >99.5% reduced the extent to which the scores were hurt by these events.)
I did ask; he said a researcher mentioned it in conversation.