You don’t need the data—it’s an argument from first principles. Basically, if you extremize guesses from 90% to 95%, and 90% is a correct estimate, 9⁄10 times you do better due to extremizing.
One should be able to think quantitatively about that, e.g., how many questions you need to ask before you find out whether your extremization hurt you. I’m surprised by the suggestion that GJP didn’t do enough, unless their extremizations were frequently in the >90% range.
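To put rough numbers on the 90% to 95% case, here is a minimal sketch. It assumes Brier (squared-error) scoring, independent questions that all have a true probability of 90%, and a two-standard-error threshold for "noticing" the damage. The function names and the `z` parameter are made up for illustration. Under those assumptions the extremized forecast scores better on 9 of 10 questions, yet loses (0.95 − 0.90)² = 0.0025 Brier points per question in expectation, and it takes on the order of several hundred questions (about 576) before that average loss stands out from the noise.

```python
import math


def brier(forecast: float, outcome: int) -> float:
    """Squared-error (Brier) score for one binary question; lower is better."""
    return (forecast - outcome) ** 2


def extremizing_cost(p_true: float, q_extremized: float, z: float = 2.0) -> dict:
    """Compare an extremized forecast against the calibrated one.

    Assumes every question has the same true probability p_true and that
    questions are independent. z is a hypothetical detection threshold,
    measured in standard errors of the mean score difference.
    """
    if q_extremized == p_true:
        raise ValueError("forecasts are identical; there is nothing to detect")

    # Score difference (extremized minus calibrated) for each outcome.
    diff_yes = brier(q_extremized, 1) - brier(p_true, 1)
    diff_no = brier(q_extremized, 0) - brier(p_true, 0)

    # Fraction of questions on which extremizing gives the better score.
    win_rate = p_true * (diff_yes < 0) + (1 - p_true) * (diff_no < 0)

    # Mean and variance of the per-question score difference.
    mean = p_true * diff_yes + (1 - p_true) * diff_no
    variance = (p_true * (diff_yes - mean) ** 2
                + (1 - p_true) * (diff_no - mean) ** 2)

    # Questions needed for the mean difference to be z standard errors from 0.
    questions_needed = math.ceil((z * math.sqrt(variance) / mean) ** 2)

    return {
        "win_rate": win_rate,
        "mean_penalty": mean,
        "questions_needed": questions_needed,
    }


if __name__ == "__main__":
    print(extremizing_cost(0.90, 0.95))
```

The threshold is arbitrary, and real questions are neither independent nor all at 90%, so treat the question count as an order-of-magnitude estimate only.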
Each season, there were too few questions for this to be obvious rather than a minor effect, and the “misses” were excused as getting a genuinely unlikely event wrong. It’s hard to say, post hoc, whether the ~1% consensus opinion about a “freak event” was accurate and there was simply a huge surprise (and yes, this happened at least twice), or whether the consensus was overconfident.
(I also think that the inability to specify estimates <0.5% or >99.5% reduced the extent to which the scores were hurt by these events.)
I only read the AI Impacts article that includes that quote, not the data to which the quote alludes. Maybe ask the author?
I did. He said a researcher mentioned it in conversation.