One should be able to think quantitatively about that, e.g. how many questions you need to ask before you can tell whether your extremization hurt you. I'm surprised by the suggestion that GJP didn't have enough questions to tell, unless their extremizations were frequently in the >90% range.
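A back-of-the-envelope simulation gives a feel for the sample sizes involved. Everything in this sketch is a hypothetical assumption, not GJP's actual setup: extremization is modeled as odds-scaling with exponent `a`, and the crowd is assumed under-confident by a fixed log-odds shrinkage (so extremizing genuinely helps in expectation). The question is how often a season of n questions would nonetheless make it *look* harmful.

```python
import numpy as np

rng = np.random.default_rng(0)

def extremize(p, a):
    """Odds-scaling extremization: raise the odds p/(1-p) to the power a."""
    odds = (p / (1 - p)) ** a
    return odds / (1 + odds)

def frac_seasons_looking_harmful(n_questions, a=2.0, shrink=0.5, n_trials=2000):
    """Fraction of simulated seasons in which the extremized forecasts get
    the *worse* mean Brier score, even though extremizing helps in
    expectation here (the crowd is under-confident by construction, and
    a * shrink == 1 exactly undoes that)."""
    looks_harmful = 0
    for _ in range(n_trials):
        truth = rng.uniform(0.05, 0.95, n_questions)        # true event probabilities
        crowd_logit = shrink * np.log(truth / (1 - truth))  # crowd shrunk toward 50%
        crowd = 1 / (1 + np.exp(-crowd_logit))
        outcomes = rng.random(n_questions) < truth
        raw_brier = np.mean((crowd - outcomes) ** 2)
        ext_brier = np.mean((extremize(crowd, a) - outcomes) ** 2)
        looks_harmful += ext_brier > raw_brier
    return looks_harmful / n_trials

for n in (25, 100, 500):
    print(f"{n:3d} questions/season: extremizing looks harmful in "
          f"{frac_seasons_looking_harmful(n):.0%} of seasons")
```

Under assumptions like these, the sign of the effect stays noisy for a surprisingly long time: the per-question Brier improvement from extremizing is small relative to the per-question variance, so a single season of a few dozen questions often points the wrong way.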
Each season, there were too few questions for this to be obvious rather than a minor effect, and the "misses" were excused as getting a genuinely unlikely event wrong. It's hard to say, post hoc, whether the ~1% consensus opinion about a "freak event" was accurate but the event was a huge surprise (and yes, this happened at least twice), or whether the consensus was simply overconfident.
(I also think that the inability to specify estimates <0.5% or >99.5% reduced the extent to which the scores were hurt by these events.)
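As a quick illustration of that last point (assuming Brier scoring, which GJP used, and taking 0.5%/99.5% as the bounds): the cap limits how badly an extremized forecast can be punished when the freak event happens.

```python
# Wrong-way Brier penalty for a binary forecast when the event resolves "no".
# The 99.5% cap bounds the damage an aggressive extremizer can take.
def wrong_way_brier(p_yes):
    return (p_yes - 0.0) ** 2

# raw crowd estimate, clamped extremized value, unclamped extremized value
for p in (0.95, 0.995, 0.9999):
    print(f"forecast {p:7.2%} -> Brier penalty {wrong_way_brier(p):.4f}")
```

Under Brier the protection is real but modest (a 99.5% forecast costs at most 0.990 per question, versus nearly 1.0 unclamped); it matters more the further the extremizer would otherwise push toward 100%, and it would matter far more under a log score.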