I’d agree, but to be precise, I think this isn’t quite the right measure. What matters is not so much that the majority are overconfident, but rather that their score under a reasonable scoring rule is worse than what they would expect, on average (or weighted by some factor).
Otherwise it’s possible, for instance, that 51% would technically be slightly overconfident while the rest are substantially underconfident, averaging out to proper calibration.
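To make that concrete, here is a small sketch (with hypothetical numbers) using the Brier score: a group where 51% are slightly overconfident and 49% are underconfident can look perfectly calibrated in aggregate, even though each individual’s long-run score differs from what they themselves would expect.

```python
# Hypothetical illustration: everyone states 80%, but the two subgroups'
# true hit rates differ slightly in opposite directions.

def brier_expected(p):
    """Brier score a forecaster expects, assuming their own probability p is correct."""
    return p * (1 - p)  # = p*(1-p)**2 + (1-p)*p**2

def brier_actual(p, q):
    """Long-run Brier score when the stated probability is p but the true frequency is q."""
    return q * (1 - p) ** 2 + (1 - q) * p ** 2

p = 0.80                        # stated probability
q_over, q_under = 0.78, 0.82    # true hit rates (hypothetical)
w_over, w_under = 0.51, 0.49    # population shares

# Aggregate calibration: observed frequency among all "80%" predictions.
agg_freq = w_over * q_over + w_under * q_under
print(f"aggregate frequency at p=0.80: {agg_freq:.4f}")  # ~0.80, looks calibrated

# But individual scores diverge from expectation:
print(f"expected Brier:         {brier_expected(p):.4f}")
print(f"overconfident actual:   {brier_actual(p, q_over):.4f}")   # worse than expected
print(f"underconfident actual:  {brier_actual(p, q_under):.4f}")  # better than expected
```

So a check of group-level calibration alone would miss the miscalibration that the expected-vs-actual score comparison reveals.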
I plan to write more about this in future posts.
I agree, what matters is calibration and resolution.
If you’re talking about an individual’s predictions, that is; I’m unconvinced that group calibration would be a useful epistemic yardstick in that case.
Note also that it’s impossible to determine “a majority of predictions to be overconfident” as a literal statement. A single prediction is only right or wrong; overconfidence can only be assessed in the aggregate (which is what I meant in the original post).