Yeah, that’s not a particularly strong scoring method, since it’s so easily gamed. I wonder what a better one would be? Of course, it wouldn’t help unless people knew it was going to be used, and cared.
Fraction correct times this calibration score? Number correct times the product, rather than the average, of what you did there? Bayes score, with naming the ‘wrong’ thing carrying a penalty to account for the multiplicity of wrong answers (say, each wrong answer takes a 50% hit, so even being 100% sure you’re wrong scores only as well as being 50% sure you’re right, when you are right)?
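A rough Python sketch of that last idea (the function name and the flat 50% penalty are just my illustration, not a worked-out rule): if your answer is wrong, the credit for having predicted that is discounted, so ‘certain I’m wrong’ tops out at the same score as ‘50% sure I’m right’ when you are right.

```python
import math

def penalized_log_score(q, correct, wrong_penalty=0.5):
    """Log (Bayes) score with a flat penalty on the 'wrong' branch.

    q is your stated probability that your answer is correct.
    A right answer scores log(q); a wrong answer scores
    log(wrong_penalty * (1 - q)). The penalty stands in for the
    multiplicity of wrong answers: there are many ways to be wrong,
    so predicting that you will be is cheap.
    """
    if correct:
        return math.log(q)
    return math.log(wrong_penalty * (1 - q))

# Being 100% sure you're wrong scores log(0.5)...
print(penalized_log_score(0.0, correct=False))  # ~ -0.693
# ...exactly as good as being 50% sure you're right, when you are right.
print(penalized_log_score(0.5, correct=True))   # ~ -0.693
```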
The primary property you want to maintain with a scoring rule is that the best probability to report is your true probability. I know that the Bayes score generalizes to multiple-choice questions, which suggests to me that it most likely still works with a multiplicity penalty for wrong answers, so long as the assumed multiplicity is close to the actual one.
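As a quick sanity check of that ‘report your true probability’ property for the plain Bayes (log) score, here’s a small numpy sketch (the 0.7 is an arbitrary example value):

```python
import numpy as np

p_true = 0.7                      # your actual probability of being right
q = np.linspace(0.01, 0.99, 99)   # candidate probabilities you could report

# Expected log score if the event really happens with probability p_true:
expected = p_true * np.log(q) + (1 - p_true) * np.log(1 - q)

print(q[np.argmax(expected)])     # ~0.70: honesty maximizes expected score
```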
I think the primary property you want to maintain is that it’s best to provide the answer you consider most likely; otherwise the best strategy is to answer ‘sdfkhasflk’ with 0% confidence on every question you aren’t certain of.
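To spell out that failure mode, a toy example (pure illustration): under a rule that rewards calibration alone, the gibberish strategy is unbeatable.

```python
# Answer gibberish at 0% confidence on every question you can't answer:
# you're wrong every time, exactly as often as you predicted, so a
# calibration-only score can't distinguish you from a genuine expert.
guesses = [("sdfkhasflk", 0.0)] * 20   # (answer, stated confidence)
stated = sum(c for _, c in guesses) / len(guesses)   # 0.0
actual = 0.0                                          # gibberish is never right
print(abs(stated - actual))   # 0.0 -> "perfect" calibration
```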
Multiple choice would make the scoring clearer, but that constraint could well make the calibration itself easier.