I don’t think this is a knock-down argument, but:
Suppose the students are subsequently told, by someone whom they trust but who happens to be wrong, that the answer isn’t A.
The 50:50:0:0 student says “okay, then it must be B”. The 50:25:25:0 student says “okay, then it must be B or C, 50% on each”. And the 50:17:17:17 student says “okay, then I don’t know”.
I don’t think these responses are equally good, and I don’t think they should be rewarded equally. The second student is more confused by fiction than the first, and the third is more confused again.
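To make the three responses concrete, here is a minimal sketch of the update each student seems to be doing. It assumes they simply drop A and renormalise whatever credence is left, which is my reading of the quoted replies rather than anything stated outright. The function name `rule_out` is made up for illustration.

```python
def rule_out(credences, excluded):
    """Drop one answer and rescale the remaining credences to sum to 100."""
    remaining = {k: v for k, v in credences.items() if k != excluded}
    total = sum(remaining.values())
    if total == 0:
        raise ValueError("no credence left on any remaining answer")
    return {k: 100 * v / total for k, v in remaining.items()}


# The three students' credences, in percent. The third uses exact thirds
# of the remaining 50 instead of the rounded 17s.
students = {
    "50:50:0:0": {"A": 50, "B": 50, "C": 0, "D": 0},
    "50:25:25:0": {"A": 50, "B": 25, "C": 25, "D": 0},
    "50:17:17:17": {"A": 50, "B": 50 / 3, "C": 50 / 3, "D": 50 / 3},
}

for label, credences in students.items():
    updated = rule_out(credences, "A")
    print(label, {k: round(v, 1) for k, v in updated.items()})
```

The first student ends up with everything on B, the second with an even split between B and C, and the third with an even three-way split, matching the replies above.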
That said, to give a concrete example: what is 70*80? Is it 5600, 5400, 56000, or 3? By the way, it’s not 5600.
Obviously the best response here is “um, yes it is”. But I still feel like someone who gives the same weight to 3 as to 5400 is… either very confident in their skills, or very confused. I think my intuition is that I want to reward that student less than the other two, which goes against both your answer (reward them all equally) and my answer above (reward that student the most).
But I can’t really imagine someone honestly giving 50:17:17:17 to that question. Someone who gave equal scores to the last three answers probably gave something like either 100:0:0:0 (if they’re confident) or 25:25:25:25 (if they’re confused), and gets a higher or lower reward from that. So I dunno what to make of this.
This makes sense to me.
I think that to do this, instead of preferring certain ratios between answers, we should prefer certain answers.
Under the original scoring scheme, 50:50:0:0 doesn’t score differently from 50:0:50:0 or 50:0:0:50. The average credence for each answer across those three is roughly 50:17:17:17, so I’d argue that (without some external marking of which incorrect answers are more reasonable) 50:50:0:0 should score the same as 50:17:17:17.
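As a quick check of that average, here is a small sketch that averages the three permutations answer by answer. The variable names are just for illustration.

```python
# The three "50 on A, 50 on one wrong answer" distributions, in percent.
permutations = [
    (50, 50, 0, 0),
    (50, 0, 50, 0),
    (50, 0, 0, 50),
]

# Average each answer's credence across the three distributions.
average = [sum(column) / len(permutations) for column in zip(*permutations)]

print([round(x, 1) for x in average])
```

The wrong answers each come out at 50/3, which is the 17 in 50:17:17:17 after rounding.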
However, we could choose a marking scheme where you get back (using my framing of log scoring above):
100% of the points put on A
10% of the points put on B
10% of the points put on C
0% of the points put on D
That way, 50:50:0:0 and 50:25:25:0 both end up with 55% of their points, but 50:17:17:17 gets 53.4% and 50:0:0:50 gets 50%. Play around with the percentages to get rewards that seem reasonable; I think it would still be a proper scoring rule*. You could do something similar with a quadratic scoring rule.
*I think one danger is that if I am unsure, but I think I can guess what the teacher considers reasonable or unreasonable, then this might tempt me to alter the points I assign based on something other than my actual credence levels.
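If you want to play around with the percentages, here is a minimal sketch of the "points you get back" step for the marking scheme above. It only does the weighted sum. Anything done with the points afterwards (such as taking a log) is left out, and it doesn’t check whether the result is a proper scoring rule. `WEIGHTS` and `points_returned` are names I made up.

```python
# Fraction of the points placed on each answer that the student gets back.
# A is the correct answer; B and C are treated as "reasonable" wrong answers.
WEIGHTS = {"A": 1.0, "B": 0.1, "C": 0.1, "D": 0.0}


def points_returned(credences, weights=WEIGHTS):
    """Weighted sum of the points a student placed on each answer."""
    return sum(weights[k] * credences[k] for k in weights)


examples = {
    "50:50:0:0": {"A": 50, "B": 50, "C": 0, "D": 0},
    "50:25:25:0": {"A": 50, "B": 25, "C": 25, "D": 0},
    "50:17:17:17": {"A": 50, "B": 17, "C": 17, "D": 17},
    "50:0:0:50": {"A": 50, "B": 0, "C": 0, "D": 50},
}

for label, credences in examples.items():
    print(label, round(points_returned(credences), 1))
```

Changing the entries in `WEIGHTS` is the "play around with the percentages" step. With exact thirds instead of the rounded 17s, the third student gets back about 53.3 rather than 53.4.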