Allow both an answer and a certainty.
-x points for an incorrect answer with certainty x
+2x points for the correct answer with certainty x
Alternatively, +10^x points for a correct answer with certainty x, and +log(1-x) points for an incorrect answer (a penalty, since log(1-x) is never positive). This encourages an attempt to answer every question, even if the certainty is rated as 0.
Yes, I know, old post.
If you give the student -X points for an incorrect answer with certainty X, and +2X points for a correct answer with certainty X, the expected value of giving an answer with true certainty X and lying about its certainty as Y is (1-X)(-Y) + (X)(2Y) = 3XY - Y = (3X - 1)Y. That is linear in Y, so if X is less than 1⁄3, the student should lie and claim that his certainty is 0, while if X is greater than 1⁄3, he should lie and claim that his certainty is 1.
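The incentive is easy to check numerically. A minimal sketch, assuming reports are limited to a grid of values between 0 and 1; the function names are made up for illustration:

```python
def expected_linear(x, y):
    """Expected points when the true certainty is x and the reported certainty is y,
    under the rule: -y for a wrong answer, +2y for a right answer."""
    return (1 - x) * (-y) + x * (2 * y)


def best_report(x, steps=100):
    """Grid-search the report y in [0, 1] that maximizes expected points."""
    reports = [k / steps for k in range(steps + 1)]
    return max(reports, key=lambda y: expected_linear(x, y))


# A student who is 20% sure does best by reporting 0;
# a student who is 50% sure does best by reporting 1.
print(best_report(0.2), best_report(0.5))
```

Because the expected value is linear in the report, the best report always sits at one end of the range, never at the true certainty (except in the indifferent case X = 1⁄3).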
I’m not going to try to find the maximum for the second version, but it should be obvious that the student is still better off lying about his true certainty. Of course, you could just avoid telling the student how you’re going to grade, but the score will then just depend on how well the student guesses your grading criteria.
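For the second version, a rough grid search gives a feel for the answer without solving for the maximum by hand. This is only a sketch under stated assumptions: "Log" is taken to be the natural log (the original does not say which base), and a report of exactly 1 is excluded because log(0) is undefined. The function names are hypothetical.

```python
import math


def expected_exp_log(x, y):
    """Expected points when the true certainty is x and the reported certainty is y,
    under the rule: +10^y for a right answer, +log(1 - y) for a wrong answer.
    Assumption: natural log."""
    return x * 10 ** y + (1 - x) * math.log(1 - y)


def best_report(x, steps=1000):
    """Grid-search the report y in [0, 1) that maximizes expected points."""
    reports = [k / steps for k in range(steps)]  # stops short of y = 1
    return max(reports, key=lambda y: expected_exp_log(x, y))


for x in (0.1, 0.5):
    print(x, best_report(x))
```

Under these assumptions, a student who is 50% sure does best by claiming a certainty well above 90%, and a student who is 10% sure does best by claiming 0, so honest reporting is not rewarded here either.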
Neither of my described systems is ideal. Squared error works for binary questions, but it would reward “Pi is exactly 3, with 0 confidence”.
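To see why squared error behaves differently on a binary question, here is a small sketch. The exact payoffs are an assumption chosen for illustration (1 - (1 - y)^2 for a right answer, -y^2 for a wrong one); other quadratic forms would do the same job.

```python
def expected_quadratic(x, y):
    """Expected points when the true certainty is x and the reported certainty is y,
    under an assumed quadratic rule: 1 - (1 - y)^2 if right, -(y^2) if wrong."""
    return x * (1 - (1 - y) ** 2) + (1 - x) * (-(y ** 2))


def best_report(x, steps=100):
    """Grid-search the report y in [0, 1] that maximizes expected points."""
    reports = [k / steps for k in range(steps + 1)]
    return max(reports, key=lambda y: expected_quadratic(x, y))


# The best report tracks the true certainty instead of jumping to 0 or 1.
for x in (0.1, 0.3, 0.8):
    print(x, best_report(x))
```

The derivative of the expected score with respect to the report is 2x - 2y, which is zero exactly when the report equals the true certainty.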
Rather than allow continuous estimates of accuracy, I think that the ideal system would ask the student to pick from a range of confidence levels (five choices from “guessing” to “certain”, with equivalent probabilities), paired with an appropriate scoring rule: a guess would be penalized 0 for being wrong but gain little for being right, and going from “almost certain” to “certain” would add a small value to a correct answer but a large penalty to a wrong answer.
Having established the + points for a correct answer and the - points for a wrong answer for each confidence description, do the math to determine what the actual ranges of confidence are, sanity check them against the descriptions, and then tell the student the confidence intervals. (Alternatively, pick the intervals and terms and do the math to figure out the + for a correct answer and the - for an incorrect answer for those intervals.)
It’s hard to come up with a system where the student doesn’t benefit from lying about his certainty. What you describe would fix the case from 4 (almost certain) to 5 (certain), but you need to get all the cases to work and it’s plausible that fixing the 4 to 5 case (and, in general, increasing the incentive to pick 4) breaks the 3 to 4 case.
After all, you can’t have all the transitions between certainty values add a small value to a correct answer. You must have a transition where a large value is added for a correct answer and your system may break down around such transitions.
The largest value would be added for the first confidence interval, which would also add the smallest cost to being wrong with that confidence.
That would mean a large value would be added when going from “guess” to “almost guess”, which would mean that it would be beneficial for a student to lie and claim to almost guess when he’s really completely guessing.
Suppose the student thinks that there is a 10% chance that he is right, and the reward structure is +5/-1 for confidence interval 1.
In fact, make the reward structure (right/wrong): 1/0, 6/-1, 10/-3, 13/-6, 15/-10, 16/-15
That puts the breakpoints at roughly even intervals, keeps the math easy, and with a little bit of clarifying exactly where the breakpoints are, doesn’t reward someone who accurately determines their accuracy and then lies about it.
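As a quick check of that table, the sketch below picks the confidence level with the highest expected score for a given belief. Numbering the levels 0 through 5 and the function names are assumptions for illustration; the payoffs are the ones listed above.

```python
from fractions import Fraction

# (points if right, points if wrong) for confidence levels 0..5
REWARDS = [(1, 0), (6, -1), (10, -3), (13, -6), (15, -10), (16, -15)]


def expected_score(p, level):
    """Expected points for claiming `level` when the true chance of being right is p."""
    right, wrong = REWARDS[level]
    return p * right + (1 - p) * wrong


def best_level(p):
    """Confidence level with the highest expected score for belief p."""
    return max(range(len(REWARDS)), key=lambda level: expected_score(p, level))


# A student who is 10% sure does best at level 0, 40% sure at level 2,
# and 90% sure at level 5.
for p in (Fraction(1, 10), Fraction(2, 5), Fraction(9, 10)):
    print(p, best_level(p))
```

For the 10% student, level 0 has an expected score of 1/10 while level 1 has an expected score of -3/10, so overclaiming costs points.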
I sat down late last night trying to prove that this couldn’t work and instead proved that it could. If I did this correctly, in order for it to work, with the confidences increasing from 0 to 1,
left side confidence <= (difference in Y)/(difference in X + difference in Y)
right side confidence >= (difference in Y)/(difference in X + difference in Y).
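One way to reconstruct where that ratio comes from, as a sketch rather than the original proof: compare two adjacent confidence levels for a student whose true chance of being right is p. The symbols p, ΔX and ΔY are introduced here for illustration, with ΔX the extra points for a right answer and ΔY the extra points lost for a wrong answer at the higher level.

```latex
% p        : student's true probability of being right (assumed notation)
% \Delta X : extra points for a right answer at the higher level
% \Delta Y : extra points lost for a wrong answer at the higher level
\begin{align*}
\mathbb{E}[\text{higher}] - \mathbb{E}[\text{lower}]
  &= p\,\Delta X - (1 - p)\,\Delta Y \\
p\,\Delta X - (1 - p)\,\Delta Y \ge 0
  &\iff p \ge \frac{\Delta Y}{\Delta X + \Delta Y}
\end{align*}
```

So the higher level is worth claiming only when p is at or above the ratio, which is why every confidence on the left side of a boundary has to sit at or below it and every confidence on the right side at or above it.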
Differences in X (the points for a right answer) are 5, 4, 3, 2, 1 and differences in Y (the points lost for a wrong answer) are 1, 2, 3, 4, 5, leading to values of 1⁄6 through 5⁄6. As 0 < 1⁄6 < 1⁄5 < 2⁄6 < 2⁄5 < 3⁄6 < 3⁄5 < 4⁄6 < 4⁄5 < 5⁄6 < 1, this is immune to lying within a single interval (and also turns out to be so for multiple intervals).
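The breakpoints and the multiple-interval claim can be checked mechanically. A minimal sketch, assuming the 1/0 through 16/-15 table from earlier in the thread and using exact fractions; the function names are hypothetical.

```python
from fractions import Fraction

# (points if right, points if wrong) for confidence levels 0..5
REWARDS = [(1, 0), (6, -1), (10, -3), (13, -6), (15, -10), (16, -15)]


def breakpoints(table):
    """Belief at which each adjacent pair of levels has equal expected score."""
    points = []
    for (r0, w0), (r1, w1) in zip(table, table[1:]):
        dx = r1 - r0  # extra points for a right answer
        dy = w0 - w1  # extra points lost for a wrong answer
        points.append(Fraction(dy, dx + dy))
    return points


def immune_to_lying(table, steps=600):
    """Check on a grid of beliefs that the best level over ALL levels is the
    honest one, i.e. the interval the belief actually falls in."""
    points = breakpoints(table)
    for k in range(steps + 1):
        p = Fraction(k, steps)
        if p in points:
            continue  # exact tie between two adjacent levels
        scores = [p * right + (1 - p) * wrong for right, wrong in table]
        best = max(range(len(table)), key=lambda i: scores[i])
        honest = sum(1 for b in points if p > b)
        if best != honest:
            return False
    return True


print(breakpoints(REWARDS))
print(immune_to_lying(REWARDS))
```

The grid check is evidence rather than a proof, but it does compare each honest level against every other level, not just its neighbors.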
So, what are the downsides of making this a grading standard? The biggest one I see is that it would be unfair except in classes that have as prerequisites an outstanding score in a class that covers credence calibration.