I vaguely recall reading an anecdote about a similar testing scheme where you had to give an actual numerical confidence value for each answer. Saying you were 100% confident of an answer that was wrong would give you minus infinity points.
I bet that would be even less popular with students.
I’ve given those kinds of tests in my decision analysis and my probabilistic analysis courses (for the multiple choice questions). Four choices, logarithmic scoring rule, 100% on the correct answer gives 1 point, 25% on the correct answer gives zero points, and 0% on the correct answer gives negative infinity.
Some students loved it. Some hated it. Many hated it until they realized that e.g. they didn’t need 90% of the points to get an A (I was generous on the points-to-grades part of grading).
I did have to be careful; minus infinity meant that on one question you could fail the class. I did have to be sure that it wasn’t a mistake, that they actually meant to put a zero on the correct answer.
If you want to try, you might want to try the Brier scoring rule instead of the logarithmic; it has a similar flavor without the minus infinity hassle.
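A rough sketch in Python of both kinds of rules, just as an illustration (the normalization and the function names here are one possible choice, not the only way to set it up):

import math

def log_score(p_correct, n_choices):
    # Normalized log rule: 1 point for 100% on the right answer,
    # 0 points for the uniform 1/n, minus infinity for 0%.
    if p_correct == 0:
        return float("-inf")
    return math.log(n_choices * p_correct) / math.log(n_choices)

def brier_score(probs, correct_index):
    # Quadratic (Brier-style) rule: bounded below, so no single
    # question can sink the whole grade.
    return -sum((p - (1.0 if i == correct_index else 0.0)) ** 2
                for i, p in enumerate(probs))

print(log_score(1.00, 4))    # 1.0
print(log_score(0.25, 4))    # 0.0
print(log_score(0.00, 4))    # -inf
print(brier_score([0.25, 0.25, 0.25, 0.25], 0))   # -0.75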
...wow. Well, I guess that’s one way to teach people to avoid infinite certainty. Reminiscent of Jeffreyssai. Did that happen to a lot of students?
Some students started putting zeros on the first assignment or two. However, all they needed was to see a few people get nailed putting 0.001 on the right answer (usually on the famous boy-girl probability problem) and people tended to start spreading their probability assignments. Some people never learn, though, so once in a while people would fail. I can only remember three in eight years.
My professor ran a professional course like this. One year, one of the attendees put 100% on every question on every assignment, and got every single answer correct. The next year, someone attended from the same company, and decided he was going to do the same thing. Quite early, he got minus infinity. My professor’s response? “They both should be fired.”
I cannot begin to say how vehemently I disagree with the idea of firing the first attendee. If I found out that your professor had fired them I would fire your professor.
Sure, it has to be an expected utility fail if you take the problem literally, because of how little it would have cost to put only 99.99% on each correct answer, and how impossible it would be to be infinitely certain of getting every answer right. But this fails to take into account the out-of-context expected utility of being AWESOME.
Firing the second guy is fine.
[comment deleted]
[comment deleted]
Given that this was stated as used in “decision analysis” and “probabilistic analysis” courses I would hope not...
It’s rare that one has a chance to make the structure of an exam itself teach the material, independent of the content, heh.
A good thing about a log scoring rule is that if the students try to maximize their expected score, they should write in their true beliefs.
For the same reason, when confronted with a set of odds on the outcome of an event, betting on each outcome in proportion to your beliefs will maximize the expected log of your gain (regardless of what the current odds are).
Unless I’m misunderstanding something, this is true for the Brier score, too: http://en.wikipedia.org/wiki/Scoring_rule#Proper_score_functions
You’re correct. In the previous comment, it was implicitly assumed that the score for a wrong answer was 0. In that case, the only proper score function is the log.
If you have a score function f1(q) for the right answer and f0(q) for each wrong answer, and there are n possible choices, the true probabilities are a critical point of the expected score only if
f0'(x) = (k - x*f1'(x)) / (1 - x)
If we set f1(x) = 1 - (1-x)^p, we can set f0(x) = -(1-x)^p + (p/(p-1)) * (1-x)^(p-1)
For p = 2, we find f0(x) = -(1-x)^2 + 2*(1-x) = 1 - x^2; this is the Brier score. For p = 3, we find f0(x) = -(1-x)^3 + (3/2)*(1-x)^2 = x^3 - (3/2)*x^2 + 1/2, or, dropping an additive constant that doesn't affect the incentives, x^3 - (3/2)*x^2.
1 - (1-x)^3 and x^3 - (3/2)*x^2 shall be known as ArthurB's score
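A quick numerical sanity check (a sketch in Python, assuming the total score is f1 on the probability given to the right answer plus f0 summed over the probabilities given to the wrong answers): for a fixed belief on a three-option question, a brute-force search over reported distributions finds the maximum expected score at the belief itself, for both p = 2 and p = 3.

def f1(x, p):
    return 1 - (1 - x) ** p

def f0(x, p):
    return -(1 - x) ** p + (p / (p - 1)) * (1 - x) ** (p - 1)

def expected_score(report, belief, p):
    # E[score] when the true answer is drawn from `belief` and we report `report`.
    return sum(b * (f1(report[i], p)
                    + sum(f0(report[j], p) for j in range(len(report)) if j != i))
               for i, b in enumerate(belief))

belief = (0.5, 0.3, 0.2)
grid = [(a / 20, b / 20, (20 - a - b) / 20)
        for a in range(21) for b in range(21 - a)]
for p in (2, 3):
    best = max(grid, key=lambda r: expected_score(r, belief, p))
    print(p, best)   # the best report is (0.5, 0.3, 0.2) for both p = 2 and p = 3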
I’m not following your calculations exactly, so please correct me if I’m misunderstanding, but it seems that you are assuming that the student chooses an option and a confidence for that option? My understanding was that the student chooses a probability distribution over all options and is scored on that. As for how to extend the Brier score to more than two options, I’m not sure whether there’s a standard way to do that, but one could always limit oneself to true/false questions… (in the log case you simply score log q_i, where q_i is the probability the student put on the correct answer, of course)
No.
I am assuming the student has a distribution in mind, and we want to design a scoring rule under which the best strategy for maximizing expected score is to write in that distribution.
If there are n options and the right answer is i*, and you give the student log(n*p_i*) / log(n) points, then his incentive is to write in his exact distribution. On the other hand, if you give him, say, p_i* points, his incentive would be to write "1" for the most likely answer and 0 for the others.
Another way to score is not to give points only on p_i* but also to take away points on the p_i where i != i*, by using a function f1 for p_i* and f0 for the others. I gave a necessary condition on f1 and f0 for the student's belief to be a local maximum of the expected score; the technique is simply Lagrange multipliers.
The number of options drops out of the equation, which is beautiful: you can extend this to any number of answers, or even to a continuous question. (When asked what the population of Zimbabwe is, the student could describe any parametric distribution and be scored on that... histograms, Gaussians... there are many ways a student could write in his answer.)
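A small Python illustration of that incentive difference (the belief and the four-option setup are just an example): under the log(n*p_i*)/log(n) rule, reporting your belief beats piling everything on the most likely answer, while under the linear p_i* rule the all-or-nothing report wins.

import math

def expected_log_score(report, belief):
    n = len(report)
    return sum(b * (math.log(n * r) / math.log(n) if r > 0 else float("-inf"))
               for b, r in zip(belief, report))

def expected_linear_score(report, belief):
    return sum(b * r for b, r in zip(belief, report))

belief  = [0.5, 0.3, 0.1, 0.1]
honest  = belief
extreme = [1.0, 0.0, 0.0, 0.0]   # everything on the most likely answer

print(expected_log_score(honest, belief), expected_log_score(extreme, belief))
# about 0.16 vs -inf: the honest report wins under the log rule
print(expected_linear_score(honest, belief), expected_linear_score(extreme, belief))
# 0.36 vs 0.5: the all-or-nothing report wins under the linear rule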
Ok, so you’re saying the total score the student gets is
f1(q_i*) + Sum_(i /= i*) f0(q_i)
? I didn't understand that from your original post, sorry.
So does "(if) the score for a wrong answer was 0 (...) the only proper score function is the log" mean that if there are more than two options, log is the only proper score function that depends only on the probability assigned to the correct outcome, not on the way the rest of the probability mass is distributed among the other options? Or am I still misunderstanding?
Yes, if there are more than two options and the score function depends only on the probability assigned to the correct outcome, then the only proper function is the log. You can see that with the equation I gave:
f0'(x) = (k - x*f1'(x)) / (1 - x)
For f0 = 0, this means x*f1'(x) = k, thus f1(x) = k*ln(x) + c (a necessary condition).
Then you have to check that k*ln(x) + c indeed works for some k and c; that is left as an exercise for the reader ^^
What does 0.01% on the wrong answer get you?
Depends what you do with the other 99.99% and the other three answers, I assume.
In a two-answer scenario, if I’m understanding bill’s version of the log scoring rule correctly, giving p=0.9999 to the right answer and p=0.0001 to the wrong answer should get you [log(0.9999)-log(1/2)]/log(2) ~= 0.99986 points. With four answers, giving p=0.9997 to the right answer and p=0.0001 to each of three wrong answers should get you [log(0.9997)-log(1/4)]/log(4) ~= 0.99978 points.
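(A quick check of those two numbers with the formula above, in Python:)

import math
print(math.log(2 * 0.9999) / math.log(2))   # ~0.99986
print(math.log(4 * 0.9997) / math.log(4))   # ~0.99978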
Meet “The Aumann Game”. See also: scoring rules.