I’ve given those kinds of tests in my decision analysis and my probabilistic analysis courses (for the multiple choice questions). Four choices, logarithmic scoring rule, 100% on the correct answer gives 1 point, 25% on the correct answer gives zero points, and 0% on the correct answer gives negative infinity.
Some students loved it. Some hated it. Many hated it until they realized that e.g. they didn’t need 90% of the points to get an A (I was generous on the points-to-grades part of grading).
I did have to be careful; minus infinity meant that on one question you could fail the class. I did have to be sure that it wasn’t a mistake, that they actually meant to put a zero on the correct answer.
If you want to try, you might want to try the Brier scoring rule instead of the logarithmic; it has a similar flavor without the minus infinity hassle.
...wow. Well, I guess that’s one way to teach people to avoid infinite certainty. Reminiscent of Jeffreyssai. Did that happen to a lot of students?
Some students started putting zeros on the first assignment or two. However, all it took was seeing a few people get nailed after putting 0.001 on the right answer (usually on the famous boy-girl probability problem), and people tended to start spreading out their probability assignments. Some people never learn, though, so once in a while people would fail. I can only remember three in eight years.
My professor ran a professional course like this. One year, one of the attendees put 100% on every question on every assignment, and got every single answer correct. The next year, someone attended from the same company, and decided he was going to do the same thing. Quite early, he got minus infinity. My professor’s response? “They both should be fired.”
I cannot begin to say how vehemently I disagree with the idea of firing the first attendee. If I found out that your professor had fired them, I would fire your professor.
Sure, it has to be an expected utility fail if you take the problem literally, because of how little it would have cost to put only 99.99% on each correct answer, and how impossible it would be to be infinitely certain of getting every answer right. But this fails to take into account the out-of-context expected utility of being AWESOME.
Firing the second guy is fine.
Given that this was stated as used in “decision analysis” and “probabilistic analysis” courses, I would hope not...
It’s rare that one has a chance to make the structure of an exam itself teach the material, independent of the content, heh.
A good thing about a log scoring rule is that if students try to maximize their expected score, they should write in their actual beliefs.

For the same reason, when confronted with a set of odds on the outcomes of an event, betting on each outcome in proportion to your belief will maximize the expected log of your gain (regardless of what the current odds are).
Unless I’m misunderstanding something, this is true for the Brier score, too: http://en.wikipedia.org/wiki/Scoring_rule#Proper_score_functions
You’re correct. In my previous post, it was implicitly assumed that the score for a wrong answer was 0. In that case, the only proper score function is the log.
If you have a score function f1(q) for the right answer and f0(q) for each wrong answer, and there are n possible choices, then the true probabilities p are a critical point of the expected score only if

f0′(x) = (k - x·f1′(x)) / (1 - x)

for some constant k (the Lagrange multiplier).
if we set f1(x) = 1 - (1-x)^p, we can set f0(x) = -(1-x)^p + (1-x)^(p-1)·p/(p-1)

for p = 2, we find f0(x) = -(1-x)^2 + 2(1-x) = 1 - x^2; this is the Brier score

for p = 3, we find f0(x) = -(1-x)^3 + (3/2)(1-x)^2 = x^3 - (3/2)x^2 + 1/2, i.e. x^3 - (3/2)x^2 up to a constant that doesn’t affect the incentives

f1(x) = 1 - (1-x)^3 and f0(x) = x^3 - (3/2)x^2 shall be known as ArthurB’s score
I’m not following your calculations exactly, so please correct me if I’m misunderstanding, but it seems that you are assuming that the student chooses an option and a confidence for that option? My understanding was that the student chooses a probability distribution over all options and is scored on that. As for how to extend the Brier score to more than two options, I’m not sure whether there’s a standard way to do that, but one could always limit oneself to true/false questions… (in the log case you simply score log q_i, where q_i is the probability the student put on the correct answer, of course)
No.
I am assuming the student has a distribution in mind, and we want to design a scoring rule under which the best strategy for maximizing the expected score is to write down that distribution.
If there are n options and the right answer is i*, and you give the student log(n·p_i*) / log(n) points, then his incentive is to write in his exact distribution. On the other hand, if you give him, say, p_i* points, his incentive would be to put 1 on the most likely answer and 0 on the others.

Another way to score is not to give points only on p_i* but also to take away points on the p_j for j != i*, by using a function f1 for p_i* and f0 for the others. I gave a necessary condition on f1 and f0 for the student’s belief to be a local maximum of the expected score. The technique is simply Lagrange multipliers.

The number of options drops out of the equation, which is beautiful, so you can extend this to any number of answers, or even to a continuous question. (When asked what the population of Zimbabwe is, the student could describe any parametric distribution and be scored on that: histograms, Gaussians… there are many ways a student could write in his answer.)
Ok, so you’re saying the total score the student gets is
f1(q_i*) + Sum_(i /= i*) f0(q_i)
? I didn’t understand that from your original post, sorry.

So does “(if) the score for a wrong answer was 0 (...) the only proper score function is the log” mean that if there are more than two options, log is the only proper score function that depends only on the probability assigned to the correct outcome, not on the way the rest of the probability mass is distributed among the other options? Or am I still misunderstanding?
Yes, if there are two or more options and the score function depends only on the probability assigned to the correct outcome, then the only proper function is log. You can see that with the equation I gave
f0′(x) = (k - x·f1′(x)) / (1 - x)
for f0 = 0, this means x·f1′(x) = k, thus f1(x) = k·ln(x) + c (a necessary condition)

Then you have to check that k·ln(x) + c indeed works for some k and c; that is left as an exercise for the reader ^^
What does 0.01% on the wrong answer get you?
Depends what you do with the other 99.99% and the other three answers, I assume.
In a two-answer scenario, if I’m understanding bill’s version of the log scoring rule correctly, giving p=0.9999 to the right answer and p=0.0001 to the wrong answer should get you [log(0.9999)-log(1/2)]/log(2) ~= 0.99986 points. With four answers, giving p=0.9997 to the right answer and p=0.0001 to each of three wrong answers should get you [log(0.9997)-log(1/4)]/log(4) ~= 0.99978 points.