EDIT: I originally said you can do this for multiple choice questions, which is wrong. It only works for questions with two answers.
(In a comment, to keep top level post short.)
One cute way to do calibration for probabilities is to construct a spinner. If you have a true/false question, you can construct a spinner which is divided up according to your probability that each answer is the correct answer.
If you were to then spin the spinner once, and win if it lands on the correct answer, this would not incentivize constructing the spinner to represent your true beliefs. The best strategy is to put all the mass on the most likely answer.
However, if you spin the spinner twice, and win if either spin lands on the correct answer, you are actually incentivized to make the spinner match your true probabilities!
One reason this game is nice is that it does not require having a correctly specified utility function that you are trying to maximize in expectation. There are only two states, win and lose, and as long as winning is preferred to losing, you should construct your spinner with your true probabilities.
Unfortunately this doesn't work for the confidence intervals, since they seem to require a score that is not bounded below.
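Here is a minimal numerical sketch of that claim (my own check, not part of the original comment), assuming a true/false question where p is the true probability of "true" and q is the probability you put on "true" when building the spinner; the specific value of p is made up for illustration.

```python
# Sketch: you win the two-spin game unless both spins land on the wrong answer,
# so P(win) = 1 - [p*(1-q)^2 + (1-p)*q^2]. Sweeping q shows the win probability
# peaks at q = p, i.e. at the spinner that matches your true belief.
import numpy as np

def win_probability(p: float, q):
    """Probability of winning the two-spin game when reporting q."""
    return 1.0 - (p * (1.0 - q) ** 2 + (1.0 - p) * q ** 2)

p = 0.7                                   # assumed true probability (illustrative)
qs = np.linspace(0.0, 1.0, 1001)
best_q = qs[np.argmax(win_probability(p, qs))]
print(best_q)                             # ~0.7, i.e. the optimal report is q = p
```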
Two spins only works for two possible answers. Do you need N spins for N answers?
You are correct. It doesn’t work for more than two answers. I knew that when I thought about this before, but forgot. Corrected above.
I don't have a nice algorithm for N answers. I tried a bunch of the obvious simple things, and they don't work.
I think an algorithm for N outcomes is: spin twice, gain 1 every time you get the answer right but lose 1 if both guesses are the same.
One can “see intuitively” why it works: when we increase the spinner-probability of outcome i by a small delta (imagining that all other probabilities stay fixed, and not worrying about the fact that our sum of probabilities is now 1 + delta) then the spinner-probability of getting the same outcome twice goes up by 2 x delta x p[i]. However, on each spin we get the right answer delta x q[i] more of the time, where q[i] is the true probability of outcome i. Since we’re spinning twice we get the right answer 2 x delta x q[i] more often. These cancel out if and only if p[i] = q[i]. [Obviously some work would need to be done to turn that into a proof...]
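A small sketch checking this incentive numerically (my addition, not part of the parent comment); the distributions are randomly generated for illustration, and p, q match the notation above (reported and true distributions respectively).

```python
# The rule above: spin twice from your reported distribution p, gain 1 per spin
# that matches the true outcome, lose 1 if both spins land on the same outcome.
# With true distribution q, the expected score is 2*sum_i p_i*q_i - sum_i p_i**2,
# which equals sum_i q_i**2 - sum_i (p_i - q_i)**2, so truthful reporting p = q
# is optimal. The check below compares truthful reporting against random reports.
import numpy as np

rng = np.random.default_rng(0)

def expected_score(p: np.ndarray, q: np.ndarray) -> float:
    """Expected payoff when reporting p while q is the true distribution."""
    return 2.0 * np.dot(p, q) - np.dot(p, p)

N = 4
q = rng.dirichlet(np.ones(N))          # "true" distribution (arbitrary)
truthful = expected_score(q, q)
for _ in range(10_000):
    p = rng.dirichlet(np.ones(N))      # some other report
    assert expected_score(p, q) <= truthful + 1e-12
print("truthful reporting is never beaten; expected score:", truthful)
```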
Just to be clear: if you spin twice and both come up right, you’re gaining 2 and then losing 1? (I.e., this is equivalent to what you wrote in an earlier version of the comment?)
That’s right.
(Why does the two-spin game work?)
In a true/false question that is true with probability $p$, if you assign probability $q$, your probability of losing is $p(1-q)^2+(1-p)q^2$. (The probability the answer is true and you spin false twice, plus the probability the answer is false and you spin true twice.)
This probability is minimized when its derivative with respect to $q$ is 0, or at the boundary. This derivative is $-2p(1-q)+2(1-p)q$, which is 0 when $q=p$. We now know the minimum is achieved when $q$ is 0, 1, or $p$. The probability of losing when $q=0$ is $p$. The probability of losing when $q=1$ is $1-p$. The probability of losing when $q=p$ is $p(1-p)$, which is the lowest of the three options.
Copied without LaTeX:
In a true/false question that is true with probability p, if you assign probability q, your probability of losing is p(1−q)^2+(1−p)q^2. (The probability the answer is true and you spin false twice, plus the probability the answer is false and you spin true twice.)
This probability is minimized when its derivative with respect to q is 0, or at the boundary. This derivative is −2p(1−q)+2(1−p)q, which is 0 when q=p. We now know the minimum is achieved when q is 0, 1, or p. The probability of losing when q=0 is p. The probability of losing when q=1 is 1−p. The probability of losing when q=p is p(1−p), which is the lowest of the three options.
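For completeness, a quick symbolic check of that algebra (my addition, using sympy; p and q are the same symbols as in the comment above):

```python
import sympy as sp

p, q = sp.symbols("p q")
lose = p * (1 - q) ** 2 + (1 - p) * q ** 2      # probability of losing
print(sp.expand(sp.diff(lose, q)))              # -2*p + 2*q: vanishes exactly at q = p
print(sp.expand(lose.subs(q, p)))               # expands to p - p**2, i.e. p*(1 - p)
```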
This is called either Brier or quadratic scoring, not sure which.
Not exactly. Its expected value is the same as the expected value of the Brier score, but the score itself is either 0 or 1.
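A quick Monte Carlo illustration of that point (my own sketch; the values of p and q below are arbitrary): the fraction of two-spin games lost converges to the mean Brier score of the report.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 0.7, 0.4                            # illustrative true and reported probabilities
n = 200_000

answer = rng.random(n) < p                 # the true answer on each trial
spins = rng.random((n, 2)) < q             # each spin says "true" with probability q
lose = np.all(spins != answer[:, None], axis=1)   # lose = both spins wrong
brier = (answer - q) ** 2                  # Brier score of the report on each trial

print(lose.mean(), brier.mean())           # both approach p*(1-q)^2 + (1-p)*q^2 = 0.3
```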
For some reason, the latex is not rendering for me. I can see it when I edit the comment, but not otherwise.
The comment has just started rendering for me.
Edit: Oh wait no, you just added another comment without LaTeX.
Huh, that’s really weird. The server must somehow be choking on the specific LaTeX you posted. Will check it out.
Ok, I found the bug. I will fix it in the morning.
And you did! Cheers for your hard work. :)