The method described in my post handles this situation perfectly well. All of your 50% predictions will (necessarily) come true 50% of the time, but you rack up a good calibration score if you do well on the rest of the predictions.
Seems like you’re giving up trying to get useful information about yourself from the 50% confidence predictions. Do you agree?
Yes, but only because I don’t agree that there was any useful information that could have been obtained in the first place.
Could you comment on how my strategy outlined above would not give useful information?
The calibration feedback you get, by the way, will be better reflected in the fact that if you assigned 50% to a candidate who lost, you will necessarily have assigned a very low probability to the candidate who won, and that penalty is what tells you your calibration is off.
The problem is the definition of “more specific”. How do you define specificity? The only consistent definition I can think of is that a proposition A is more specific than a proposition B if the prior probability of A is smaller than that of B. Do you have a way to tell consistently whether one phrasing of a proposition is more or less specific than another?
By that definition, if you have 10 candidates and no information to distinguish them, then the prior for any candidate to win is 10%. Then you can say “A: Candidate X will win” is more specific than “~A: Candidate X will not win”, because P(A) = 10% and P(~A) = 90%.
The proposition “A with probability P” is the exact same claim as the proposition “~A with probability 1-P”; since they are the same claim, there is no consistent definition of “specific” that will let one phrasing be more specific than the other when P = 50%.
“Candidate X will win the election” is only more specific than “Candidate X will not win the election” if you think that it’s more likely that Candidate X will not win.
For example, by your standard, which of these claims feels more specific to you?
A: Trump will win the 2016 Republican nomination
B: One of either Scott Alexander or Eliezer Yudkowsky will win the 2016 Republican nomination
If you agree that “more specific” means “less probable”, then B is a more specific claim than A, even though there are twice as many people to choose from in B.
Which of these phrasings is more specific?
C: The winner of the 2016 Republican nomination will be a current member of the Republican party (membership: 30.1 million)
~C: The winner of the 2016 Republican nomination will not be a current member of the Republican party (non-membership: 7.1 billion, or 289 million if you only count Americans).
The phrasing “C” certainly specifies a smaller number of people, but I think most people would agree that ~C is much less probable, since all of the top-polling candidates are party members. Which phrasing is more specific by your standard?
If you have 10 candidates, it might seem more specific to phrase a proposition as “Candidate X will win the election with probability 50%” than as “Candidate X will not win the election with probability 50%”. That intuition comes from the fact that an uninformed prior assigns each of them 10% probability, so a claim that any individual one will win feels more specific in some way. But the specificity actually comes from the fact that if you claim 50% probability for one candidate when the uninformed prior was 10%, you must have access to some information about the candidates that allows you to be that confident. The log scoring rule captures this properly: if you really do have such information, you’ll get a better score by claiming 50% for the candidate most likely to win than by claiming 10% for each.
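To make that concrete, here’s a rough Python sketch (the numbers and function names are made up purely for illustration): suppose one of the ten candidates really does have a 50% chance of winning and the other nine split the rest evenly. A forecast that reports those probabilities gets a better expected log score than the uninformed 10%-each prior.

```python
import math

# Hypothetical race: assume candidate 0 truly has a 50% chance of winning
# and the other nine candidates split the remaining 50% evenly.
true_probs = [0.50] + [0.50 / 9] * 9

# Two forecasts over who wins: the uninformed prior (10% each) and an
# informed forecast that matches the assumed true probabilities.
uninformed = [0.10] * 10
informed = list(true_probs)

def expected_log_score(forecast, truth):
    """Expected log score of a forecast (higher, i.e. less negative, is better)."""
    return sum(p_true * math.log(p_fc) for p_true, p_fc in zip(truth, forecast))

print(expected_log_score(uninformed, true_probs))  # about -2.30
print(expected_log_score(informed, true_probs))    # about -1.79
```

So the informed 50% claim is rewarded, but only because it reflects real information about the race; it isn’t rewarded for being phrased one way rather than the other.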
Ultimately, the way you get information about your calibration is by seeing how well your full probability distribution over the candidates performs against reality. One will win, nine will lose, and the more probability mass you put on the winner, the better you do. Calibration is about how well your beliefs score against reality; if your score depends on which of two logically equivalent phrasings you choose to express the same beliefs, there is some fundamental inconsistency in your scoring rule.
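Here’s an equally rough sketch of that last point (again, just illustrative Python, not anyone’s actual scoring code): under the log scoring rule, “Candidate X will win” asserted at 50% and “Candidate X will not win” asserted at 50% get exactly the same score whatever happens, so the choice of phrasing cannot affect your calibration.

```python
import math

def log_score(prob_assigned, event_happened):
    """Log score for one binary prediction (higher is better)."""
    return math.log(prob_assigned if event_happened else 1.0 - prob_assigned)

# Suppose Candidate X in fact loses.
x_won = False

# Phrasing 1: "Candidate X will win", asserted at 50%.
score_win_phrasing = log_score(0.50, x_won)

# Phrasing 2: "Candidate X will not win", asserted at 50%.
score_not_win_phrasing = log_score(0.50, not x_won)

assert score_win_phrasing == score_not_win_phrasing  # same belief, same score
```

The same holds for any probability P versus 1-P, since log(P) under one phrasing is matched by log(1 - (1-P)) under the other.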
Thank you for your response.
You’re welcome! And I’m sorry if I went a little overboard. I didn’t mean it to sound confrontational.
You didn’t. I appreciated your response. Gave me a lot to think about.
I still think there is some value to my strategy, especially if you don’t want to (or it would be infeasible to) give a full probability distribution over the related events (e.g., all the possible outcomes of an election).