I’ve written a game (also available on github) that tests your ability to accurately assign probabilities to yes/no events, using a logarithmic scoring rule (called a Bayes score on LW, apparently).
For example, in the subgame “Coins from Urn Anise,” you’ll be told: “I have a mysterious urn labelled ‘Anise’ full of coins, each with possibly different probabilities. I’m picking a fresh coin from the urn. I’m about to flip the coin. Will I get heads? [Trial 1 of 10; Session 1]”. You then adjust a slider to select a number a in [0,1]. As you move the slider, you adjust the payoffs you’ll receive depending on whether the coin comes up heads or tails: specifically, you receive 1+log2(a) points if the result is heads and 1+log2(1-a) points if the result is tails. This is a proper scoring rule in the sense that you maximize your expected return by choosing a equal to the posterior probability that, given what you know, this coin will come up heads. The payouts are harshly negative if you have false certainty: e.g. if you choose a=0.995, you stand to gain only 0.993 points if heads comes up, but to lose 6.644 points if tails does.

At the start you don’t know much about the coin, but as the game goes on you can refine your guess. After 10 flips the game draws a new coin from the urn, so you’ll be back to knowing little about the particular coin; still, try to take account of what you do know: the new coin comes from the same urn Anise as the last one (the coins are drawn i.i.d.). If you try this, tell me what your average score is on, say, play 100.
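To make the scoring concrete, here is a small JavaScript sketch (my own illustration, not the game’s actual code) of the payoff rule, plus a brute-force check that reporting the coin’s true probability maximizes your expected score:

// Illustrative sketch only, not the game's source.
function score(a, heads) {
  // 1 + log2(a) points on heads, 1 + log2(1 - a) points on tails
  return heads ? 1 + Math.log2(a) : 1 + Math.log2(1 - a);
}

console.log(score(0.995, true).toFixed(3));  // 0.993 (the gain quoted above)
console.log(score(0.995, false).toFixed(3)); // -6.644 (the loss quoted above)

// Expected score when the coin's true heads-probability is p and you report a.
function expectedScore(p, a) {
  return p * score(a, true) + (1 - p) * score(a, false);
}

// Brute-force check that a = p is the best report, e.g. for p = 0.3.
let best = 0.01;
for (let a = 0.01; a < 1; a += 0.01) {
  if (expectedScore(0.3, a) > expectedScore(0.3, best)) best = a;
}
console.log(best.toFixed(2)); // 0.30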
There are a couple of other random processes to guess at in the game, and also a quiz. The questions are intended to force you to guess at least some of the time. If you have suggestions for other quiz questions, send them to me by PM in the format:
{q:"1+1=2. True?", a:1} // source: my calculator
where a:1 means the answer is true and a:0 means it is false.
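For instance, a PM could look like this (made-up examples, just to show the shape):

{q:"Water boils at 100°C at sea level. True?", a:1} // source: standard physics
{q:"The Great Wall of China is visible from the Moon with the naked eye. True?", a:0} // source: a commonly debunked claim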
Other discussion: probability calibration quizzes
Papers: Some Comparisons among Quadratic, Spherical, and Logarithmic Scoring Rules (Bickel)
This game has taught me something. I get more enjoyment than I should out of watching a random variable go up and down, and probably should avoid gambling. :)
Nice work, congrats! Looks fun and useful, better than the calibration apps I’ve seen so far (including one I made that used confidence intervals; I had a proper scoring rule too!).
My score:
Current score: 3.544 after 10 plays, for an average score per play of 0.354.
Thanks Emile,
Is there anything you’d like to see added?
For example, I was thinking of running it on nodejs and logging players’ scores, so you could see how you compare. (I don’t have a way to host this right now, though.)
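Something like this minimal sketch, using only Node’s built-in http module (the /scores endpoint and the payload shape are just made up for illustration):

// Hypothetical sketch of server-side score logging.
const http = require('http');

const scores = []; // in-memory log; a real version would persist this somewhere

http.createServer((req, res) => {
  if (req.method === 'POST' && req.url === '/scores') {
    let body = '';
    req.on('data', chunk => { body += chunk; });
    req.on('end', () => {
      scores.push(JSON.parse(body)); // e.g. {player: "anon", play: 100, avgScore: 0.354}
      res.end('ok');
    });
  } else if (req.url === '/scores') {
    res.end(JSON.stringify(scores)); // so players can see how they compare
  } else {
    res.statusCode = 404;
    res.end();
  }
}).listen(8080);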
Or another possibility is to add diagnostics: e.g., were you systematically setting your guess too high, or was it fluctuating more than the data really warranted (under some model of the prior/posterior, say)?
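For the coin subgame, a bias diagnostic could look roughly like this sketch, assuming (my modelling assumption, not something the game fixes) a Beta(alpha, beta) prior over the urn’s coin biases:

// Sketch of a bias diagnostic. guesses[i] is the probability the player set
// before flip i; flips[i] is 1 for heads, 0 for tails. Assumes a Beta(alpha, beta)
// prior over the coin's heads-probability, updated after each observed flip.
function biasDiagnostic(guesses, flips, alpha = 1, beta = 1) {
  let heads = 0, tails = 0, totalGap = 0;
  for (let i = 0; i < flips.length; i++) {
    const posteriorMean = (alpha + heads) / (alpha + beta + heads + tails);
    totalGap += guesses[i] - posteriorMean; // >0: guessing above the model, <0: below
    if (flips[i]) heads++; else tails++;
  }
  return totalGap / flips.length; // average gap; far from 0 suggests systematic bias
}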
Also, I’d be happy to have pointers to your calibration apps or others you’ve found useful.
Thank you. I really, really want to see more of these.
Feature request #976: more stats to give you an indication of overconfidence / underconfidence (e.g. out of 40 questions where you gave an answer between .45 and .55, you were right 70% of the time).
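A sketch of what such a statistic could look like (the bucket width and record shape are just one choice, not anything the game currently has):

// Group answers by the stated probability and compare with how often the
// event actually happened. answers[i] = {p, outcome} with outcome 1 or 0.
function calibrationTable(answers) {
  const buckets = Array.from({ length: 10 }, () => ({ n: 0, yes: 0 }));
  for (const { p, outcome } of answers) {
    const b = Math.min(9, Math.floor(p * 10)); // 10 buckets of width 0.1
    buckets[b].n++;
    buckets[b].yes += outcome;
  }
  return buckets.map((bucket, i) => ({
    range: `${(i / 10).toFixed(1)}-${((i + 1) / 10).toFixed(1)}`,
    asked: bucket.n,
    observedRate: bucket.n ? bucket.yes / bucket.n : null,
  }));
}
// e.g. a row like {range: "0.5-0.6", asked: 40, observedRate: 0.7} would be the
// kind of mismatch described above.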