I need help figuring out how to use this scoring rule. Please consider the following application.
How much does it cost to mail a letter under 30g in Canada?[1]
I remember buying 45c stamps when I was a child, so it’s likely to be larger than that. It’s been over a decade or so, and assuming a 2% rise in cost per year, we should be around $45 \times (1.02)^{10} \sim 60$c per stamp. However, we also had big budget cuts to our postal service that even I learned about despite not reading the news. Let’s say that Canada Post increased their prices by 25% to accommodate some shortfall. My estimate is that stamps cost 75c.
What should be my confidence interval? Would I be surprised if a stamp cost a dollar? Not really, but it feels like an upper bound. Would I be surprised if a stamp cost less than 50c? Yes. 60c? Yes. 70c? Hmmm… Assume that I’m well calibrated, so I’m reporting 90% confidence for an interval of stamps costing 70c to 100c.
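The arithmetic behind the estimate can be checked with a short sketch. The helper name `compound` is made up for illustration; the 2% rate, the 45c starting price and the 25% bump are the figures from the comment. Ten years of 2% growth lands nearer 55c, while about fifteen years ("over a decade or so") gets close to the 60c used above.

```python
def compound(price, rate, years):
    """Price after `years` of growth at a fixed annual `rate`."""
    return price * (1 + rate) ** years

# Childhood price of 45c grown at 2% per year.
for years in (10, 15):
    print(years, "years:", round(compound(45, 0.02, years), 1), "c")

# A one-off 25% increase on top of the rounded 60c figure.
print("after 25% bump:", 60 * 1.25, "c")
```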
Answer: Stamps in booklets cost 85c each, individual stamps are 100c each. Because I would always buy stamps in booklets, I will use the 85c figure.
S is the size of my confidence interval, $S = 100 - 70 = 30$. D is the distance between the true value and the interval, but is 0 in this case because the true value is in the interval.
$\text{Score} = -S - 20 \cdot D = -30$
I’m not really sure what to do with this number, so let’s move to the next paragraph of the post.
The true value is T=85 and the interval is (L,U)=(70,100). Because the true value is contained in the interval, D=0.
$S = \log(U/L) = \log(100/70) = 0.15$
$\text{Score} = -S - 20 \cdot D = -0.15$
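Both versions of the calculation fit in one small function. This is only a sketch: the function name and the `log_scale` flag are made up here, and base-10 logarithms are an inference from the 0.15 figure (a natural log would give about 0.36).

```python
import math

def interval_score(lower, upper, truth, penalty=20.0, log_scale=True):
    """Return (S, D, score) with score = -S - penalty * D."""
    if log_scale:
        # Assumption: the post's log is base 10, which reproduces S = 0.15.
        lower, upper, truth = (math.log10(v) for v in (lower, upper, truth))
    size = upper - lower                                # S: width of the interval
    distance = max(0.0, lower - truth, truth - upper)   # D: 0 if truth is inside
    return size, distance, -size - penalty * distance

# Linear version, then the log version, for the (70, 100) interval and T = 85.
print(interval_score(70, 100, 85, log_scale=False))
print(interval_score(70, 100, 85))
```

The linear call gives a score of −30 and the log call gives roughly −0.15, matching the two calculations above.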
How does this incentivise honest reporting of confidence intervals?
Let’s say that, when I intuited my confidence interval above, I was perturbed that it wasn’t symmetric about my estimate of 75c, so I set it to $(L,U) = (50,100)$ for aesthetic reasons. In this case, my score would be $\text{Score} = -0.30$, which is worse than my previous score by a factor of 2.
Let’s say that, when I remembered the price of stamps in my childhood, I was way off and remembered 14c stamps. Then I would believe that stamps should cost around 22c now. (Here I have the feeling of “nothing costs less than a quarter!”, so I would probably reject this estimate.) That would likely anchor me, so that I would set a high confidence on the price being within $(L,U) = (20,24)$.
$S = 0.08$, $D = \log(L/T) = \log(20/85) = -0.63$
$\text{Score} = -S - 20 \cdot D = 12.52$
Am I trying to maximize this score?
[1] I looked up the answer, and the lowest cost standard delivery is for letters under 30g.
I messed up, and swapped the words overestimate and underestimate in the 4th paragraph. I fixed it now. Score should always be negative.
This will change the value at the end to $D = \log(85/24)$, or 0.55, making the score −11.06.
This score is a very negative number, so you get punished for having a bad interval, relative to the −0.15 above.
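To see the three intervals from the example side by side, here is a rough sketch using the corrected D. The function name is made up, and base-10 logs are assumed because they reproduce the 0.15 and 0.55 figures.

```python
import math

def log_interval_score(lower, upper, truth, penalty=20.0):
    """Score = -S - penalty * D, with S and D measured in log10 units."""
    size = math.log10(upper / lower)
    if truth < lower:
        distance = math.log10(lower / truth)   # truth falls below the interval
    elif truth > upper:
        distance = math.log10(truth / upper)   # truth falls above the interval
    else:
        distance = 0.0                         # truth is inside the interval
    return -size - penalty * distance

for interval in [(70, 100), (50, 100), (20, 24)]:
    print(interval, round(log_interval_score(*interval, truth=85), 2))
```

This gives roughly −0.15, −0.30 and −11.06. Since S and D are both non-negative, the score can never come out positive.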
The idea is that the two terms in the score balance between two effects: trying to make S as small as possible means making your interval as small as possible, but if you make it too small you’re more likely to use an interval which doesn’t contain the truth. Trying to make D as small as possible means making your interval more likely to contain the truth. The coefficients balance the tradeoff between the two so that the interval you end up with is your 90% confidence interval. (According to Scott; I haven’t verified this personally.)
I have verified it. I was in the process of writing a (fairly lengthy) reply to Stefan’s comment, including a proof that Scott’s scoring rule does indeed have the property that your expected score (according to your actual beliefs about the quantity you’re estimating) is maximized when the confidence interval you state has (again according to your actual beliefs) a 5% chance that the quantity lies below its lower bound and a 5% chance that the quantity lies above its upper bound … but then something I did (I have no inkling what, though it coincided with some combination of keypresses as I was trying to enter some mathematics) made the page go entirely blank, and I didn’t find any way to get my partially-written comment back again.
Anyway, here’s one way (I don’t guarantee it’s best and it feels like there should be a slicker way) to prove it. Let’s suppose the confidence interval you state is (l,r); consider the derivative w.r.t. either of those bounds—let’s say r, but l is similar—of your expected score. The first term in the score is just l-r, and the derivative of that is always −1. The second term can be written as an integral; differentiating it w.r.t. r turns out to give you 20Pr(X>r). (The calculation is easy.) So the derivative is zero only when 1-20Pr(X>r)=0; that is, when Pr(X>r)=5%. So if the confidence interval you state doesn’t have the property that you expect to be above it exactly 5% of the time, then this derivative is nonzero and therefore some small change in r increases your expected score.
Would you mind spelling out the integral part?
Suppose f is your probability density function for the quantity X you’re interested in.
Then the expectation of D is the integral of D(x)f(x), which equals the integral of [max(0,l-x)+max(0,x-r)]f(x). When we differentiate w.r.t. r, the first term goes away because it’s independent of r, so we get the integral of [d/dr max(0,x-r)] f(x). That derivative is 0 for x<r and −1 for x>r, so this is minus the integral of f(x) from r upwards; in other words it’s −Pr(X>r). The score is −S−20D, and the derivative of −S = l-r w.r.t. r is −1, so d(expected score)/dr = 20Pr(X>r)−1.
The calculation for l is exactly the same but with a change of sign; we end up with 1−20Pr(X<l).
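The result can also be checked numerically. The sketch below assumes, purely for illustration, that your belief about X is a standard normal and uses the linear version of the score. The closed forms for E[max(0, l−X)] and E[max(0, X−r)] are the standard identities for a standard normal. Because the expected score splits into a part that depends only on l and a part that depends only on r, each bound can be searched separately.

```python
from statistics import NormalDist

belief = NormalDist(0.0, 1.0)  # hypothetical belief about X (an assumption)

def expected_score(l, r, penalty=20.0):
    """Expected value of -(r - l) - penalty * D under the belief above."""
    below = belief.pdf(l) + l * belief.cdf(l)          # E[max(0, l - X)]
    above = belief.pdf(r) - r * (1.0 - belief.cdf(r))  # E[max(0, X - r)]
    return -(r - l) - penalty * (below + above)

# Grid search over each bound with the other held fixed at an arbitrary value.
grid = [i / 1000 for i in range(0, 3001)]
best_r = max(grid, key=lambda r: expected_score(-1.0, r))
best_l = max((-g for g in grid), key=lambda l: expected_score(l, 1.0))

print("best bounds:", best_l, best_r)
print("5% and 95% quantiles:", belief.inv_cdf(0.05), belief.inv_cdf(0.95))
```

The best bounds on the grid should sit within a grid step of the 5% and 95% quantiles (about ±1.645), which is what the derivative argument predicts.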
Thanks for this reply. The technique of asking what each term of your equation represents is one I have not practiced in some time.
This answer very much helped me to understand the model.
Thank you for providing an example!
You’re welcome. Something that I’m trying to improve about how I engage with LessWrong is writing out either a summary of the article (without re-referring to the article) or an explicit example of the concept in the article. My hope is that this will help me to actually grok what we’re discussing.
I get a dozen “refresh to render LaTeX” messages here (but refreshing doesn’t fix it).
Just wrapped up the fix for this. Pushing the fix in the next few minutes.
Fixed! Sorry for the inconvenience!