Regarding 1: It seems like for most predictions you are taking whatever estimate people are making and then entering the same value with 5-10% less confidence. Is that your primary approach? You seem to be getting extremely accurate calibration this way, especially when I compare PB's overall calibration to yours.
It’s my primary approach when making predictions on things I’m not familiar with (any prediction starting with ‘I’...). My general view is that when people aren’t being outrageously incorrect in whatever fashion (like XiXiDu’s recent Khan Academy prediction), they tend to be overconfident; the solution to that is adjusting their prediction towards 50%.
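(If it helps to see it concretely, the adjustment is just a shrink toward 0.5. Here's a minimal sketch; the function name and the shrink factor `k` are illustrative, not any fixed rule I follow:)

```python
def shrink_toward_half(p, k=0.85):
    """Pull a stated probability toward 50% to correct for overconfidence.

    k is the fraction of the distance from 0.5 that is kept; k=0.85 is an
    illustrative value that trims roughly 5-10 points off typical estimates.
    """
    return 0.5 + k * (p - 0.5)

print(shrink_toward_half(0.90))  # 0.84
print(shrink_toward_half(0.70))  # 0.67
```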
If you look at my Intrade-based predictions, you'll see that while sometimes I just punt and copy Intrade, sometimes I differ severely. It's a case-by-case thing.
(Also, I’m not sure you’re interpreting the graphs right. My understanding is that the graphs show that I am substantially underconfident as compared to PB in general. EDIT: I seem to be wrong here.)
Every 10% range should have an actual certainty about midway in the range, right? So, for example, for the "50%" range a perfect calibration would be 55% (assuming equidistribution over the whole 50-60% range). For PB as a whole, every category is at least 10% off: 55% compared to an abysmal 37%, 65% compared to 58%, 75% compared to 58%, 85% compared to 70%, and 95% compared to 79%. And the real kicker is that the 100% category is wrong one-fifth of the time. In contrast, your numbers are 55% going to 41%, 65% going to 51%, 75% going to 60%, 85% going to 86%, and 95% going to 92%; your 100% goes to 93%. Thus, with the exception of the 60-70 range, every one of yours is better calibrated, and your 80-90 and 90-100 ranges are nearly spot on. Am I misinterpreting the graphs?
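Here's the computation I'm doing, as a sketch; the data is invented purely to show the bucketing, and I'm taking each bucket's 'ideal' value to be its midpoint:

```python
import math

# (stated confidence, outcome) pairs -- invented data purely for illustration
predictions = [(0.55, True), (0.62, False), (0.87, True), (0.93, True),
               (0.71, False), (0.58, True), (0.95, True), (0.82, True)]

buckets = {}
for p, outcome in predictions:
    lo = min(math.floor(p * 10 + 1e-9), 9) / 10   # e.g. 0.62 -> 0.6 bucket
    buckets.setdefault(lo, []).append(outcome)

for lo in sorted(buckets):
    outcomes = buckets[lo]
    midpoint = lo + 0.05   # expected accuracy, assuming equidistribution
    actual = sum(outcomes) / len(outcomes)
    print(f"{lo:.0%}-{lo + 0.1:.0%}: ideal ~{midpoint:.0%}, "
          f"actual {actual:.0%} (n={len(outcomes)})")
```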
You know, I thought that I was supposed to have as flat a line as possible, but now I'm not sure. Re-reading the two axes, I guess the ideal graph is not the green line, but a line at a 45-degree angle going from 50%/50% in the middle-left to 100%/100% in the upper-right.
Have I been misreading the graphs this entire time? How embarrassing! I guess these graphs could be clearer, and explicitly graph the ‘ideal’ line...
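Something like the following would have prevented the confusion; the plotted points are my per-bucket numbers from your comment above:

```python
import matplotlib.pyplot as plt

# (stated confidence, observed frequency) from the per-bucket numbers above
stated   = [0.55, 0.65, 0.75, 0.85, 0.95]
observed = [0.41, 0.51, 0.60, 0.86, 0.92]

# The 'ideal' line: perfectly calibrated predictions fall on the diagonal
plt.plot([0.5, 1.0], [0.5, 1.0], "g--", label="ideal (perfect calibration)")
plt.plot(stated, observed, "o-", label="actual")
plt.xlabel("stated confidence")
plt.ylabel("observed frequency correct")
plt.legend()
plt.show()
```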
You know, I sort of presumed you were one of the people who had been involved in setting up PB because you spend so much time with it and seem to know its ins and outs. But your comment suggests that’s not the case. Who does run it?
Tricycle runs it, like LW (see Eliezer’s ANN). Matthew Fallshaw seems to be the one most involved with it—at least, I’ve always corresponded with him about it.