(Also, I’m not sure you’re interpreting the graphs right. My understanding is that the graphs show that I am substantially underconfident as compared to PB in general.)
Every 10% range should have an actual certainty about midway in the range right? So for example, for the “50%” range a perfect calibration would be 55% (assuming equidistribution over the whole 50-60 range). For PB as a whole, every category is at least 10% off. For PB as a whole we get: 55% compared to an abysmal 37%, 65% compared to 58%, 75% compared to 58%, 85% compared to 70%, 95% compared to 79%. And the real kicker is that the 100% category is wrong one fifth of the time. In contrast, your numbers are 55% going to 41%, 65% going to 51%, 75% going to 60%, 85% going to 86%, and 95% going to 92%. Your 100% goes to 93%. Thus, with the exception of the 60-70 range every one of yours is better calibrated, and your 80-90 and 90-100 ranges are nearly spot on. Am I misinterpreting the graphs?
You know, I thought that I was supposed to have as flat a line as possible, but now I’m not sure. Re-reading the two axis, I guess the ideal graph is not the green line, but a line at a 45 degree angle going from 50%/50% in the middle-left to 100%/100% in the upper-right.
Have I been misreading the graphs this entire time? How embarrassing! I guess these graphs could be clearer, and explicitly graph the ‘ideal’ line...
You know, I sort of presumed you were one of the people who had been involved in setting up PB because you spend so much time with it and seem to know its ins and outs. But your comment suggests that’s not the case. Who does run it?
Tricycle runs it, like LW (see Eliezer’s ANN). Matthew Fallenshaw seems to be the one most involved with it—at least, I’ve always corresponded with him about it.
Every 10% range should have an actual certainty about midway in the range right? So for example, for the “50%” range a perfect calibration would be 55% (assuming equidistribution over the whole 50-60 range). For PB as a whole, every category is at least 10% off. For PB as a whole we get: 55% compared to an abysmal 37%, 65% compared to 58%, 75% compared to 58%, 85% compared to 70%, 95% compared to 79%. And the real kicker is that the 100% category is wrong one fifth of the time. In contrast, your numbers are 55% going to 41%, 65% going to 51%, 75% going to 60%, 85% going to 86%, and 95% going to 92%. Your 100% goes to 93%. Thus, with the exception of the 60-70 range every one of yours is better calibrated, and your 80-90 and 90-100 ranges are nearly spot on. Am I misinterpreting the graphs?
You know, I thought that I was supposed to have as flat a line as possible, but now I’m not sure. Re-reading the two axis, I guess the ideal graph is not the green line, but a line at a 45 degree angle going from 50%/50% in the middle-left to 100%/100% in the upper-right.
Have I been misreading the graphs this entire time? How embarrassing! I guess these graphs could be clearer, and explicitly graph the ‘ideal’ line...
You know, I sort of presumed you were one of the people who had been involved in setting up PB because you spend so much time with it and seem to know its ins and outs. But your comment suggests that’s not the case. Who does run it?
Tricycle runs it, like LW (see Eliezer’s ANN). Matthew Fallenshaw seems to be the one most involved with it—at least, I’ve always corresponded with him about it.