A while ago I was, for some reason, answering a few hundred questions with yes-or-no answers. I thought I would record my confidence in the answers in 5% intervals, to check my calibration. What I found was that for 60%+ confidence I am fairly well calibrated, but when I was 55% confident I was only right 45% of the time (100)!
I think what happened is that sometimes I would think of a reason why the proposition X is true, and then think of some reasons why X is false, only I would now be anchored onto my original assessment that X is true. So instead of changing my mind to ‘X is false’ I would only decrease my confidence.
I.e., my thought process looked like this:
reason why X is true → X is true, 60% confidence → reasons why X is false → X is true, 55% confidence
When it should be:
reason why X is true → X is true, 60% confidence → reasons why X is false → CHANGE OPINION → X is false, 55% confidence
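For concreteness, here is a minimal sketch of the calibration check described at the top of this comment (Python, with toy data standing in for the actual records, which aren't given): bucket your stated confidences into 5% intervals and compare each bucket's stated confidence to its actual hit rate.

```python
from collections import defaultdict

# Toy records of (stated confidence, was the answer right?);
# the real data from the post isn't available.
records = [(0.55, True), (0.55, False), (0.60, True), (0.60, True)]

buckets = defaultdict(list)
for confidence, correct in records:
    bucket = round(confidence * 20) / 20  # snap to the nearest 5% interval
    buckets[bucket].append(correct)

for confidence in sorted(buckets):
    outcomes = buckets[confidence]
    hit_rate = sum(outcomes) / len(outcomes)
    print(f"stated {confidence:.0%}: right {hit_rate:.0%} of {len(outcomes)}")
```

A well-calibrated answerer would see each bucket's hit rate land near its stated confidence; the post's observation is a 55% bucket with a 45% hit rate.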
Did you write the questions or were they presented to you? If they were presented to you, then you have no choice in which of the two answers is “yes” and which is “no.” So it is meaningful for you to distinguish between the questions for which you answered 55% and the questions for which you answered 45%. Did you find a symmetrical effect?
It was symmetric. I never answered 45%; to clarify, when I answered 55% I was right 45% of the time. And I only recorded whether I was right or wrong, not whether I was right about X being false.
The vast majority of yes/no questions you’re likely to face won’t support 5% intervals. You’re just not going to get enough data to have any idea whether the “true” calibration is what actually happens for that small selection of questions.
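To put rough numbers on this (a sketch under assumed bucket sizes, since the post doesn't report them): with a hundred-odd answers spread over ten 5% buckets, the sampling noise in each bucket's hit rate dwarfs the 5% spacing between buckets.

```python
import math

# Assumed numbers: ~100 questions spread over ten 5% buckets
# leaves roughly n = 10 answers per bucket.
n = 10
p = 0.55  # stated confidence for this bucket

se = math.sqrt(p * (1 - p) / n)  # binomial standard error of the hit rate
print(f"standard error: {se:.2f}")          # ~0.16, about 16 percentage points
print(f"95% half-width: {1.96 * se:.2f}")   # ~0.31, far wider than the 5% spacing
```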
That said, I agree there’s an analytic flaw if you can change true to false on no additional data (kind of: you noticed the salience of something you’d previously ignored, which may count as evidence depending on how you arrived at your prior) and only reduce confidence a tiny amount.
One suggestion that may help: don’t separate your answer from your confidence, just calculate a probability. Not “true, 60% confidence” (implying 40% unknown, I think, not 40% false), but “80% likely to be true”. It really makes updates easier to calculate and understand.
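For illustration, here is one way that bookkeeping might look (a sketch; the likelihood ratio is an invented value): holding a single probability makes the update mechanical, and the arithmetic can carry you straight past 50% instead of anchoring on the original “true”.

```python
# Track one probability and update it with Bayes' rule in odds form.
def update(p: float, likelihood_ratio: float) -> float:
    """Posterior after evidence with ratio P(e | true) / P(e | false)."""
    prior_odds = p / (1 - p)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

p = 0.60            # start: "60% likely to be true"
p = update(p, 0.5)  # evidence twice as likely if X is false (illustrative)
print(f"{p:.2f}")   # 0.43 -- below 50%, with no separate "change opinion" step
```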
The vast majority of yes/no questions you’re likely to face won’t support 5% intervals. You’re just not going to get enough data to have any idea whether the “true” calibration is what actually happens for that small selection of questions.
Tetlock found in the Good Judgment Project, as described in his book Superforecasting, that people who are excellent at forecasting make very fine-grained predictions.
I disagree that you can’t get 5% intervals on random yes/no questions. If you stick with 10% intervals, you really only have five possible values: 50-59%, 60-69%, 70-79%, 80-89%, and 90%+. That’s very coarse-grained.
The vast majority of yes/no questions you’re likely to face won’t support 5% intervals.
I agree [edit: actually, it depends on where these yes/no questions are coming from], but I think the questions I was looking at were in the small minority that do support 5% intervals.
Not “true, 60% confidence” (implying 40% unknown, I think, not 40% false)
Perhaps I should have provided more details to explain exactly what I did, because I actually did mean 60% true, 40% false.
So, I already was thinking in the manner you advocate, but thanks for the advice anyway!