Since I first read about calibration on LessWrong, I’ve been trying this with tests and debate tournaments.
With a sample size of about 50: 95% of my estimated test grades are within 3% of my actual test grades.
In debate, however, when I am 60% confident I won a round, I turn out to have won it 90% of the time; when I am 80% confident, I have won it 100% of the time. Other people seem to be much better than I am at assessing the probability that I won a debate round (if they observed it).
It seems that I am really good at some kinds of estimation and really bad at others, which means that switching wholesale from the Inside View to the Outside View wouldn’t necessarily be an improvement, but that in certain situations it would help me enormously. Has anyone else encountered this?
Interesting that your debate predictions tend too low. In my debate experience, nearly everyone consistently overestimated their likelihood of winning a given round. This bias tended to increase the better the debaters perceived themselves to be.
I think a lot of debaters I know fall into the general trap of believing the things they argue. In a debate round, you have to maintain an “I’m winning” mentality, or you won’t be able to convince the judge of it; I am probably atypical in that I notice that kind of self-deception and apparently overcorrect for it. I’ve convinced a number of my teammates to try this experiment as well, and most of them follow the trend you noticed.
My own experience of debating is that while I can estimate the ‘strategic’ side relatively effectively, I find it more difficult to predict whether the judges will accept an individual argument. I’ve noticed this as a problem with several debaters, often due to inferential gaps between them and the judges (e.g. assuming some psychological/philosophical/economic concept is intuitively obvious).
[Incidentally, I’m involved in UK BP debating, so if that makes it probable we’ve met, PM me a name or a hint.]
Nope, US high school policy. I’m thinking of writing an article on debate and rationality (though not until after I’m done applying to college, which will be in January); if you have something to say about that, PM me.
Could debate tournaments be to some extent responsible for those extremely irritating, counterproductive arguments online, where you are left wondering what exactly convinced the other side so thoroughly and why they won’t tell you what it is? I never did debate at school.
I’ve encountered similar things insofar as I’m better calibrated for some tasks than others. And I agree with you that defining the right reference classes for when to trust my estimations vs. when to trust the outside view (and which outside views to trust) is important.
I’m curious: if you re-express your data set in terms of standard deviations… e.g., the percentage of your estimated test grades that fall within one standard deviation of the correct answer… rather than absolute percentages, do you still get very different results in the two cases?
Maybe I’m being really stupid, but how exactly would I define a standard deviation of the correct answer? Using the distribution for the whole class?
I meant within the set of your 50 test scores, assuming they’re normalized to a common range.
To pick an extreme example: if all your test scores fall between 92% and 98%, it becomes less remarkable that your estimations of your test scores all fall within 3% of your actual test scores… anyone else could do about as well, given that fact about the data set. So it seems that knowing something about the distribution is helpful in reasoning about the causes of the differences in the accuracy of your judgments.
Oh, that makes sense.
Nope, still a big difference. For example, here are my scores from the last few weeks:
Predicted/Actual: 98/100, 72/72.5, 94/94, 85/86, 82.5/87.5, 90/92
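For concreteness, here is a minimal sketch (in Python, purely illustrative) of the re-expression suggested above, assuming these six scores stand in for the whole data set:

```python
# A minimal sketch of the re-expression suggested above: how many of the
# predictions land within one standard deviation of the actual scores,
# treating these six results as the whole data set.
import statistics

predicted = [98, 72, 94, 85, 82.5, 90]
actual = [100, 72.5, 94, 86, 87.5, 92]

spread = statistics.stdev(actual)  # sample standard deviation of the actual scores
errors = [abs(p - a) for p, a in zip(predicted, actual)]
within_one_sd = sum(e <= spread for e in errors) / len(errors)

print(f"standard deviation of actual scores: {spread:.1f}")
print(f"predictions within one standard deviation: {within_one_sd:.0%}")
```

On these six alone, every prediction falls comfortably within one standard deviation of the actual scores (the errors top out at 5 points against a spread of roughly 9), so the full set of ~50 would be needed to say anything firm.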
Interesting that there were no too-high predictions.