Calibration may be achievable by a general procedure of making and testing (banded) predictions, but I wouldn’t trust anyone’s calibration in a particular domain on evidence of calibration in another.
In other words, people will have studied the accuracy of only some of their maps.
Do you have any evidence for this? I don’t remember any strongly domain-specific results in Tetlock’s study, the book I read about calibration in business, or any other studies I’ve come across. Nor does Wikipedia mention anything except domain experts being overconfident (as opposed to people being effectively random outside their domain even when supposedly calibrated, as you imply), and that overconfidence is fixable with calibration training.
And this is what I would expect given that the question is not about accuracy (one would hope experts would win in a particular domain) but about calibration—why can’t one accurately assess, in general, one’s ignorance?
(I have >1100 predictions registered on PB.com and >=240 judged so far; I can’t say I’ve noticed any especial domain-related correlations.)
p.s. that’s a lot of predictions :)
How many would you have thought gwern had?
I found this question puzzling and difficult to answer (I’m sleep-deprived). Funny joke if you were sneakily trying to get me to make a prediction.
Unfortunately I’m pretty well anchored now.
I’d expect LW-haunters who decide to make predictions at PB.com to make 15 on the first day and 10 in the next year (with a mode of 0).
Your point regarding the overconfidence of most domain experts is a strong one. I’ve updated :) This is not quite the opposite of the Dunning-Kruger finding, where the least competent overestimate their percentile competence the most.
I was merely imagining, without evidence, that some of the calibration training would be general and some would be domain-specific. Certainly you’d learn to calibrate, in general. You just wouldn’t automatically be calibrated in all domains. Obviously, if you’ve optimized for expertise in a domain (or worse, for getting credit for a single bold overconfident guess), then I don’t expect you to have optimized your calibration in that domain. In fact, I have only a weak opinion about whether domain experts should be better or worse calibrated on average in their natural state. I’m guessing they oversignal confidence (to their professional and status benefit) more than they are genuinely overconfident (when it comes to betting their own money).
Fortunately, Dunning-Kruger does not seem to be universal (not that anyone who would understand or care about calibration would also be in the stupid-enough quartiles in the first place).
Again, I don’t see why I couldn’t. All I need is a good understanding of what I know, and then anytime I run into predictions on things I don’t know about, I should be able to estimate my ignorance and adjust my predictions closer to 50% as appropriate. If I am mistaken, well, in some areas I will be underconfident and in some overconfident, and they balance out.
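(Purely as an illustration of the adjust-toward-50% idea above: a minimal sketch, assuming a made-up self-assessed “familiarity” weight and a simple linear shrinkage; this is one way to do it, not an established recipe.)

```python
def adjust_for_ignorance(raw_p, familiarity):
    """Shrink a raw probability estimate toward 0.5 in proportion to
    self-assessed ignorance. familiarity is in [0, 1]: 1 means I trust
    my domain knowledge fully, 0 means I know nothing about the domain."""
    return 0.5 + familiarity * (raw_p - 0.5)

# Hypothetical examples: the same gut estimate under different levels of familiarity.
print(adjust_for_ignorance(0.9, 1.0))  # 0.9 -- a domain I know well
print(adjust_for_ignorance(0.9, 0.5))  # 0.7 -- partial knowledge
print(adjust_for_ignorance(0.9, 0.0))  # 0.5 -- complete ignorance
```

If the familiarity weights are themselves off, some domains will come out underconfident and some overconfident, which is the balancing-out being claimed here.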
If there’s a single thing mainly responsible for making people poor estimators of their numerical certainty (judged against reality), then you’re probably right. For example, it makes sense for me to be overconfident in my pronouncements if I want people to listen to me and there’s little chance of being caught in my overconfidence. This motivation is strong and universal. But I can learn to recognize that I’m effectively lying (everyone does it, so maybe I should persist in most arenas) and to report more honestly and accurately, if only to myself, after just a little practice in the skill of eliciting the right numbers for my level of information about the proposition I’m judging.
I have no data, so I’ll disengage until I have some.
Note that there are some large classes of predictions which by their nature will cluster strongly and won’t be judged until a fair bit into the future. For example, there are various AI-related predictions going out about 100 years; you’ve placed bets on 12 of them by my count, and they strongly correlate with each other (for example, general AI by 2018 and general AI by 2030). For that sort of issue it is very hard to notice domain-related correlation when almost nothing in the domain has reached its judgement date yet. There are other issues with this sort of thing as well, such as a variety of the long-term computational complexity predictions (I’m ignoring here the Dick Lipton short-term statements, which everyone seems to think are just extremely optimistic). Have there been enough different domains with a lot of judged questions that one could notice domain-specific calibration?
All that is true—and why it was the last and least of my points, and in parentheses even.
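(To make the earlier question concrete, whether enough domains have been judged to notice domain-specific calibration, here is a minimal sketch of the kind of check one could run. The domain tags and the (domain, probability, outcome) tuples are hypothetical toy data, not PB.com’s actual export format.)

```python
from collections import defaultdict

def calibration_by_domain(judged):
    """Compare mean stated confidence to observed frequency, per domain tag
    and per 10%-wide confidence bin. judged is an iterable of
    (domain, stated_probability, came_true) tuples for already-judged predictions."""
    bins = defaultdict(list)  # (domain, confidence bin) -> [(p, outcome), ...]
    for domain, p, outcome in judged:
        bins[(domain, min(int(p * 10), 9))].append((p, outcome))
    report = defaultdict(list)
    for (domain, _), items in sorted(bins.items()):
        mean_p = sum(p for p, _ in items) / len(items)
        freq = sum(o for _, o in items) / len(items)
        report[domain].append((mean_p, freq, len(items)))
    return report

# Hypothetical toy data; a real check would use one's own judged predictions.
judged = [
    ("AI", 0.8, 0), ("AI", 0.7, 1), ("AI", 0.9, 1),
    ("politics", 0.9, 1), ("politics", 0.6, 0), ("politics", 0.7, 1),
]
for domain, rows in calibration_by_domain(judged).items():
    for mean_p, freq, n in rows:
        print(f"{domain}: stated {mean_p:.2f}, observed {freq:.2f} (n={n})")
```

With only a handful of judged predictions per domain (the clustering problem described above), the per-bin sample sizes are tiny, so any apparent domain differences would be mostly noise.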