A poor showing on some questions, such as “what is the best-selling computer game”, doesn’t necessarily reflect poor calibration. It might instead simply mean that “the best-selling game is Minecraft” is a surprising fact to this particular community.
For example, I may never have heard of Minecraft, but had plenty of exposure to evidence that “the best-selling game is Starcraft”, or “the best-selling game is Tetris”. Back when I did play computer games, which was before the era of Minecraft, that may well have been true (or it may have been a misperception even then). But when I look at the evidence I have mentally available to recall, all of it might point to Starcraft.
Probability calibration can only be based on prior probabilities and available evidence. If I start with something like a max-entropy prior (assigning all computer games an equal probability of having been the best-selling one) and then update it for every occasion I can remember hearing the popularity of game X discussed in the media, then the resulting sharpness of my probability distribution (my certainty) will depend on how much evidence I have, and how sharply it favours game X over others.
If I happened to have obtained my evidence during a time when Starcraft really was the best-selling computer game, my evidence will be sharply peaked around Starcraft, leading me (even as a perfect Bayesian) to conclude that Starcraft is the answer with high probability. Minecraft wouldn’t even make runner-up, especially if I haven’t heard of it.
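To make that concrete, here is a minimal sketch (in Python, with entirely hypothetical numbers and candidate games) of the update process described above: a uniform prior over a few games, updated once per remembered media mention. If all of my remembered mentions happen to predate Minecraft, the posterior ends up sharply peaked on Starcraft even though each individual update is perfectly Bayesian.

```python
# A toy illustration (hypothetical numbers): Bayesian updating from a
# max-entropy prior over a few candidate games, where each remembered
# media mention of game X is treated as weak evidence favouring X.

games = ["Starcraft", "Tetris", "Minecraft", "The Sims"]
posterior = {g: 1.0 / len(games) for g in games}  # uniform (max-entropy) prior

# What I happen to remember hearing about -- systematically biased,
# because all of my exposure predates Minecraft.
remembered_mentions = ["Starcraft", "Starcraft", "Tetris", "Starcraft"]

# Relative likelihood weights: a mention of X is three times as likely
# if X really is the best-seller (only the ratio matters here).
P_MENTION_IF_BESTSELLER = 0.6
P_MENTION_OTHERWISE = 0.2

for mention in remembered_mentions:
    # Multiply each hypothesis by the likelihood of this observation...
    for g in games:
        posterior[g] *= P_MENTION_IF_BESTSELLER if g == mention else P_MENTION_OTHERWISE
    # ...then renormalise so the probabilities sum to 1.
    total = sum(posterior.values())
    posterior = {g: p / total for g, p in posterior.items()}

for g, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{g}: {p:.2f}")
# Starcraft: 0.84, Tetris: 0.09, Minecraft: 0.03, The Sims: 0.03 --
# a sharp peak produced entirely by biased evidence.
```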
When I learn afterward that the answer is Minecraft, that is a “surprising result”, because I was confident and wrong. That doesn’t necessarily mean I had false confidence or updated poorly, just that I had updated well on misleading evidence. However, we can’t have had ‘evidence’ available telling us that our own body of evidence was dubious.
If the whole community is likewise surprised that Minecraft is the right answer, that doesn’t necessarily indicate overconfidence in the community… it might instead indicate that our evidence was strongly and systematically biased, perhaps because most of us are not of the Minecraft generation (not sure whether 27 is too old for Minecraft or not?).
Similarly, if the world’s scholars were all wrong and (somehow), in reality, the Norse god known as the All-Father was not called Odin but rather “Nido”, with this fact having been secretly concealed, then learning it would surprise everyone (including the authors of this survey). Our calibration results would suddenly look very bad on this question. We would all appear to be grossly overconfident. And yet, there would be no difference in the available evidence we’d had at hand (which all said “Odin” was the right answer).
I was confident and wrong. That doesn’t necessarily mean I had false confidence
Well… this can get into philosophical territory. It seems cleanest to describe confidence in something wrong as “false,” although it may have been the best possible answer for you to give at the time. Saying “the confidence wasn’t false because it was from a systematic bias” seems like it opens up a host of problems.
An idea that seems useful here is “being predictably surprised.” You can’t stop yourself from being surprised without omniscience, but you probably can stop yourself from being predictably surprised with a well-trained outside view. If the question had asked about, say, the median price of a meal at a restaurant, and you hadn’t frequented restaurants in 10 years, you would probably have very low confidence in the specific price you remember, since you expect things to have changed in 10 years; a similar approach would have worked here. (“Well, Starcraft was super popular, but it’s been a while; let’s put Starcraft at, say, 10% probability.”)
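One crude, purely hypothetical way to picture that adjustment is to decay confidence in a remembered answer according to how stale the underlying evidence is, leaving the freed-up probability mass on “something I haven’t heard of”. The half-life and the numbers below are made up for illustration:

```python
# Purely illustrative: discount a remembered answer by how stale the
# evidence behind it is, and reserve the rest for unknown alternatives.

def discounted_confidence(raw_confidence, years_since_evidence, half_life_years=5.0):
    """Halve confidence in the remembered answer every `half_life_years`."""
    return raw_confidence * 0.5 ** (years_since_evidence / half_life_years)

# "Starcraft was super popular, but it's been a while..."
p_starcraft = discounted_confidence(raw_confidence=0.8, years_since_evidence=15)
print(f"Starcraft: {p_starcraft:.2f}")                      # 0.10
print(f"A game I haven't heard of: {1 - p_starcraft:.2f}")  # 0.90
```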