If I understood the paper correctly, the following situation would be analogous. (I’ll have to recheck it tomorrow to make sure this example really matches what they’re saying; it’s too late here for me to do it now.)
Imagine that you know that 30% of the people living in a certain city are black, and 70% are white. Next you’re presented with questions where you have to guess whether a certain inhabitant of the city is black or white. If you don’t have any other information, you know that consistently guessing “white” in every question will get you 70% correct. So when the questionnaire also asks how confident you are in each answer, you say that you’re 70% certain for each question.
Now, if the survey questions had been composed by randomly sampling from all the inhabitants of the city (a “representative” sampling), you would indeed be correct about 70% of the time and be well-calibrated. But assume that instead, all the people the survey asked about live in a certain neighborhood, which happens to be predominantly black (a “selected” sampling). Now you might get only 40% of the answers right (if, say, only 40% of that neighborhood is white), while you indicated a confidence of 70%, so the researchers behind the survey mark you as overconfident.
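To make the numbers concrete, here is a minimal Python sketch of the example above. It is only my own illustration, not anything taken from the paper: the function name, the number of questions, the random seed, and the assumption that the selected neighborhood is 40% white (picked to match the 40% figure above) are all made up for the example.

```python
import random


def accuracy_when_always_guessing_white(p_white_in_sample, n_questions=10_000, seed=0):
    """Fraction of correct answers if you guess "white" on every question.

    p_white_in_sample is the share of white inhabitants among the people
    the survey actually asks about (an assumed parameter, not from the paper).
    """
    rng = random.Random(seed)
    # A guess of "white" is correct exactly when the sampled person is white.
    correct = sum(rng.random() < p_white_in_sample for _ in range(n_questions))
    return correct / n_questions


# You state 70% confidence because that is the city-wide base rate.
stated_confidence = 0.70

# Representative sampling: questions drawn from the whole city (70% white).
representative_accuracy = accuracy_when_always_guessing_white(0.70)

# Selected sampling: questions drawn from one neighborhood,
# assumed here to be only 40% white.
selected_accuracy = accuracy_when_always_guessing_white(0.40)

# Positive gap = stated confidence exceeds actual accuracy ("overconfidence").
representative_gap = stated_confidence - representative_accuracy
selected_gap = stated_confidence - selected_accuracy

print(f"representative: accuracy={representative_accuracy:.3f}, gap={representative_gap:+.3f}")
print(f"selected:       accuracy={selected_accuracy:.3f}, gap={selected_gap:+.3f}")
```

The guessing strategy and the stated confidence are identical in both runs. Only the way the questions were sampled changes, and that alone is enough to turn a well-calibrated respondent into an apparently overconfident one.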
Of course, in practice this is a bit more complicated as people don’t only use the ecological base rate but also other information that they happen to have at hand, but since the other information acts to modify their starting base rate (the prior), the same logic still applies.