Hypothesis: the predictions on the population of Europe are bimodal, split between people thinking of geographical Europe (739M) vs people thinking of the EU (508M). I’m going to go check the data and report back.
I’ve cleaned up the data and put it here.
Here’s a “sideways cumulative density function”, showing all guesses from lowest to highest:
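For anyone who wants to reproduce this, the plot is just the sorted guesses against their rank. A minimal sketch in Python, assuming the cleaned data is a one-column file of guesses in millions (the file name here is made up):

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical file name for the cleaned data linked above: one guess per line, in millions.
    guesses = np.sort(np.loadtxt("europe_guesses.csv"))

    plt.step(np.arange(1, guesses.size + 1), guesses, where="post")
    plt.axhline(739, linestyle="--", label="continent (739M)")
    plt.axhline(508, linestyle=":", label="EU (508M)")
    plt.xlabel("guess rank (lowest to highest)")
    plt.ylabel("guess (millions)")
    plt.legend()
    plt.show()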
There were a lot of guesses of “500”, but that might just be because 500 is a nice round number. Still, more people guessed within 50 of 508M (165) than in the 100-wide regions immediately below or above (126 within 50 of 408M, 88 within 50 of 608M), and more people guessed within 50 of 739M (107) than in the regions immediately below or above (91 within 50 of 639M, 85 within 50 of 839M).
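Counts like these can be computed with something along these lines (the centres and the ±50 window are the ones used above; the file name is the same made-up one):

    import numpy as np

    guesses = np.loadtxt("europe_guesses.csv")  # hypothetical file name, guesses in millions
    for centre in (408, 508, 608, 639, 739, 839):
        print(centre, int(np.sum(np.abs(guesses - centre) <= 50)))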
Here’s a histogram that shows this, but in order to actually see a dip between the 508ish numbers and 739ish numbers the bucketing needs to group those into separate categories with another category in between, so I don’t trust this very much:
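To see how much the dip depends on the bucketing, compare two sets of bin edges on the same data: one that puts the ~508 and ~739 clusters in separate bins with a bin between them, and one that puts them in adjacent bins. The edges below are arbitrary choices:

    import numpy as np

    guesses = np.loadtxt("europe_guesses.csv")  # hypothetical file name, guesses in millions
    for edges in ([358, 458, 558, 658, 758, 858],   # 508 and 739 fall in bins separated by a gap bin
                  [300, 450, 600, 750, 900]):        # 508 and 739 fall in adjacent bins
        counts, _ = np.histogram(guesses, bins=edges)
        print(edges, counts.tolist())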
If someone knows how to make an actual probability density function chart, that would be better, because it wouldn’t be sensitive to these arbitrary choices of where to place the histogram boundaries.
Here is a kernel density estimate of the “true” distribution, with bootstrapped pointwise 95% confidence bands from 999 resamples:
It looks plausibly bimodal, though one might want to construct a suitable hypothesis test for unimodality versus multimodality. Unfortunately, as you noted, we cannot distinguish between the hypothesis that the bimodality is due to rounding (at 500 M) versus the hypothesis that the bimodality is due to ambiguity between Europe and the EU. This holds even if a hypothesis test rejects a unimodal model, but if anyone is still interested in testing for unimodality, I suggest considering Efron and Tibshirani’s approach using the bootstrap.
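If anyone does want to try that, here is a rough sketch of the bootstrap test in Python rather than Mathematica (the null hypothesis is unimodality; a small achieved significance level is evidence for more than one mode). The bandwidth search range, grid resolution, and number of resamples are arbitrary choices, and the file name again stands in for the cleaned data linked above.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.loadtxt("europe_guesses.csv")        # hypothetical file name, guesses in millions
    grid = np.linspace(x.min(), x.max(), 512)

    def n_modes(data, h):
        # Number of local maxima of a Gaussian KDE with bandwidth h, counted on the grid.
        dens = np.exp(-0.5 * ((grid[:, None] - data[None, :]) / h) ** 2).sum(axis=1)
        return int(np.sum((dens[1:-1] > dens[:-2]) & (dens[1:-1] > dens[2:])))

    def critical_bandwidth(data, lo=1.0, hi=1000.0, tol=0.5):
        # Smallest bandwidth at which the KDE becomes unimodal (bisection; for a
        # Gaussian kernel the number of modes is non-increasing in the bandwidth).
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            lo, hi = (lo, mid) if n_modes(data, mid) <= 1 else (mid, hi)
        return hi

    h_crit = critical_bandwidth(x)

    # Smoothed bootstrap from the critical-bandwidth KDE, with the variance-preserving
    # rescaling; the ASL is the fraction of resamples that are still multimodal at h_crit.
    B, multimodal = 999, 0
    scale = 1.0 / np.sqrt(1.0 + h_crit ** 2 / x.var())
    for _ in range(B):
        star = rng.choice(x, size=x.size, replace=True)
        smoothed = star.mean() + scale * (star - star.mean() + h_crit * rng.standard_normal(x.size))
        multimodal += n_modes(smoothed, h_crit) > 1
    print(f"critical bandwidth ~ {h_crit:.0f}M, ASL ~ {multimodal / B:.3f}")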
Edit: Updated the plot. I switched from adaptive bandwidth to fixed bandwidth (because it seems to achieve higher efficiency), so parts of what I wrote below are no longer relevant—I’ve put these parts in square brackets.
Plot notes: [The adaptive bandwidth was achieved with Mathematica’s built-in “Adaptive” option for SmoothKernelDistribution, which is horribly documented; I think it uses the same algorithm as ‘akj’ in R’s quantreg package.] A Gaussian kernel was used with the bandwidth set according to Silverman’s rule-of-thumb [and the sensitivity (‘alpha’ in akj’s documentation) set to 0.5]. The bootstrap confidence intervals are “biased and unaccelerated” because I don’t (yet) understand how bias-corrected and accelerated bootstrap confidence intervals work. Tick marks on the x-axis represent the actual data with a slight jitter added to each point.
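For anyone without Mathematica, here is a roughly equivalent recipe in Python (not the code behind the plot above): a Gaussian KDE with Silverman’s rule-of-thumb bandwidth, simple pointwise percentile bands from 999 bootstrap resamples, and a jittered rug for the raw data. The file name and the jitter scale are made up.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(0)
    x = np.loadtxt("europe_guesses.csv")      # hypothetical file name, guesses in millions
    grid = np.linspace(x.min(), x.max(), 400)

    kde = gaussian_kde(x, bw_method="silverman")
    estimate = kde(grid)

    # Pointwise 95% percentile bands from 999 bootstrap resamples.
    B = 999
    boot = np.empty((B, grid.size))
    for b in range(B):
        resample = rng.choice(x, size=x.size, replace=True)
        boot[b] = gaussian_kde(resample, bw_method="silverman")(grid)
    lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)

    plt.fill_between(grid, lower, upper, alpha=0.3, label="bootstrap 95% band")
    plt.plot(grid, estimate, label="KDE (Silverman bandwidth)")
    jitter = x + rng.normal(scale=5.0, size=x.size)   # slight jitter for the rug marks
    plt.plot(jitter, np.zeros_like(jitter), "|", markersize=10)
    plt.xlabel("guess (millions)")
    plt.ylabel("density")
    plt.legend()
    plt.show()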
As one data point, I went with Europe as the EU, so it’s plausible others did too.
Same here.
Me too, at least sort of—I just had a number stored in my brain that I associated with “Europe.” Turned out it was EU only, although I didn’t have any confusion about the question—I thought I was answering for all of Europe.
I also interpreted Europe as EU, although I was about 20% off that as well.
The misinterpretation of the survey’s meaning of “Europe” as “EU” is itself a failure as significant as wrongly estimating its population… so it’s not as if it excuses people who got it wrong and yet neither sought clarification nor took the possibility of misinterpretation into account when giving their confidence ratios...
You might as well ask, “Who is the president of America?” and then follow up with, “Ha ha got you! America is a continent, you meant USA.”
I don’t think you’re making the argument that Yvain deliberately wanted to trick people into giving a wrong answer—so I really don’t see your analogy as illuminating anything.
It was a question. People answered it wrongly, whether by making a wrong estimate of the answer or by making a wrong estimate of the meaning of the question. Both are failures, so why should we consider the latter failure any less significant than the former?
EDIT TO ADD: Mind you, reading the Excel file of the answers, it seems I’m among the people who gave an answer in individuals when the question was asking for a number in millions. So it’s not as if I didn’t also have a failure in answering, and yet I do consider that one a less significant failure. Perhaps I’m just being hypocritical in this though.
Confirm. ;) (Nope, I didn’t misinterpret it as EU.)
Even if people recognized the ambiguity, it’s not obvious that one should go for an intermediate answer rather than putting all one’s eggs in one basket by guessing which was meant. If I were taking the survey and saw that ambiguity, I’d probably be confused for a bit, then realize I was taking longer than I’d semi-committed to taking, make a snap judgement, answer, and move on.
The continent is basically never called just “America” in modern English (except in the phrases “North America” and “South America”), it’s “the Americas”.
It’s also not obvious that people who went with the EU interpretation were incorrect. Language is contextual: if we were to parse the Times, Guardian, BBC, etc. over the past year and see how the word “Europe” is actually used, it might be the land mass, or it might be the EU. Certainly one usage will have been more common than the other, but it’s not obvious to me which one it will have been.
That said, if I had noticed the ambiguity and not auto-parsed it as EU, I probably would have expected the typical American to use “Europe” for the land mass, and since I think Yvain is American, that’s what I should have gone with.
On the other other hand, the goal of the question is to gauge numerical calibration, not language parsing. If someone thought they were answering about the EU and picked a 90% confidence interval that did in fact include the population of the EU, that gives different information about the quantity we are trying to measure than if someone thought Europe meant the continent including Russia and picked a 90% confidence interval that does not include the population of the land mass. Remember, this is not a quiz in school to see if someone gets “the right answer”; this is a tool that’s intended to measure something.
Yvain explicitly said “Wikipedia’s Europe page”.
Which users could not double-check because they might see the population numbers.
But they should expect the Wikipedia page to refer to the continent.