Hypothesis: the predictions on the population of Europe are bimodal, split between people thinking of geographical Europe (739M) vs people thinking of the EU (508M). I’m going to go check the data and report back.
I’ve cleaned up the data and put it here.
Here’s a “sideways cumulative density function”, showing all guesses from lowest to highest:
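For anyone who wants to reproduce this, the plot is just the sorted guesses against their rank. A minimal sketch in Python, assuming the cleaned data is a one-column file of guesses in millions (the file name here is made up):

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical file name for the cleaned data linked above: one guess per line, in millions.
    guesses = np.sort(np.loadtxt("europe_guesses.csv"))

    plt.step(np.arange(1, guesses.size + 1), guesses, where="post")
    plt.axhline(739, linestyle="--", label="continent (739M)")
    plt.axhline(508, linestyle=":", label="EU (508M)")
    plt.xlabel("guess rank (lowest to highest)")
    plt.ylabel("guess (millions)")
    plt.legend()
    plt.show()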
There were a lot of guesses of “500”, but that might just be because 500 is a nice round number. Still, more people guessed within 50 of 508M (165) than in the 100-wide regions immediately below or above (126 within 50 of 408M, 88 within 50 of 608M), and more people guessed within 50 of 739M (107) than in the regions immediately below or above (91 within 50 of 639M, 85 within 50 of 839M).
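Counts like these can be computed with something along these lines (the centres and the ±50 window are the ones used above; the file name is the same made-up one):

    import numpy as np

    guesses = np.loadtxt("europe_guesses.csv")  # hypothetical file name, guesses in millions
    for centre in (408, 508, 608, 639, 739, 839):
        print(centre, int(np.sum(np.abs(guesses - centre) <= 50)))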
Here’s a histogram that shows this, but in order to actually see a dip between the 508ish numbers and 739ish numbers the bucketing needs to group those into separate categories with another category in between, so I don’t trust this very much:
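To see how much the dip depends on the bucketing, compare two sets of bin edges on the same data: one that puts the ~508 and ~739 clusters in separate bins with a bin between them, and one that puts them in adjacent bins. The edges below are arbitrary choices:

    import numpy as np

    guesses = np.loadtxt("europe_guesses.csv")  # hypothetical file name, guesses in millions
    for edges in ([358, 458, 558, 658, 758, 858],   # 508 and 739 fall in bins separated by a gap bin
                  [300, 450, 600, 750, 900]):        # 508 and 739 fall in adjacent bins
        counts, _ = np.histogram(guesses, bins=edges)
        print(edges, counts.tolist())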
If someone knows how to make an actual probability density function chart, that would be better, because it wouldn’t be sensitive to these arbitrary choices of where to place the histogram boundaries.
Here is a kernel density estimate of the “true” distribution, with bootstrapped pointwise 95% confidence bands from 999 resamples:
It looks plausibly bimodal, though one might want to construct a suitable hypothesis test for unimodality versus multimodality. Unfortunately, as you noted, we cannot distinguish between the hypothesis that the bimodality is due to rounding (at 500 M) versus the hypothesis that the bimodality is due to ambiguity between Europe and the EU. This holds even if a hypothesis test rejects a unimodal model, but if anyone is still interested in testing for unimodality, I suggest considering Efron and Tibshirani’s approach using the bootstrap.
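If anyone does want to try that, here is a rough sketch of the bootstrap test in Python rather than Mathematica (the null hypothesis is unimodality; a small achieved significance level is evidence for more than one mode). The bandwidth search range, grid resolution, and number of resamples are arbitrary choices, and the file name again stands in for the cleaned data linked above.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.loadtxt("europe_guesses.csv")        # hypothetical file name, guesses in millions
    grid = np.linspace(x.min(), x.max(), 512)

    def n_modes(data, h):
        # Number of local maxima of a Gaussian KDE with bandwidth h, counted on the grid.
        dens = np.exp(-0.5 * ((grid[:, None] - data[None, :]) / h) ** 2).sum(axis=1)
        return int(np.sum((dens[1:-1] > dens[:-2]) & (dens[1:-1] > dens[2:])))

    def critical_bandwidth(data, lo=1.0, hi=1000.0, tol=0.5):
        # Smallest bandwidth at which the KDE becomes unimodal (bisection; for a
        # Gaussian kernel the number of modes is non-increasing in the bandwidth).
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            lo, hi = (lo, mid) if n_modes(data, mid) <= 1 else (mid, hi)
        return hi

    h_crit = critical_bandwidth(x)

    # Smoothed bootstrap from the critical-bandwidth KDE, with the variance-preserving
    # rescaling; the ASL is the fraction of resamples that are still multimodal at h_crit.
    B, multimodal = 999, 0
    scale = 1.0 / np.sqrt(1.0 + h_crit ** 2 / x.var())
    for _ in range(B):
        star = rng.choice(x, size=x.size, replace=True)
        smoothed = star.mean() + scale * (star - star.mean() + h_crit * rng.standard_normal(x.size))
        multimodal += n_modes(smoothed, h_crit) > 1
    print(f"critical bandwidth ~ {h_crit:.0f}M, ASL ~ {multimodal / B:.3f}")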
Edit: Updated the plot. I switched from adaptive bandwidth to fixed bandwidth (because it seems to achieve higher efficiency), so parts of what I wrote below are no longer relevant—I’ve put these parts in square brackets.
Plot notes: [The adaptive bandwidth was achieved with Mathematica’s built-in “Adaptive” option for SmoothKernelDistribution, which is horribly documented; I think it uses the same algorithm as ‘akj’ in R’s quantreg package.] A Gaussian kernel was used with the bandwidth set according to Silverman’s rule-of-thumb [and the sensitivity (‘alpha’ in akj’s documentation) set to 0.5]. The bootstrap confidence intervals are “biased and unaccelerated” because I don’t (yet) understand how bias-corrected and accelerated bootstrap confidence intervals work. Tick marks on the x-axis represent the actual data with a slight jitter added to each point.
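For anyone without Mathematica, here is a roughly equivalent recipe in Python (not the code behind the plot above): a Gaussian KDE with Silverman’s rule-of-thumb bandwidth, simple pointwise percentile bands from 999 bootstrap resamples, and a jittered rug for the raw data. The file name and the jitter scale are made up.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import gaussian_kde

    rng = np.random.default_rng(0)
    x = np.loadtxt("europe_guesses.csv")      # hypothetical file name, guesses in millions
    grid = np.linspace(x.min(), x.max(), 400)

    kde = gaussian_kde(x, bw_method="silverman")
    estimate = kde(grid)

    # Pointwise 95% percentile bands from 999 bootstrap resamples.
    B = 999
    boot = np.empty((B, grid.size))
    for b in range(B):
        resample = rng.choice(x, size=x.size, replace=True)
        boot[b] = gaussian_kde(resample, bw_method="silverman")(grid)
    lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)

    plt.fill_between(grid, lower, upper, alpha=0.3, label="bootstrap 95% band")
    plt.plot(grid, estimate, label="KDE (Silverman bandwidth)")
    jitter = x + rng.normal(scale=5.0, size=x.size)   # slight jitter for the rug marks
    plt.plot(jitter, np.zeros_like(jitter), "|", markersize=10)
    plt.xlabel("guess (millions)")
    plt.ylabel("density")
    plt.legend()
    plt.show()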
As one data point, I went with Europe as the EU, so it’s plausible others did too.
Same here.
Me too, at least sort of—I just had a number stored in my brain that I associated with “Europe.” Turned out it was EU only, although I didn’t have any confusion about the question—I thought I was answering for all of Europe.
I also interpreted Europe as EU, although I was about 20% off that as well.
The misinterpretation of the survey’s meaning of “Europe” as “EU” is itself a failure as significant as wrongly estimating its population… so it’s not as if it excuses people who got it wrong and yet neither sought clarification nor took the possibility of misinterpretation into account when giving their confidence ratios...
You might as well ask, “Who is the president of America?” and then follow up with, “Ha ha got you! America is a continent, you meant USA.”
I don’t think you’re making the argument that Yvain deliberately wanted to trick people into giving a wrong answer—so I really don’t see your analogy as illuminating anything.
It was a question. People answered it wrongly, whether by making a wrong estimate of the answer or by making a wrong estimate of the meaning of the question. Both are failures, so why should we consider the latter failure any less significant than the former?
EDIT TO ADD: Mind you, reading the Excel file of the answers, it seems I’m among the people who gave an answer in individuals when the question was asking for a number in millions. So it’s not as if I didn’t also have a failure in answering, and yet I do consider that one a less significant failure. Perhaps I’m just being hypocritical in this though.
Confirm. ;) (Nope, I didn’t misinterpret it as EU.)
Even if people recognized the ambiguity, it’s not obvious that one should go for an intermediate answer rather than putting all one’s eggs in one basket by guessing which was meant. If I were taking the survey and saw that ambiguity, I’d probably be confused for a bit, then realize I was taking longer than I’d semi-committed to taking, make a snap judgement, answer, and move on.
The continent is basically never called just “America” in modern English (except in the phrases “North America” and “South America”), it’s “the Americas”.
It’s also not obvious that people who went with the EU interpretation were incorrect. Language is contextual: if we were to parse the Times, Guardian, BBC, etc. over the past year and see how the word “Europe” is actually used, it might be the land mass, or it might be the EU. Certainly one usage will have been more common than the other, but it’s not obvious to me which one it will have been.
That said, if I had noticed the ambiguity and not auto-parsed it as EU, I probably would have expected the typical American to use “Europe” for the land mass, and since I think Yvain is American, that’s what I should have gone with.
On the other other hand, the goal of the question is to gauge numerical calibration, not language parsing. If someone thought they were answering about the EU and picked a 90% confidence interval that did in fact include the population of the EU, that gives different information about the quantity we are trying to measure than if someone thought Europe meant the continent including Russia and picked a 90% confidence interval that does not include the population of the land mass. Remember, this is not a quiz in school to see if someone gets “the right answer”; this is a tool that’s intended to measure something.
Yvain explicitly said “Wikipedia’s Europe page”.
Which users could not double-check because they might see the population numbers.
But they should expect the Wikipedia page to refer to the continent.