RobinZ comments on Less Wrong Polls in Comments

RobinZ 26 Sep 2012 14:09 UTC
2 points
Question: what’s a reasonable prior over the probability distribution of poll answers? Because I downloaded the raw data, and it says:
1. 15
2. 22
3. 21
4. 24
5. 18
...and I’m not sure what would constitute reasonable priors for the uniform distribution hypothesis versus the “aversion toward First Answer” hypothesis versus the “aversion toward First Answer and Fifth Answer” hypothesis.
- Kindly 26 Sep 2012 16:32 UTC
  6 points
  My own feelings on the matter are that if you don’t know what prior to have, compute worst-case bounds.
  
  In this case, the model that maximizes the probability of seeing this data is that each response is 15% likely to be 1, 22% likely to be 2, 21% likely to be 3, 24% likely to be 4, and 18% likely to be 5. We can compute the probability of seeing this data under this model, and also under the “all answers are equally likely” model, and conclude that even this best-fitting model makes us only 3.61 times as likely to see this data as the uniform model does.
  
  In particular, any other hypothesis you might have can receive at most this much evidence relative to the uniform distribution hypothesis; and I believe in close-to-uniformity enough that I’m not going to be swayed by what is fewer than 2 bits of evidence.
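
A minimal sketch of that worst-case bound in Python, using only the five counts quoted in the question. The variable and function names are illustrative, not anything from the original poll export.

```python
import math

# Poll counts quoted above for answers 1..5.
counts = [15, 22, 21, 24, 18]
n = sum(counts)

# Maximum-likelihood model: each answer's probability is its observed frequency.
mle_probs = [c / n for c in counts]
# Uniform model: every answer equally likely.
uniform_probs = [1 / len(counts)] * len(counts)


def log_likelihood(probs, counts):
    # The multinomial coefficient is identical under both models,
    # so it cancels in the ratio and is left out here.
    return sum(c * math.log(p) for c, p in zip(counts, probs))


log_ratio = log_likelihood(mle_probs, counts) - log_likelihood(uniform_probs, counts)
likelihood_ratio = math.exp(log_ratio)
bits = log_ratio / math.log(2)

print(f"likelihood ratio: {likelihood_ratio:.2f}")
print(f"evidence in bits: {bits:.2f}")
```

The ratio comes out at about 3.6, which is a little under 2 bits, in line with the figures above. Since the maximum-likelihood model is the best case for any alternative, no other fixed hypothesis can beat this ratio against the uniform one.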
  - RobinZ 27 Sep 2012 2:16 UTC
    2 points
    Thanks! I didn’t think of that particular brainhack—I’ll be sure to use it in the future.
- othercriteria 30 Sep 2012 15:34 UTC
  5 points
  Your question is confused. The uniform distribution hypothesis only requires that the (assumed infinite) population picks the answers independently with equal probability. Under this hypothesis, the observed poll answers (for a fixed number of respondents) will follow a multinomial distribution with parameters (0.2, 0.2, 0.2, 0.2, 0.2). A typical realization will not have an equal number of respondents giving each answer, although asymptotically the empirical frequencies will converge to equality.
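
To make the point about typical realizations concrete, here is a rough sketch that evaluates the multinomial probability of a perfectly even 20/20/20/20/20 split among 100 respondents under the uniform hypothesis, alongside the probability of the counts actually observed. The helper `multinomial_pmf` is a hypothetical name written for this illustration.

```python
import math


def multinomial_pmf(counts, probs):
    """Probability of observing exactly `counts` under a multinomial model."""
    n = sum(counts)
    # Multinomial coefficient n! / (c1! * c2! * ... * ck!), kept as an exact integer.
    coeff = math.factorial(n)
    for c in counts:
        coeff //= math.factorial(c)
    return coeff * math.prod(p ** c for p, c in zip(probs, counts))


uniform = [0.2] * 5

# Probability that 100 respondents split perfectly evenly.
p_equal = multinomial_pmf([20] * 5, uniform)
# Probability of the counts reported in the question.
p_observed = multinomial_pmf([15, 22, 21, 24, 18], uniform)

print(f"P(exactly 20 each | uniform) = {p_equal:.2e}")
print(f"P(observed counts | uniform) = {p_observed:.2e}")
```

Even the single most likely outcome, the even split, has a probability on the order of one in ten thousand, so seeing unequal counts is by itself no evidence against uniformity.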
  
  Anyways, as a Bayesian, the better question is: what should my posterior belief about the response probabilities be after running the poll and updating off the answers? The canonical way to do this would be to put a Dirichlet prior over the response probabilities. By the miracle of conjugacy, your posterior distribution will itself be a (generally different) Dirichlet distribution.
  
  By taking the expectation of indicator variables like I{“probability of First Answer under 0.2”} under the posterior, you can figure out what degree of belief you must give to statements like “respondents have an aversion toward First Answer”.
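
As a sketch of how that update might look in Python: the flat Dirichlet(1, 1, 1, 1, 1) prior below is an assumption chosen for illustration, not something specified in the thread, and `beta_cdf_int` is a hypothetical helper. It relies on two standard facts: the posterior parameters are the prior parameters plus the counts, and the marginal of one Dirichlet component is a Beta distribution.

```python
import math

counts = [15, 22, 21, 24, 18]

# ASSUMPTION: a flat Dirichlet(1, 1, 1, 1, 1) prior; other priors are equally defensible.
prior_alpha = [1, 1, 1, 1, 1]

# Conjugacy: the posterior is Dirichlet(prior_alpha + counts).
posterior_alpha = [a + c for a, c in zip(prior_alpha, counts)]

# The marginal posterior of the first response probability is Beta(a, b).
a = posterior_alpha[0]
b = sum(posterior_alpha) - a


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a and b.

    Uses the identity P(Beta(a, b) <= x) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(math.comb(n, j) * x ** j * (1 - x) ** (n - j) for j in range(a, n + 1))


# Posterior expectation of the indicator I{"probability of First Answer under 0.2"}.
prob_first_below_fifth = beta_cdf_int(0.2, a, b)

print("posterior parameters:", posterior_alpha)
print(f"posterior mean for First Answer: {a / (a + b):.3f}")
print(f"P(p_1 < 0.2 | data) = {prob_first_below_fifth:.3f}")
```

The binomial-tail identity only works for integer parameters, which is why it fits the flat prior used here; a non-integer prior would need a numerical beta CDF instead.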
  - RobinZ 30 Sep 2012 15:40 UTC
    4 points
    That makes sense—I had imagined doing something similar, but I had never heard of Dirichlet priors.
    - othercriteria 30 Sep 2012 16:00 UTC
      3 points
      Happy this helped. The Dirichlet-multinomial model gets relatively little attention because it adds nothing really new to the beta-binomial model for polls with just two responses. It’s easy to find lots of chatty introductions to the beta-binomial, like this one or this one, if you want to learn more...