Question: what’s a reasonable prior over the probability distribution of poll answers? Because I downloaded the raw data, and it says:
15
22
21
24
18
...and I’m not sure what would constitute reasonable priors for the uniform distribution hypothesis versus the “aversion toward First Answer” hypothesis versus the “aversion toward First Answer and Fifth Answer” hypothesis.
My own feelings on the matter are that if you don’t know what prior to have, compute worst-case bounds.
In this case, the model that maximizes the probability of seeing this data is that each answer is 15% likely to be 1, 22% likely to be 2, 21% likely to be 3, 24% likely to be 4, and 18% likely to be 5. We can compute the probability of seeing this data under this model, and also under the “all answers are equally likely” model, and conclude that the data is only about 3.61 times as likely under the best-fitting model as under the uniform one.
In particular, any other hypothesis you might have can only receive this little evidence, relative to the uniform distribution hypothesis; and I believe in close-to-uniformity enough that I’m not going to be swayed by what is fewer than 2 bits of evidence.
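For anyone who wants to check the arithmetic, here is a minimal sketch of that worst-case bound. It assumes the five raw numbers are response counts out of 100 (they do sum to 100, which matches the percentages above); the variable and function names are just for illustration.

```python
import math

# Raw poll data, assumed to be counts per answer (they sum to 100).
counts = [15, 22, 21, 24, 18]
n = sum(counts)
num_answers = len(counts)

def log_likelihood(counts, probs):
    # Multinomial log-likelihood without the multinomial coefficient,
    # which is the same under both models and cancels in the ratio.
    return sum(c * math.log(p) for c, p in zip(counts, probs))

# Best-fitting model: each answer's probability equals its observed frequency.
best_fit = [c / n for c in counts]
# Uniform model: every answer is equally likely.
uniform = [1 / num_answers] * num_answers

log_ratio = log_likelihood(counts, best_fit) - log_likelihood(counts, uniform)
likelihood_ratio = math.exp(log_ratio)
bits_of_evidence = log_ratio / math.log(2)

print(f"likelihood ratio: {likelihood_ratio:.2f}")
print(f"bits of evidence: {bits_of_evidence:.2f}")
```

The ratio comes out a little over 3.6, which is under 2 bits, so no alternative hypothesis can gain more than that over uniformity from this data alone.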
Thanks! I didn’t think of that particular brainhack—I’ll be sure to use it in the future.
Your question is confused. The uniform distribution hypothesis only requires that the (assumed infinite) population picks the answers independently with equal probability. Under this hypothesis, the observed poll answers (for a fixed number of respondents) will follow a multinomial distribution with parameters (0.2, 0.2, 0.2, 0.2, 0.2). A typical realization will not have an equal number of respondents giving each answer, although asymptotically the empirical frequencies will converge to equality.
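To make the point about typical realizations concrete, here is a rough sketch assuming 100 respondents and 5 answers, as in the poll above. It computes the exact probability of a perfect 20/20/20/20/20 split under the uniform hypothesis and simulates a few polls; the seed and the number of simulated polls are arbitrary choices.

```python
import math
import random

num_respondents = 100
num_answers = 5
even_share = num_respondents // num_answers  # 20 respondents per answer

# Exact multinomial probability that every answer gets exactly 20 votes
# when each respondent independently picks uniformly at random.
ways = math.factorial(num_respondents) // math.factorial(even_share) ** num_answers
p_all_equal = ways / num_answers ** num_respondents

# A few simulated polls under the uniform hypothesis (seed chosen arbitrarily).
rng = random.Random(0)
draws = []
for _ in range(3):
    answers = rng.choices(range(1, num_answers + 1), k=num_respondents)
    draws.append([answers.count(a) for a in range(1, num_answers + 1)])

print(f"P(exactly equal counts) = {p_all_equal:.6f}")
for d in draws:
    print(d)
```

A perfectly even split is very unlikely, so unequal counts on their own say little against uniformity.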
Anyway, as a Bayesian, the better question is: what should my posterior belief about the response probabilities be after running the poll and updating on the answers? The canonical way to do this would be to put a Dirichlet prior over the response probabilities. By the miracle of conjugacy, your posterior distribution will itself be a (generally different) Dirichlet distribution.
By taking the expectation of indicator variables like I{“probability of First Answer under 0.2”} under the posterior, you can figure out what degree of belief you must give to statements like “respondents have an aversion toward First Answer”.
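A minimal sketch of that calculation, assuming a flat Dirichlet(1, 1, 1, 1, 1) prior (that prior is my assumption for illustration, not a recommendation) and treating the raw numbers as counts. With a Dirichlet prior, the posterior parameters are just the prior parameters plus the observed counts. The posterior expectation of the indicator is estimated by Monte Carlo, drawing Dirichlet samples as normalized gamma variates.

```python
import random

counts = [15, 22, 21, 24, 18]

# Assumed prior: flat Dirichlet(1, 1, 1, 1, 1). Other choices are defensible.
prior_alpha = [1.0] * len(counts)
# Conjugate update: posterior parameters = prior parameters + counts.
posterior_alpha = [a + c for a, c in zip(prior_alpha, counts)]

def sample_dirichlet(alpha, rng):
    # A Dirichlet draw is a vector of independent Gamma(alpha_i, 1)
    # variates divided by their sum.
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(0)  # arbitrary seed, for reproducibility
num_samples = 20000

# Monte Carlo estimate of E[ I{probability of First Answer < 0.2} ].
hits = sum(
    sample_dirichlet(posterior_alpha, rng)[0] < 0.2
    for _ in range(num_samples)
)
prob_first_below_uniform = hits / num_samples

print(f"posterior parameters: {posterior_alpha}")
print(f"P(First Answer probability < 0.2 | data) ~ {prob_first_below_uniform:.3f}")
```

Under this particular prior the estimate lands near 0.9. A prior concentrated more tightly around uniformity would give a different number, so the result should be read alongside the prior that produced it.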
That makes sense—I had imagined doing something similar, but I had never heard of Dirichlet priors.
Happy this helped. The Dirichlet-multinomial model gets relatively little attention because it adds nothing really new to the beta-binomial model for polls with just two responses. It’s easy to find lots of chatty introductions to the beta-binomial like this one or this one if you want to learn more...