I don’t like the coin model because it ignores that voters are drawn without replacement.
Assume there are ten other people in a room. Six like red and four like blue. Four of them will go to the polls, and you’re trying to decide whether you should, too. What’s the probability your vote will be the deciding factor?
It’s tempting to use the binomial distribution: p=0.6 (six of the ten like red), n=4. Your vote matters if x=2.
So it’ll be tied without you about 35% of the time.
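A quick check of the binomial figure using only Python’s standard library (`math.comb`). The function name `binomial_tie_prob` is just for illustration; it treats each of the four voters as an independent draw that is red with probability 0.6.

```python
from math import comb


def binomial_tie_prob(n_voters: int, p_red: float) -> float:
    """Probability that exactly half of n_voters pick red when each vote is
    an independent draw with P(red) = p_red (the coin model)."""
    if n_voters % 2:
        return 0.0  # an odd number of other voters can never tie
    k = n_voters // 2
    return comb(n_voters, k) * p_red**k * (1 - p_red) ** (n_voters - k)


# Four other voters, each red with probability 6/10.
print(round(binomial_tie_prob(4, 0.6), 4))  # 0.3456
```

That is C(4,2)·0.6²·0.4² = 6·0.36·0.16 ≈ 0.35.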
But this is incorrect. If the first person who votes casts a red ballot, then the probability the next vote is red falls from 6⁄10 to 5⁄9, and the probability the next vote is blue rises from 4⁄10 to 4⁄9. The correct model is the hypergeometric distribution, because it samples without replacement.
It gives a higher tie probability: about 43%.
As n increases from 10 to 300,000,000, I imagine the effect is more dramatic.
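The 43% figure can be reproduced exactly with `math.comb` and `fractions.Fraction` from Python’s standard library. `hypergeometric_tie_prob` is a name made up for this sketch; the middle part simply recomputes the 5⁄9 and 4⁄9 conditional probabilities from the sequential argument.

```python
from fractions import Fraction
from math import comb


def hypergeometric_tie_prob(red: int, blue: int, voters: int) -> Fraction:
    """Exact probability that `voters` people, drawn without replacement
    from `red` red-leaning and `blue` blue-leaning people, split evenly."""
    if voters % 2:
        return Fraction(0)  # an odd turnout can never tie
    half = voters // 2
    # Ways to pick half the voters from each camp, over all ways to pick the voters.
    return Fraction(comb(red, half) * comb(blue, half), comb(red + blue, voters))


red, blue = 6, 4

# Conditional update after the first ballot is red: one red voter has left the pool.
p_red_next = Fraction(red - 1, red + blue - 1)
p_blue_next = Fraction(blue, red + blue - 1)
print(p_red_next, p_blue_next)  # 5/9 4/9

tie = hypergeometric_tie_prob(red, blue, voters=4)
print(tie, float(tie))  # 3/7 0.42857142857142855
```

The count is C(6,2)·C(4,2)/C(10,4) = 15·6/210 = 3⁄7.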
Either way, with large electorates, the sampling error will be swamped (by orders of magnitude) by correlated changes across voters. For instance, the swings in voting behavior from economic conditions regularly move results by a number of percentage points.
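To put rough numbers on “orders of magnitude”: if the outcome varied only through independent per-voter noise, the standard deviation of a candidate’s vote share among N voters would be about sqrt(p(1−p)/N). The electorate size (100 million) and the 2-point swing below are illustrative assumptions, not figures from this thread.

```python
from math import log10, sqrt


def sampling_sd_of_share(p: float, n_voters: int) -> float:
    """Standard deviation of a candidate's vote share when each of
    n_voters independently votes for them with probability p."""
    return sqrt(p * (1 - p) / n_voters)


# Illustrative assumptions, not figures from the thread.
n_voters = 100_000_000
swing = 0.02  # hypothetical 2-point shift driven by economic conditions

sd = sampling_sd_of_share(0.5, n_voters)
print(f"sampling sd: {sd * 100:.4f} percentage points")  # sampling sd: 0.0050 percentage points
print(f"swing is about 10^{log10(swing / sd):.1f} times larger")  # swing is about 10^2.6 times larger
```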
Move relative to what? Last year’s results?
I was imagining getting the probability that a single voter would vote for candidate X from Gallup.
I meant that local stochastic factors affecting individual voters are not important in the year-to-year variation in election outcomes, compared to systematic effects like the economy.
Even if you knew the exact fraction of voters who would break for each candidate (which polling isn’t accurate enough to give), you would still face uncertainty about turnout.
The standard error of polling is usually pretty small.
Cool example. I’m still confused, though: why model our uncertainty about the electoral outcome as stemming from which folks will go to the polls (while assuming for simplicity that each person has fixed preferences), rather than as stemming from our uncertainty about how a fixed set of voters will vote (while assuming for simplicity that the set of voters is fixed)?
ETA: Sorry, I edited this after it was replied to, without noticing the reply.
I assume the randomness comes from sampling error, not from uncertainty about who people will vote for. My parents will always vote for Republicans, but they don’t always participate.
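This participation model (fixed preferences, random turnout) can be checked by brute force on the ten-person example. The sketch assumes every group of four is equally likely to be the one that shows up, which is the assumption behind the hypergeometric figure; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Fixed preferences: six people always vote red, four always vote blue.
preferences = ["red"] * 6 + ["blue"] * 4

# Randomness comes only from who turns out: every group of 4 people is equally likely.
# combinations() works by position, so identical preferences still count as different people.
turnouts = list(combinations(preferences, 4))
ties = sum(1 for group in turnouts if group.count("red") == 2)

print(len(turnouts), ties, Fraction(ties, len(turnouts)))  # 210 90 3/7
```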
Let me refocus on my point. I want to estimate the probability my vote will matter.
With population n, participation rate v, and pre-election polling showing support r for the policy, the probability your vote will matter (that is, the other voters split exactly evenly) is equal to:
$$\frac{\binom{nr}{nv/2}\binom{n(1-r)}{nv/2}}{\binom{n}{nv}}$$
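A direct translation of this formula, reading C[k, m] as “m choose k” (the reading that reproduces the 43% figure for the ten-person room). It assumes v ≤ 1 and rounds nv and nr to whole numbers; `prob_vote_matters`, `log_comb` and `log_prob_vote_matters` are hypothetical names. The log-space version uses `math.lgamma` so it can be evaluated for a national-size n without building enormous integers.

```python
from math import comb, exp, lgamma


def prob_vote_matters(n: int, v: float, r: float) -> float:
    """Exact tie probability C(nr, nv/2) * C(n(1-r), nv/2) / C(n, nv)
    for population n, turnout rate v and support r (hypergeometric)."""
    voters = round(n * v)
    supporters = round(n * r)
    if voters % 2:
        return 0.0  # an odd number of other voters cannot tie
    half = voters // 2
    # math.comb returns 0 when half exceeds either camp, so no special case needed.
    return comb(supporters, half) * comb(n - supporters, half) / comb(n, voters)


def log_comb(m: int, k: int) -> float:
    """Natural log of C(m, k) via log-gamma; avoids huge integers."""
    return lgamma(m + 1) - lgamma(k + 1) - lgamma(m - k + 1)


def log_prob_vote_matters(n: int, v: float, r: float) -> float:
    """Natural log of the same tie probability, usable for very large n."""
    voters = round(n * v)
    supporters = round(n * r)
    half = voters // 2
    if voters % 2 or half > supporters or half > n - supporters:
        return float("-inf")  # a tie is impossible
    return (log_comb(supporters, half) + log_comb(n - supporters, half)
            - log_comb(n, voters))


# The ten-person room: n=10, v=0.4 (four voters), r=0.6 (six like red).
print(prob_vote_matters(10, 0.4, 0.6))  # 0.42857142857142855
print(round(exp(log_prob_vote_matters(10, 0.4, 0.6)), 4))  # 0.4286

# A large electorate: only the log-space version is practical here.
print(log_prob_vote_matters(300_000_000, 0.5, 0.5))
```

Comparing `log_prob_vote_matters` with a binomial tie probability at the same p for growing n would be one way to test the earlier guess about how the gap between the two models behaves for a national electorate.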