Probability and Politics
Follow-up to: Politics as Charity
Can we think well about courses of action with low probabilities of high payoffs?
Giving What We Can (GWWC), whose members pledge to donate a portion of their income to most efficiently help the global poor, says that evaluating spending on political advocacy is very hard:
Such changes could have enormous effects, but the cost-effectiveness of supporting them is very difficult to quantify as one needs to determine both the value of the effects and the degree to which your donation increases the probability of the change occurring. Each of these is very difficult to estimate and since the first is potentially very large and the second very small [1], it is very challenging to work out which scale will dominate.
This sequence attempts to actually work out a first approximation of an answer to this question, piece by piece. Last time, I discussed the evidence, especially from randomized experiments, that money spent on campaigning can elicit marginal votes quite cheaply. Today, I’ll present the state-of-the-art in estimating the chance that those votes will directly swing an election outcome.
Disclaimer
Politics is a mind-killer: tribal feelings readily degrade the analytical skill and impartiality of otherwise very sophisticated thinkers, and so discussion of politics (even in a descriptive empirical way, or in meta-level fashion) signals an increased probability of poor analysis. I am not a political partisan and am raising the subject primarily for its illustrative value in thinking about small probabilities of large payoffs.
Two routes from vote to policy: electing and affecting
In thinking about the effects of an additional vote on policy, we can distinguish between two ways to affect public policy: electing politicians disposed to implement certain policies, or affecting [2] the policies of existing and future officeholders who base their decisions on electoral statistics (including that marginal vote and its effects). Models of the probability of a marginal vote swaying an election are most obviously relevant to the electing approach, but the affecting route will also depend on such models, as they are used by politicians.
The surprising virtues of naive Fermi calculation
One objection comes from modeling each vote as a flip of a biased coin. If the coin is exactly fair, then the chance of a tie scales as 1/sqrt(n). But if the coin deviates even slightly from exact fairness, the chance of a tie rapidly falls to negligible levels. This was actually one of the first models in the literature, and it was recapitulated by LessWrongers in the comments last time.
However, if we instead think of the bias of the coin itself as sampled from a uniform distribution, then we get the same result as Schwitzgebel: the chance of a tie works out to exactly 1/(n+1), recovering the naive 1/n Fermi estimate. In the electoral context, we can think of the coin’s bias as reflecting factors with correlated effects on many voters, e.g. the state of the economy, with good economic results favoring incumbents and their parties.
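To see both scalings concretely, here is a minimal numerical sketch (the Python function names are mine, purely for illustration): with a fair coin the tie probability among n voters falls off like sqrt(2/(pi*n)), while averaging over a uniform prior on the coin’s bias gives exactly 1/(n+1).

```python
from math import lgamma, log, exp

def tie_prob_fair_coin(n):
    """P(exact 50-50 split) among n voters who each flip a fair coin (n even).
    Computed in log space so that large n doesn't underflow float arithmetic."""
    log_p = lgamma(n + 1) - 2 * lgamma(n // 2 + 1) - n * log(2)
    return exp(log_p)

def tie_prob_uniform_bias(n):
    """P(exact 50-50 split) when the coin's bias p is uniform on [0, 1].
    Integrating C(n, n/2) * p**(n/2) * (1-p)**(n/2) over p gives exactly 1/(n+1)."""
    return 1 / (n + 1)

for n in (100, 10_000, 1_000_000):
    print(f"n={n:>9,}  fair coin: {tie_prob_fair_coin(n):.2e}  "
          f"uniform bias: {tie_prob_uniform_bias(n):.2e}")
```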
Fermi, meet data
How well does this hold up against empirical data? In two papers from 1998 and 2009, Andrew Gelman and coauthors attempt to estimate the probability a voter going into past U.S. Presidential elections should have assigned to casting a decisive vote. They use standard models that take inputs like party self-identification, economic growth, and incumbent approval ratings to predict electoral outcomes. These models have proven quite reliable in predicting candidate vote share and no more accurate methods are known. So we can take their output as a first approximation of the individual voter’s rational estimates [3].
… the 1952-1988 elections. For six of the elections, the probability is fairly independent of state size (slightly higher for the smallest states) and is near 1 in 10 million. For the other three elections (1964, 1972, and 1984, corresponding to the landslide victories of Johnson, Nixon, and Reagan [incumbents with favorable economic conditions]), the probability is much smaller, on the order of 1 in hundreds of millions for all of the states.
… probabilities a week before the 2008 presidential election, using state-by-state election forecasts based on the latest polls. The states where a single vote was most likely to matter are New Mexico, Virginia, New Hampshire, and Colorado, where your vote had an approximate 1 in 10 million chance of determining the national election outcome. On average, a[n actual] voter in America had a 1 in 60 million chance of being decisive in the presidential election.
It is possible to make sensible estimates of the probability of at least some events that have never happened before, like tied presidential elections, and use them in attempting efficient philanthropy.
[1] At least for two-boxers. More on one-boxing decision theorists at a later date.
[2] There are a number of arguments that voters’ role in affecting policies is more important, e.g. in this Less Wrong post by Eliezer. More on this later.
[3] Although for very low values, the possibility that our models are fundamentally mistaken looms progressively larger. See Ord et al.
[4] Including other relevant sorts of competitiveness, e.g. California is typically a safe state in Presidential elections, but there are usually competitive ballot initiatives.
May I remind people about “Nader Trading”?
also, cross-posting from OB.
Is it potentially a good charity in a region where rule of law has essentially broken down to fund/promote the dominant/police/stationary bandit side in a tug of war against the non-dominant/mob/roving bandit side? To me, US politics looks like a fight between stationary bandits and roving bandits, and permanent near-total defeat of the roving bandits seems like a prerequisite to the reestablishment of real economic growth.
I don’t see myself as partisan, as I’d be happy to support a party of the right OR the left so long as it could offer credible hope for the total destruction of the Republican party as it currently exists. Ironically, this makes me think that altruists should support Palin, as she seems to be the person with the best chance of doing that, and also seems utterly incapable of actually holding power herself. Though as a charity, I still prefer SIAI over Palin’s primary campaign by many orders of magnitude.
An explanation.
I like Robin’s recent take on the question.
For every person who sincerely believes that the Flurb Party will change things for the better and donates $100 to it, there’s someone who believes that the Bleeg Party will change things for the better and donates $100 to it … so they cancel each other out. Allocating a bigger part of the economy to printing fliers and posters doesn’t seem like the best way to make the world a better place.
Is it good to prevent bad? If so, should I donate to Flurb simply because I hope to cancel out someone donating to Bleeg?
Only if Bleeg is truly so much worse than Flurb that the expected value of slightly tipping the election outweighs the good your donation would do for a more worthwhile cause.
Also, advocating for partisan political donations in general in a context where the only effect of those donations is to tip the chances one way or the other is irrational (as opposed to advocating for donations to one side in particular, which could be reasonable if you’re certain enough that side is truly much better than the alternative).
As someone said in the comments to Robin’s post, the same goes for encouraging people in general to get out and vote.
Yes, you should. But another point is that you probably overestimate the chances that Flurb is good and Bleeg is bad, and the magnitude of the difference as well.
“Strategic reallocation of political effort” and the additional factor of “strategic reallocation of voting that takes into account other people’s strategic reallocation of voting effort” seem both very complicated to calculate and likely to actually matter to what happens in real elections. I would expect quibbles with your conclusions in this area.
You have one sentence that handles the issue, but I’m not entirely sure how you handled it, because your sentence involves two parentheticals, two double negatives, and an ambiguity-inducing self-reference to “this fact”. Here is the sentence:
Here is an attempted rewrite that I think restates the same thing with less ambiguity:
Assuming this rewriting captures the same basic idea, I think the issue of self-awareness-induced ties can be analyzed in terms of the number of people who think of voting as “siding with a winning or losing side” versus “a costly duty to act in a publicly beneficial way”. Voters who think of voting as a costly duty seem potentially subject to self-awareness-induced ties. Voters who side with predicted winners seem likely to push dynamics away from these sorts of ties.
This suggests small-scale experiments and real-world polling in which voters are measured to see whether they vote according to one, both, or neither of these dynamics, with the numbers who do so used to refine election predictions.
The historical data already take into account the rough current distribution of such voters, and the efforts of national political organizations that try to put money into competitive races. If arguments like mine become more widespread in the future, they will change matters.
This post explicitly limits itself to causal decision theory to help avoid these issues, but I’ll discuss them in a future post on decision theory complications. The second parenthetical was an acknowledgment that there is more to say on it.
Experiments and studies like the ones you suggest do seem like they would be helpful in navigating those complications.
I don’t like the coin model because it implicitly assumes sampling with replacement.
Assume there are ten other people in a room. Six like red and four like blue. Four of them will go to the polls, and you’re trying to decide if you should, too. What’s the probability your vote will be the deciding factor?
It’s tempting to use the binomial distribution: p=0.6, n=4. Your vote matters if x=2.
So it’ll be tied without you about 35% of the time.
But this is incorrect. If the first person who votes casts a red ballot, then the probability the next vote is red falls to 5/9, and the probability the next vote is blue increases to 4/9. The correct model is the hypergeometric, because it doesn’t assume replacement.
It gives a higher tie probability: about 43%.
As n increases from 10 to 300,000,000, I imagine the effect is more dramatic.
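Here is a minimal sketch checking the arithmetic in this example (the variable names are mine):

```python
from math import comb

N, reds, draws = 10, 6, 4   # room size, red supporters, people who will vote

# Binomial model (implicitly samples WITH replacement): each of the 4
# voters is red with probability 6/10; a tie means exactly 2 red votes.
p_binom = comb(draws, 2) * (reds / N) ** 2 * (1 - reds / N) ** 2

# Hypergeometric model (samples WITHOUT replacement): exactly 2 of the
# 4 actual voters come from the 6 reds and 2 from the 4 blues.
p_hyper = comb(reds, 2) * comb(N - reds, 2) / comb(N, draws)

print(f"binomial tie probability:       {p_binom:.3f}")  # ~0.346
print(f"hypergeometric tie probability: {p_hyper:.3f}")  # ~0.429
```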
Either way, with large electorates, the sampling error will be swamped (by orders of magnitude) by correlated changes across voters. For instance, the swings in voting behavior from economic conditions regularly move results by a number of percentage points.
Move relative to what? Last year’s results?
I was imagining getting the probabilities that a single voter would vote for candidate X from Gallup.
I meant that local stochastic factors affecting individual voters are not important in the year-to-year variation in election outcomes, compared to systematic effects like the economy.
If you had an exact fraction of voters who would break for which candidate (which polling isn’t accurate enough to give), you still would face uncertainty about turnout.
The standard error of polling is usually pretty small.
Cool example. I’m still confused, though; why model our uncertainty about the electoral outcome as stemming from which folks will go to the polls (while assuming for simplicity that each person has fixed preferences), rather than as stemming from our uncertainty as to how a fixed set of voters will vote (while assuming for simplicity that the set of voters is fixed)?
ETA: Sorry, I edited this after it was replied to, without noticing the reply.
I assume the randomness comes from sampling error, not from uncertainty about who people will vote for. My parents will always vote for Republicans, but they don’t always participate.
Let me refocus on my point. I want to estimate the probability my vote will matter.
With population n, participation rate v, and pre-election polling showing r support for the policy, the probability your vote will matter is equal to:
C(nr, nv/2) × C(n(1−r), nv/2) / C(n, nv), where C(a, b) denotes the binomial coefficient “a choose b”.
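A runnable transcription of this formula, as a sketch (the rounding of nv and nr to whole numbers and the function name are my additions):

```python
from math import comb

def tie_probability(n, v, r):
    """Hypergeometric chance that the other voters split exactly evenly:
    population n, turnout rate v, pre-election polling support r."""
    voters = round(n * v)        # number of other people who vote (assume even)
    supporters = round(n * r)    # people who favor the policy
    half = voters // 2           # an exact tie needs half from each camp
    return comb(supporters, half) * comb(n - supporters, half) / comb(n, voters)

# The ten-person room above: n=10, v=0.4, r=0.6 gives ~0.429.
print(tie_probability(10, 0.4, 0.6))
```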
The post compares taking roughly one hour to vote against using the hour to earn money and donating it to campaigns, on the basis of one vote versus an expected number of votes. But this ignores secondary effects of voting, such as communicating honestly or dishonestly with other voters, that may be more important than the vote itself.
Note that the analysis holds for a single rational voter.
If many people decide using similar considerations, then donations go up, electoral turnout falls, and extremists (who can’t be swayed by advertising or campaigning) and non-rationalists (who do not apply the OP’s analysis) will be over-represented. This is a distorting influence.
If donations go up, then candidates who attract relatively little funding (of the normal type, not the donations which rationalists use to replace voting) suffer. This is a second distorting influence.
A drop in electoral turnout can be seen as decreasing the winner’s perceived legitimacy. This might be an unintended consequence.
Yes. Carl’s post notes that he’ll assume CDT for this post, for simplicity, and will consider decision theories later.
But even if we go ahead and allow non-CDT complications: we’re considering elections here, and for elections, we have solid past data indicating how most people act. In such situations, even if one doesn’t assume CDT, reasoning on the present margin seems to be the correct thing to do. You know how many people, roughly, behave one way vs the other. It’s correct to ask about the benefits of moving the voters from [usual number] to [usual number + 1], or the campaign donations from [usual number] to [usual number + yours], and not to consider the rather different average changes that would be brought about in moving the current totals to a far-away and unlikely total.
For example, if I’m considering whether to be vegetarian or to donate to in vitro meat, I should ask about the benefits of one person doing so; the argument “but if everyone donated to in vitro meat, their ability to use money would be overwhelmed, and this would be less useful than everyone being vegetarian” is irrelevant.
The situation would be different if I was e.g. considering the action-shift in response to a national bestseller that advocated that action, or if I was otherwise being moved by considerations that might affect enough people to significantly change the margin, and, thus, the marginal impact.
Since we know that CDT is totally wrong in such situations, even if TDT/UDT doesn’t help with quantitative analysis, “for simplicity” doesn’t quite side-step the flaw.
We also know that frictionless planes are totally wrong in most situations. That doesn’t mean that assuming a frictionless plane “for simplicity” is not a reasonable first step when attempting a difficult analysis. As Polya teaches: when considering a problem that is too difficult, start with a similar problem related to your target.
Most people despair of calculating optimal philanthropy payoffs at all because the situation is so complicated. The result is a huge inefficiency of most philanthropic efforts. If we’re going to make headway, it will have to be by considering and expositing simple pieces and building up piece by piece, as Carl begins to do here.
If the problem is “continuous”, you’ll get a sufficiently correct solution for sufficiently low-friction problems. In this sense the assumption of a lack of friction is not “totally wrong” in the sense I used the term in my comment about the CDT/TDT voting-analysis differences.
I agree with this observation: you learn about the structure of methods of solving the target problem by studying similar methods of solving simpler problems, even if solutions (answers) are unrelated (not similar).
However, I don’t see how CDT analysis with its deciding votes is at all similar to TDT analysis that involves no such concept, and so how this observation is relevant.
It’s often a reasonable strategy, but not if the “pieces” have nothing to do with the desired whole.
Could you say more about how the TDT voting analysis would go, and what its pieces would be?
It seems to me that in the limit as the number of voters with “your algorithm” goes to zero, the TDT solution is the same as the CDT solution.
That’s the more interesting topic, and it came up when I visited the NYC LW crew last week.
My take is that, if TDT really is superior to other decision theories, then a society of majority-TDTers should not lose out to “mindless drone decision theorists” (MDDTers) simply by all individually refusing to vote, while the MDDTers vote for stupid policies in unison.
The TDTers would, rather, recognize the correlation between their decisions, and reason that their own decision, in the relevant sense, sets the output of the other TDTers, so they have to count the benefit of voting as being more than just “my favored policies +1 vote”. I conclude that a TDTer would decide to vote, reasoning something like “If I deem it optimal to vote, so do decision makers similar to me.”
The others there disagreed that TDTers would vote in such an instance, claiming that other methods of influencing the outcome exceed the effectiveness of voting in all situations.
This seems to suggest that a society of TDTers would quickly abandon democracy. What form of government would they move to?
Elaborate on your reasoning there.
Were you not talking about a society of TDTers that didn’t think it was worth voting? Or were you allowing for a sufficient number of irrational nuts in the system for the democratic process to be useful or necessary even though the majority (and all the rational people) do not use it?
Well, the particular scenario I had in mind was a democratic one (where the MDDTers believe in democracy), where the eligible TDTers could win every election if they (nearly) all voted, and where the MDDTers vote in unison for stupid policies. And the question is whether the TDT algorithm outputs “vote”; their decision not to vote is not an assumption (though perhaps they agree that, at least per CDT rules, voting is pointless).
If you’re asking what the proposed non-voting TDT-compliant alternative is, and if it would involve keeping a democratic system, then I’ll say what I should have earlier: I don’t know—that’s something I was trying to find out from those who disagreed with me. One of them said that any amount of effort spent voting would be better spent propagandizing, so there is no margin where the TDTer deems voting optimal.
I was skeptical: once you accept that TDTers “naturally” make correlated decisions (in this type of problem), your vote “controls” something much more effective (the decision of a majority of voters). Then, even under generous assumptions about alternate uses of your voting effort, and aggregating this across all TDTers, and recognizing the mind-shields that various levels of drones put up, it’s not clear why propagandizing is better.
To the extent that the drones are maximally mindless, your propaganda does nothing to change their minds, either on the object level (this election) or meta level (which political system is best). To the extent that the drones are “reasonable”, a certain fraction of their votes will go toward the TDT-favored policies anyway, further reducing the threshold TDTers have to meet to get good policies.
That is approximately my thinking too.
I suppose this depends just how open minded the TDTers are when it comes to considering alternative ways to enforce their influence over policy in the case of pointless propaganda ;)
This analysis of consequences of your decisions doesn’t just say that other people who perform similar analysis are influenced by your decision. People who make their decisions differently can be (seen as) influenced as well.
I don’t know. I know that CDT commits irrecoverable error, but not how to understand the problem. (I can guess that my decision probably makes a difference of 0.01 to 20% in a two-choice vote of the typical kind, but this is not based on explicit analysis, hence the wide interval.)
That I don’t know how to solve the problem doesn’t license me to privilege a “solution” that is known to be incorrect (even though it’s rigorous and popular).
Yes, but it’s an unreasonable assumption in the case of voting, and I don’t see how to generalize in the direction of acausal-under-logical-uncertainty control from a solution performed under this assumption. From what I currently understand, the question is: what can you predict about all voters (how would you estimate the outcome) if you assume that you actually make a certain voting decision (estimating this for each possible decision)? Such an assumption can even weakly inform you about the probable decisions of other voters who are only loosely related to you, with the estimated probability of person X voting being controlled less by your decision the less similar you are to X, but with (your understanding of the decisions of) all people controlled to some extent.