The Paradox of Voting, simply stated, is that voting in a large election almost certainly isn’t worth your time (unless you think it’s the most fun thing you could be doing). The guaranteed opportunity cost of going to vote will in most cases easily and predictably outweigh the expected benefits — the chance that your vote (along with everyone else’s) would be pivotal because the margin was 1 vote, multiplied by your expected marginal utility payoff from your chosen candidate winning.
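To make the arithmetic concrete, here is a minimal sketch of that expected-value comparison. The function names and all three inputs (pivotal probability, payoff, cost) are made up for illustration and are not estimates for any real election.

```python
def expected_net_value(p_pivotal: float, payoff: float, cost: float) -> float:
    """Expected gain from voting: chance of being pivotal times payoff, minus the sure cost."""
    return p_pivotal * payoff - cost


def break_even_payoff(p_pivotal: float, cost: float) -> float:
    """Payoff at which the expected benefit exactly covers the cost of voting."""
    return cost / p_pivotal


# Hypothetical inputs: a 1-in-a-million chance of being pivotal,
# a payoff worth 10,000 (in whatever units you like), and a cost of 20.
print(expected_net_value(1e-6, 10_000, 20))
print(break_even_payoff(1e-6, 20))
```

With inputs like these the net value is negative unless the payoff is enormous, which is the paradox in a nutshell.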
There are various well-known responses to this issue, listed in the Wikipedia article linked above. But to me, one of the obvious responses is to see this as just another instance of a chicken/snowdrift game, and to invoke the logic you might use to support cooperation in such games; that is, decision theory. I think this may even be one of the most common real-world instances where UDT/FDT might apply. I think it would also be a source of interesting edge cases for exploring the limits of UDT/FDT; that is, even small changes in how strictly you delimit which other (potential) voters to consider as UDT/FDT “co-agents” could easily swing the prescriptions you’d get. But a few quick Google searches didn’t turn up any write-ups considering this issue in this light. Am I missing something, or is this idea really “new” (at least, undocumented)?
ETA: Thanks to @Vanessa Kosoy, @Daniel Kokotajlo, and @strangepoop, I now have sufficient references for prior discussions of this idea. Thanks! Honorable mention to @lkaxas, who suggested a connection to Kantian ethics which is relevant, though more remote than the references given by the above three.
There’s a whole section on voting in the LDT For Economists page on Arbital. Also see the one for analytic philosophers, which has a few other angles on voting.
From what I can tell from your other comments on this page, you might already have internalized all the relevant intuitions, but it might be useful anyway. Superrationality is also discussed.
Sidenote: I’m a little surprised no one else mentioned it already. Somehow Arbital posts by Eliezer aren’t considered as canon as the Sequences. Maybe it’s the structure (rather than just the content)?
I think it’s just reachability. Arbital is Far Away, and it’s plausible that not everyone even knows it exists.
Gary Drescher talks about this in his old book Good and Real (p. 299). It was especially cool in that it said that even altruist CDTers can’t account for the rationality of voting in sufficiently large elections. I haven’t verified whether that’s true or not.
That’s pretty surprising. I checked out the page, and he unfortunately doesn’t motivate what kind of model he’s using, so it’s hard to verify. From the book:
In an election with two choices, in a model where everybody has 50% chance of voting for either side, I don’t think the claim is true. Maybe he’s assuming that the outcomes of elections become easier to predict as they grow larger, because individual variability becomes less important? If everyone has a 51% probability of voting for a certain side, the election would be pretty much guaranteed for an arbitrarily large population, in which case a CDTer wouldn’t have any reason to vote (even if there was a coalition of CDTers who could swing the election). I’m not sure if it’s true that elections in larger countries are more predictable, though.
I also think that in that case, the odds of a tie don’t decrease faster than linearly, but you need to take into account symmetry arguments and precision arguments. That is:
Suppose there are 2N other voters and each of them votes by flipping a coin. Then the number of votes for side A will be binomially distributed, Binomial(2N, 0.5), with mean N; the votes for side B will be 2N - A, and the net votes A - B will be 2A - 2N, with an expected value of 0.
But how likely is it to be 0 exactly (i.e. a tie that you flip to a win)? Well, that’s the probability that A is N exactly, which is a decreasing function of N. Suppose N is 1,000 (i.e. there are 2,000 voters); then it’s about 1.8%. Suppose N is 1,000,000; then it’s about 0.056%. But 1.8% divided by a thousand is 0.0018%, far less than 0.056%, so the odds of a tie shrink more slowly than linearly in N.
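These figures are easy to check. The sketch below computes the exact binomial tie probability in log space and compares it with the Stirling approximation 1/sqrt(pi * N); the function names are my own. Since the probability falls like 1/sqrt(N), multiplying N by a thousand only cuts it by a factor of about 32.

```python
import math


def tie_probability_fair(n_half: int) -> float:
    """Exact P(A == N) when A ~ Binomial(2N, 0.5), computed in log space."""
    log_binom = math.lgamma(2 * n_half + 1) - 2 * math.lgamma(n_half + 1)
    return math.exp(log_binom + 2 * n_half * math.log(0.5))


def tie_probability_stirling(n_half: int) -> float:
    """Stirling approximation: P(tie) is roughly 1 / sqrt(pi * N)."""
    return 1.0 / math.sqrt(math.pi * n_half)


for n_half in (1_000, 1_000_000):
    print(n_half, tie_probability_fair(n_half), tie_probability_stirling(n_half))
```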
But from the perspective of everyone in the election, it’s not clear why ‘you chose last.’ Presumably everyone on the side with one extra vote would think “aha, it would have been a tied election if I hadn’t voted,” and splitting that up gives us our linear factor.
As well, this hinged on the probability being 0.5 exactly. If instead each voter favored A with probability 50.1%, the odds of a tie are basically unchanged for the 2,000 voter election (we’ve only shifted the expected number of A voters by 2), but drop to about 1e-5 for the 2M voter election, a drop by a factor larger than a thousand. (The expected number of A voters is now 2,000 above N, for an expected net margin of 4,000, which is a much higher barrier to overcome by chance.)
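A minimal sketch of the same calculation with a tilted coin, assuming every other voter independently picks A with the same fixed probability p (the function name is mine):

```python
import math


def tie_probability(n_half: int, p: float) -> float:
    """P(A == N) when A ~ Binomial(2N, p), computed in log space."""
    log_binom = math.lgamma(2 * n_half + 1) - 2 * math.lgamma(n_half + 1)
    return math.exp(log_binom + n_half * (math.log(p) + math.log1p(-p)))


for n_half in (1_000, 1_000_000):
    fair = tie_probability(n_half, 0.5)
    tilted = tie_probability(n_half, 0.501)
    # The last column is how much the 0.1-point tilt shrinks the tie probability.
    print(n_half, fair, tilted, fair / tilted)
```

The tilt multiplies the fair-coin tie probability by (4p(1-p))^N, which is nearly 1 for N = 1,000 but roughly exp(-4) for N = 1,000,000.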
However, symmetry doesn’t help us here. Suppose you have a distribution over the ‘bias’ of the coin the other voters are flipping; a tie is just as unlikely if A is favored as if B is favored, and the more spread out our distribution over the bias is, the worse the odds of a tie are, because for large elections only biases very close to p=0.5 contribute any meaningful chance of a tie.
Consider a 2-option election, with 2N voters, each of whom has probability p of choosing the first option. If p is a fixed number, then as N goes to infinity, (chances of an exact tie times N) go to 0 if p isn’t exactly .5, and to infinity if it is. Since the event that p is exactly .5 has measure 0, this model supports the paradox of voting (PoV).
But! If p itself is drawn from an ordinary continuous distribution with nonzero probability density d around .5, then (chances of an exact tie times N) go to … I think it’s just d/2. Maybe there’s some correction factor that comes into play for bizarre distributions of p, but if we make the conventional assumption that it’s beta-distributed, then d/2 is the answer.
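Here is a rough numerical check of the d/2 claim for the beta-distributed case. With p drawn from Beta(a, b), the number of votes for the first option is beta-binomial, so the tie probability has a closed form. The function names and the particular (a, b) pairs are my own choices for illustration.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def tie_probability_beta(n_half: int, a: float, b: float) -> float:
    """P(exact tie among 2N voters) when p ~ Beta(a, b): the beta-binomial pmf at N."""
    log_binom = math.lgamma(2 * n_half + 1) - 2 * math.lgamma(n_half + 1)
    return math.exp(log_binom + log_beta(a + n_half, b + n_half) - log_beta(a, b))


def beta_density_at_half(a: float, b: float) -> float:
    """Density d of Beta(a, b) at p = 0.5."""
    return math.exp(-(a + b - 2) * math.log(2) - log_beta(a, b))


n_half = 1_000_000
for a, b in ((1, 1), (50, 50), (55, 45)):
    # Compare N * P(tie) with d / 2 for each prior over p.
    print((a, b), n_half * tie_probability_beta(n_half, a, b), beta_density_at_half(a, b) / 2)
```

For the uniform case (a = b = 1, so d = 1) the tie probability is exactly 1/(2N + 1), and N times that tends to 1/2.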
I think that the PoV literature is relying on the “fixed p” model. I think the “uncertain p” model is more realistic, but it’s still worth engaging with “fixed p” and seeing the implications of those assumptions.
As an aside, for really large populations, it would probably be socially optimal to only have a small fraction of the population voting (at least if we ignore things like legitimacy, feeling of participation, etc). As long as that fraction is randomly sampled, you could get good statistical guarantees that the outcome of the election would be the same as if everyone voted. South Korea did a pretty cool experiment where they exposed a representative sample of 500 people to pro- and anti-nuclear experts, and then let them decide how much nuclear power the country should have.
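As a rough illustration of what those statistical guarantees look like, the sketch below uses a normal approximation for the chance that a simple random sample picks the same winner as the full population. The 52% population share and the sample sizes are hypothetical, and the approximation assumes the sample is small relative to the population.

```python
import math


def sample_agreement_probability(p: float, sample_size: int) -> float:
    """Normal-approximation chance that a random sample's majority matches the population's.

    p is the population share of the majority option (so p > 0.5).
    """
    z = (p - 0.5) * math.sqrt(sample_size) / math.sqrt(p * (1 - p))
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


for n in (100, 1_000, 10_000, 100_000):
    print(n, sample_agreement_probability(0.52, n))
```

The closer the population split is to 50/50, the larger the sample needs to be for the same level of confidence.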
I don’t think this is why CDTers refuse to vote, though.
That book is from 2006. I understand that it deals with the Paradox of Voting, but does it have anything that would be directly relevant to considering it in light of “acausal decision theories”? As far as I know, such theories pretty much didn’t exist back then.
Drescher coined the term “acausal” in the context of decision theory, in Good and Real. His arguments and ideas are remarkably similar to things Yudkowsky and others on LessWrong have said in the decade or so since. One of my side projects (which I keep putting off) is to explore his proposed decision theory (which differs from CDT and EDT and, notably, one-boxes even in Transparent Newcomb!) in more detail, to see how it compares to stuff LessWrong talks about.
This idea is certainly not new. For example, in an essay about TDT from 2009, Yudkowsky wrote:
(emphasis mine)
The relevance of TDT/UDT/FDT to voting surfaced in discussions many times, but possibly nobody wrote a detailed essay on the subject.
I don’t think any of the more interesting decision theories differ from CDT on a trivial expected value calculation, with no acausal paths to the payoffs. How do you see it working? Can you put some probabilities and payoffs in place to show why you think this is relevant?
But there is an obvious acausal path in this case. If other voters are using the same algorithm you are to decide whether or not to vote, or a “sufficiently similar” one (in some sense that would have to be fleshed out), then that inflates the probability that “your” decision of whether or not to vote is pivotal, because “you” are effectively multiple voters.
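One toy way to put numbers on “you are effectively multiple voters”: treat yourself and your co-agents as a bloc of k voters who all favor A and all make the same vote-or-abstain decision, and ask how often that bloc changes the result. The bloc model, the coin-flipping other voters, and the function names are all my assumptions for illustration, not something FDT/UDT itself specifies.

```python
import math


def log_pmf(n_half: int, a_votes: int, p: float) -> float:
    """Log of P(A == a_votes) for A ~ Binomial(2N, p)."""
    n = 2 * n_half
    log_binom = math.lgamma(n + 1) - math.lgamma(a_votes + 1) - math.lgamma(n - a_votes + 1)
    return log_binom + a_votes * math.log(p) + (n - a_votes) * math.log1p(-p)


def bloc_decisive_probability(n_half: int, bloc_size: int, p: float = 0.5) -> float:
    """Chance that bloc_size extra votes for A change the result among 2N other voters.

    "Change" means loss -> tie, loss -> win, or tie -> win, i.e. the margin
    A - B among the other voters lies between -bloc_size and 0.
    """
    lowest = max(0, n_half - bloc_size // 2)
    return sum(math.exp(log_pmf(n_half, a, p)) for a in range(lowest, n_half + 1))


for k in (1, 10, 100):
    print(k, bloc_decisive_probability(1_000_000, k))
```

In this toy model the chance of being decisive grows roughly linearly with k as long as k is small compared with sqrt(N), so the answer depends heavily on how many other voters you count as running a sufficiently similar algorithm.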
Is that sufficient, or do you need actual numbers? (I’d guess it is and you don’t.)
I guess it is, but I’d edit your question to mention that you include https://en.wikipedia.org/wiki/Superrationality in your assumptions. Personally, I don’t think that other potential voters are all that similar to myself, so all decision theories lead to the same result (negative EV for voting, when considering only cost of time spent vs chance of pivotal outcome).
I very much do not include superrationality in my assumptions. I’m not assuming that all other voters, or even any specific individual other voter, is explicitly using a meta-rational decision theory; I’m simply allowing the possibility that the “expected acausal impact” of my decision is greater than 0 other voters. There are, I believe, a number of ways this could be “true”.
In simpler terms: I think that my beliefs (and definitions) about whether (how many) other voters are “like” me are orders of magnitude different from yours, in a way that is probably not empirically resolvable. I understand that taking your point of view as a given would make my original question relatively trivial, but I hope you understand that it is still an interesting question from my point of view, and that exploring it in that sense might even lead to productive insights that generalize over to your point of view (even though we’d probably still disagree about voting).
If you like, I guess, we could discuss this in a hypothetical world with a substantial number of superrational voters. For you this would merely be a hypothetical, which I think would be interesting for its own sake. For me, this would be a hypothetical special case of acausal links between voters, links which I believe do exist though not in that specific form.
Fair enough. That lack of empirical validation is a hallmark of esoteric decision theories, so is some evidence you’re on an interesting track :)