An issue with MacAskill’s Evidentialist’s Wager
In The Evidentialist’s Wager, MacAskill et al. argue as follows:
Suppose you are uncertain as to whether EDT or CDT is the correct decision theory, and you face a Newcomb-like decision. If CDT were correct, your decision would only influence the outcome of your present decision. If EDT were correct, your decision would provide evidence not only about the outcome of your present decision, but also about the outcomes of many similar decisions made by similar agents throughout the universe (either because they are exact copies of you, or because they run very similar decision theories/computations, etc.). Thus, the stakes are far higher if EDT is true, and so you should act as if EDT were true (even if you have higher prior credence in CDT).
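To make the structure of the wager explicit (this is my own rough notation, not the paper’s): let $p$ be your credence in EDT, $v$ the value at stake in your own decision, and $N$ the number of correlated agents about whose decisions yours provides evidence. Then, schematically,

$$p \cdot (N+1) \cdot v \;\gg\; (1-p) \cdot v \quad \text{for large } N,$$

so the act EDT recommends wins the wager even when $p$ is small.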
This argument of course relies on how many such similar agents actually exist, and on how similar they are. The authors use the term correlated agent for an agent similar enough to you that your decision acausally provides evidence about theirs. As possible counterexamples to their argument, they point to two kinds of agents:
Anti-correlated agents: Agents whose decision theory will drive them to take the decision opposite to yours.
Evil Twins: Agents positively correlated to you (with your same decision theory) but with drastically different utility functions (or in the extreme, the exact opposite utility function).[1]
Regarding anti-correlated agents: since our decision theories are actually good heuristics for navigating the world and rationally achieving our goals, it seems more likely that agents which exist (that is, which have survived) are positively correlated with us rather than anti-correlated.
But regarding Evil Twins: because of the Orthogonality Thesis, we might expect there to be, on average, as many agents positively correlated to us with ~our same utility function (Good Twins) as agents positively correlated to us with ~the opposite utility function (Evil Twins).
That is, the universe selects for agents positively correlated to us (instead of anti-correlated), but doesn’t select for agents with ~our utility function (instead of the ~opposite function).
So we should expect the acausal evidence from all these other agents to balance out, and we’re back to EDT having the same stakes as CDT: only our particular decision is affected.[2]
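In the same rough notation (mine, not the paper’s): suppose my decision provides equally strong evidence about $n_G$ Good Twins and $n_E$ Evil Twins, each of whose analogous decisions is worth roughly $+v$ or $-v$ by my lights. Then

$$\text{EDT stakes} \;\approx\; v + (n_G - n_E)\,v,$$

which collapses to just $v$, the same as CDT’s stakes, whenever $n_G = n_E$.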
Possible objection
Of course, this refutation relies on our actually expecting there to be as many Evil Twins as Good Twins (or on our being in an epistemic position such that we have equal credence in there being more Evil Twins and in there being more Good Twins). The Orthogonality Thesis does not directly imply the former: it might well be that any “intelligence level” is compatible with any utility function in mindspace, and yet that the universe does select for a certain type of utility function. In fact, it seems likely this happens (at least subtly) in ways we don’t yet understand. But it seems very unlikely (to me) that the universe should for some reason select for ~our particular utility function.
This intuition is mainly fueled by the consideration of digital minds, and more concretely by how a superintelligent agent maximizing its utility function will most likely do things that are horrible according to ours.[3]
The actual core of the disagreement
MacAskill et al. sweep all of this under the rug by invoking “the vast number of people who may exist in the future”. On my reading, they’re not saying “and all of these people will have ~our utility function”, which would be naïve (not only because of future human misalignment, which is not a big worry for MacAskill, but also because of aliens and superintelligences). On my reading, they’re saying “and this will tip the balance either way, towards a majority of these agents maximizing either ~our utility function or ~its opposite, which will make the stakes for EDT higher than those of CDT”.
That is: Regardless of whether we know which agents do and don’t exist, it’s very unlikely that there are exactly as many Good Twins as Evil Twins (except in certain infinite universes). Almost certainly the balance will be tipped in one direction, even if by pure contingent luck.
From this, they trivially conclude that EDT has higher stakes than CDT: if there are more Good Twins (respectively, Evil Twins), EDT will recommend one-boxing (respectively, two-boxing) very strongly, since your decision provides evidence about many agents deciding the same way. But I’m not satisfied with this answer, because if you don’t know whether more Good Twins or more Evil Twins exist, you won’t actually obtain that evidence (upon making the decision)!
That is, EDT is naturally interpreted from the agent’s perspective (and the agent’s own Bayes net). So knowing that there is a fact of the matter as to whether there are more Good Twins or Evil Twins (a fact an omniscient observer could read off from a bird’s-eye view) doesn’t change the agent’s Bayes net, and doesn’t raise EDT’s stakes for her.
If we actually had some evidence privileging one of the two options (for instance, because Orthogonality fails in some relevant way), then EDT would certainly imply higher stakes and point us in that direction. But if we have absolutely no evidence as to which alternative obtains (or evidence so weak that it’s outweighed by our prior credence in CDT), then EDT provides no higher stakes. I do think that’s our situation for now.
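In the rough notation from above: even though the realized difference $n_G - n_E$ is almost certainly nonzero, what enters the agent’s EDT calculation is her expectation of it. With symmetric credences,

$$\mathbb{E}[\,n_G - n_E\,] = 0 \quad\Longrightarrow\quad \text{expected EDT stakes} \;\approx\; v + \mathbb{E}[\,n_G - n_E\,]\,v \;=\; v,$$

so from inside the agent’s Bayes net the extra acausal stakes vanish, whatever the bird’s-eye truth may be.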
[1] I don’t know whether the use of the term Anti-correlated in the article fits this definition, or also includes Evil Twins, or even includes all agents for which your decision will provide evidence that they decided against your utility function. These details and other considerations feel glossed over in the article, and that’s my main point.
[2] This argument is very similar to the usual refutation of Pascal’s Wager.
[3] Here’s one reason why this might not be that big of a problem. For one such superintelligence to face a Newcomb-like decision, another agent must be intelligent enough to accurately predict it (“supersuperintelligent”). Of course, this doesn’t necessarily imply that the first agent won’t still be making many Newcomb-like decisions with consequences we find abhorrent. But it might be that, instead of a complex structure of leveled agents cooperating or conflicting with each other, there is just one agent “at the top of the intelligence chain” (superintelligent) which controls most of its lightcone. This agent wouldn’t face any Newcomb-like decisions. One objection is that it might still partake in other complex reasoning acausally connected to our EDT reasoning, but that’s not obvious. Another is that it might deploy many lower-intelligence (sub)agents who do face Newcomb-like decisions (if Alignment is solvable).
More generally, it might be that sufficiently intelligent agents discover a more refined decision theory and thus aren’t acausally connected to our EDT reasoning. But this seems unlikely, given the apparent mathematical canonicity and rational usefulness of our theories and of Newcomb-like problems.
I don’t think this is a situation of evidential symmetry which would warrant a uniform distribution (i.e. you can’t just say that “you don’t know”). (Moreover, there does not seem to be an overwhelmingly natural partition of the state space in this particular case, which arguably makes the Principle of Indifference inapplicable regardless—see section 3 of Greaves [2016].)
One weak piece of evidence is e.g. provided by the mediocrity principle: since I know for sure that there exists at least one agent who has my preferences and makes decisions in the way I do (me!)—and I don’t know the opposite for sure—I should expect there to be more Good Twins than Evil Twins.
Moreover, I should probably expect there in general to be some correlation between decision theory and values, meaning that my (decision-theoretic) twins are by my lights more likely to be Good than Evil.
Thank you for your comment, Sylvester!
As it turns out, you’re right! Yesterday I discussed this issue with Caspar Oesterheld (one of the authors). Indeed, his answer to this objection is that they believe there are probably more positively than negatively correlated agents. Some arguments for that are evolutionary pressures and the correlation between decision theory and values you mention. In this post, I was implicitly relying on digital minds being crazy enough that a big fraction of them would be negatively correlated with us. This could plausibly be the case in extortion/malevolent-actor scenarios, but I don’t have any arguments for that being probable enough.
In fact, I had already come up with a different objection to my own argument. And the concept of negatively correlated agents is problematic for other reasons as well. I’ll write another post presenting these and other considerations when I have the time (probably at the end of this month). I’ll also go over Greaves [2016], thank you for that resource!
Ah, nice. I was just about to recommend sections 2.6.2 and 3 of Multiverse-wide Cooperation via Correlated Decision Making by Caspar.
Nice, thank you! I will delve into that one as well when I have the time :-)
On acausal trade, an important point is that a lot of evidence is emerging that the universe is in fact infinitely large, which supports your thesis even more (and, by extension, makes MacAskill’s arguments nonsense).
The authors consider the infinite case in section 5 of the paper. They conclude:
Why does accepting acausal trade (or EDT) provide evidence about an infinite universe? Could you elaborate on that? And of course, not all kinds of infinite universes imply there’s the same amount of Good Twins and Evil Twins.
I’m talking about the evidence from the Cosmic Microwave Background radiation, as well as a bit from dark energy, which is converging on a picture of a universe that is flat, infinite, and homogeneous; that is, at large scales there are no imbalances in the distribution of mass.
There’s already a lot of evidence from physics to bear on this question, which is what I’m talking about.
Oh, of course, I see! I had understood you meant acausal trade was the source of this evidence. Thanks for your clarification!