Conditions for Superrationality-motivated Cooperation in a one-shot Prisoner’s Dilemma
Summary
It has been argued that, if two very similar agents follow decision theories allowing for superrationality (e.g., EDT and FDT), they would cooperate in a prisoner’s dilemma (PD) (see e.g., Oesterheld 2017). But how similar do they need to be exactly? In what way? This post is an attempt at addressing these questions. This is, I believe, particularly relevant to the work of the Center on Long-Term Risk on acausal reasoning and the foundations of rational agency (see section 7 of their research agenda).
I’d be very interested in critiques/comments/feedback. This is the main reason why I’m posting this here. :)
Normal PD
Consider this traditional PD between two agents:
| Alice \ Bob | C | D |
| --- | --- | --- |
| C | 3, 3 | 0, 5 |
| D | 5, 0 | 1, 1 |
We can compute the expected payoffs of Alice and Bob ($U_{Alice}$ and $U_{Bob}$) as a function of $p$ (the probability that Alice plays C) and $q$ (the probability that Bob plays C):

$U_{Alice}(p, q) = 3pq + 0 \cdot p(1-q) + 5(1-p)q + 1 \cdot (1-p)(1-q) = 1 - p + 4q - pq$

$U_{Bob}(p, q) = 3pq + 5p(1-q) + 0 \cdot (1-p)q + 1 \cdot (1-p)(1-q) = 1 - q + 4p - pq$

Now, Alice wants to find $p^*$ (the optimal $p$, i.e., the $p$ that will maximize her payoff). Symmetrically, Bob wants to find $q^*$. They do some quick math and find that $p^* = q^* = 0$, i.e., they should both play D. This is the unique Nash equilibrium of this game.
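For concreteness, here is a minimal Python sketch (my own illustration, not part of the original analysis) checking that, with these payoffs, defecting dominates for Alice whatever Bob’s $q$ is:

```python
# Minimal sketch: with the payoff matrix above, D strictly dominates C for Alice
# no matter what q is, so the unique Nash equilibrium is p* = q* = 0.

def u_alice(p, q):
    """Alice's expected payoff: 3pq + 0*p(1-q) + 5(1-p)q + 1(1-p)(1-q)."""
    return 3*p*q + 0*p*(1 - q) + 5*(1 - p)*q + 1*(1 - p)*(1 - q)

for q in [0.0, 0.25, 0.5, 0.75, 1.0]:
    # For any fixed q, Alice's payoff decreases in p, so her best response is p = 0.
    assert u_alice(0, q) > u_alice(1, q)
    print(f"q={q}: defect -> {u_alice(0, q):.2f}, cooperate -> {u_alice(1, q):.2f}")
```

By symmetry, the same holds for Bob, hence mutual defection.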
Perfect-copy PD
Now, say Alice and Bob are perfect copies. How does it change the game presented above? We still have the same payoffs and the same expected-payoff formulas as before.
However, this time, $p = q$. Whatever one does, that’s evidence that the other does the exact same. They are decision-entangled[1].
What does that mean for the payoff functions of Alice and Bob? Well, decision theorists disagree. Let’s see what the two most popular decision theories (CDT and EDT) say, according to my (naive?) understanding:
EDT: “Alice should substitute $p$ for $q$ in her formula, since $p = q$. Symmetrically, Bob should substitute $q$ for $p$ in his.” This gives $U_{Alice}(p) = 4p - p^2 - p + 1 = 1 + 3p - p^2$ and $U_{Bob}(q) = 4q - q^2 - q + 1 = 1 + 3q - q^2$.
CDT: “Alice should hold q fixed. Same for Bob and p. They should behave as if they could change their action unilaterally through some kind of magic.” Therefore, CDT computes the dominant strategy from the original payoffs, ignoring the fact that p=q.
For CDT, $p^* = q^* = 0$, just like in the normal PD above. For EDT, however, we now get $p^* = q^* = 1$ (Alice and Bob should both cooperate). EDT is one of the decision theories that allow for superrationality: cooperation via entangled decision-making (Hofstadter 1983), or basically “factoring in the possibility that $p = q$”, as I understand it. So the difference between the Normal PD and the Perfect-copy PD matters only if both players have at least some credence in superrationality.
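Here is a quick sketch (again, just my own illustration of the calculation above) contrasting the two prescriptions:

```python
# Quick check: in the perfect-copy PD (p = q), EDT's substituted payoff
# U_Alice(p) = 1 + 3p - p^2 is maximized at p = 1, while CDT, holding q fixed,
# still recommends p = 0 for every q.

def u_alice(p, q):
    return 1 - p + 4*q - p*q  # same expected payoff as in the normal PD

# EDT: substitute p for q and optimize over p in [0, 1].
edt_payoffs = {p / 10: u_alice(p / 10, p / 10) for p in range(11)}
assert max(edt_payoffs, key=edt_payoffs.get) == 1.0  # cooperate

# CDT: treat q as fixed; defection is still the best response for every q,
# because the fact that p = q is ignored.
assert all(u_alice(0, q / 10) > u_alice(1, q / 10) for q in range(11))
```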
Formalizing the Conditions for Superrationality-motivated Cooperation in a one-shot PD
Given the above, we can hypothesize that Alice will (superrationally) cooperate with Bob in a one-shot PD iff:
She has a significant[2] credence in the possibility that they’re playing a Perfect-copy PD, as opposed to a Normal PD (i.e., that they are decision-entangled), and
She has a significant credence in superrationality, such that she takes into account this decision-entanglement when she does the math. (This is assuming Alice might have decision-theoretic uncertainty.)
We then get those two neat conditions for cooperation:
Significant credence in decision-entanglement
Significant credence in superrationality
But what makes two agents decision-entangled?
Conditions for decision-entanglement
How does/should Alice form her credence in decision-entanglement? What are the required elements for two agents to have entangled decisions in a particular game?
First of all, you obviously need them to have compatible decision theories (DTs)[3]. Here’s (I think) a somewhat representative instance of what happens if you don’t:
Now, replace Hilary with some EDT players, such that the compatible DTs condition is met. Does that mean the players have entangled decisions? No! Here’s an example proving that this doesn’t suffice:
Although they both follow EDT, their beliefs regarding decision-entanglement diverge. In addition to “I believe we have compatible DTs”, Arif thinks there are other requirements that are not met here.
To identify what those requirements are, it is important to clarify what it is that outputs the players’ beliefs: their epistemic algorithms[4] (which themselves take some pieces of evidence as inputs).
It then becomes clear what the requirements are besides “I believe we have compatible DTs” for Arif to believe there is decision-entanglement:
“I believe we have entangled epistemic algorithms (or that there is epistemic-entanglement[5], for short)”, and
“I believe we have been exposed to compatible pieces of evidence”.
Since rational Arif doesn’t believe he’s decision-entangled with John, that means he must think that at least one of the two latter statements is false.[6]
Now, what is the evidence John and Arif should be looking for?
First, they want to compare their DTs to see if they’re compatible, as well as their epistemics to see if they’re entangled.
Then, if they have compatible DTs and entangled epistemics, they also need common knowledge of that fact, which means that they need to somehow check whether they have been exposed to compatible evidence regarding those two things, and to check that they have been exposed to compatible evidence regarding their exposure to evidence, and so on ad infinitum.[7] If they don’t verify all of this, they would end up with non-entangled beliefs and non-entangled decisions.

So here is how, I tentatively think, one-shot-PD players should reason:
Recall our conditions for superrationality-motivated cooperation in a one-shot PD:
Significant credence in decision-entanglement
Significant credence in superrationality
Assuming God doesn’t tell Alice whether her counterpart is decision-entangled with her, Alice would have a significant credence regarding #1 iff:
Significant credence in compatible DTs
Significant credence in epistemic-entanglement
Significant credence in the possibility that they have been exposed to some compatible pieces of evidence
Therefore (again, assuming God doesn’t tell her whether her counterpart is decision-entangled with her), Alice would cooperate iff she has all of the following (a toy formalization follows this list):
1. Significant credence in decision-entanglement
1.1 Significant credence in compatible DTs
1.2 Significant credence in epistemic-entanglement
1.3 Significant credence in the possibility that they have been exposed to some compatible pieces of evidence
2. Significant credence in superrationality
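To make the structure of the rule explicit, here is a toy formalization (mine; the 0.5 threshold is a placeholder, since, per footnote 2, what counts as “significant” depends on the payoffs):

```python
# Toy formalization of the cooperation rule above. The threshold is a placeholder:
# footnote 2 notes that what "significant" means depends on the game's payoffs.

def alice_cooperates(cr_compatible_dts, cr_epistemic_entanglement,
                     cr_compatible_evidence, cr_superrationality,
                     significant=0.5):
    significant_credence_in_decision_entanglement = (
        cr_compatible_dts > significant              # 1.1
        and cr_epistemic_entanglement > significant  # 1.2
        and cr_compatible_evidence > significant     # 1.3
    )
    return (significant_credence_in_decision_entanglement  # condition 1
            and cr_superrationality > significant)          # condition 2
```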
Remaining open questions
In our Normal PD and Perfect-copy PD games, we took two extreme examples where the credences were maximally low and maximally high, respectively. But what if Alice has uncertain beliefs when it comes to these conditions? What should she do?
For what it’s worth, the Appendix addresses the case where Alice is uncertain about #1 (without specifying credences about 1.1, 1.2, 1.3, though).
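To give a flavour of how such a threshold can be computed (this is my own toy model with extra assumptions, not a summary of the Appendix): suppose Alice assigns credence r to being decision-entangled with Bob, and believes that, if they are not entangled, Bob cooperates with some fixed probability q regardless of what she does.

```python
# Toy model (my own assumptions, not the Appendix's derivation):
#   EU(cooperate) = 3r + 3q(1 - r)
#   EU(defect)    = r + (4q + 1)(1 - r)
# With the payoffs above, Alice cooperates iff r > (q + 1) / (q + 3),
# so the "significant" credence in decision-entanglement depends on the payoffs
# and on what a non-entangled Bob would do.

def cooperation_threshold(q):
    return (q + 1) / (q + 3)

print(cooperation_threshold(0))  # ~0.33 if a non-entangled Bob would defect
print(cooperation_threshold(1))  # 0.50 if a non-entangled Bob would cooperate
```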
Alice now knows (thanks to me; you’re welcome, Alice) that, in order to estimate the probability that she’s decision-entangled with Bob, she should factor in the probabilities of i) Bob also being superrational, ii) her and Bob being epistemically entangled, and iii) her and Bob having been exposed to compatible pieces of evidence. Coming up with a credence regarding i) doesn’t seem insuperable: the distinction between DTs that allow for superrationality and those that don’t is pretty clear. Coming up with credences regarding ii) and iii), however, seems much more challenging. How would she do that? Where should she even look? What about the infinite recursion when looking for relevant pieces of evidence?
Acknowledgments
Thanks to Sylvester Kollin and Nicolas Macé for fruitful discussions, as well as for benevolently teaching me some of the maths/game theory I used (mainly in the Appendix).
Thanks to Caspar Oesterheld, Johannes Treutlein, Lukas Gloor, Matīss Apinis, and Antonin Broi for very helpful feedback, suggestions, and discussions. Credits to Johannes Treutlein and Oscar Delaney for spotting a few crucial math and/or notation errors in earlier drafts.
Most of the work put into this post has been funded by CERI (now ERA) through their summer research fellowship. I’ve also benefited quite a lot from being welcome to work from the office of the Center on Long-Term Risk. I’m grateful to those two organizations, to their respective teams, as well as to all their summer research fellows with whom I had a very nice and productive time.
All assumptions/claims are my own. No organization or individual other than me is responsible for my potential inaccuracies, mistakes, or omissions.
Appendix: What if Alice is uncertain whether she and Bob are decision-entangled?
- ^
A few clarifications on this notion of decision-entanglement and my use of it:
- I am, here, assuming that the presence of decision-entanglement is an objective fact about the world, i.e., that there is something that does (or doesn’t) make the decisions of two agents entangled, and that it is not up to our interpretation (this doesn’t mean that decision-entanglement doesn’t heavily rely on the subjectivity of the two agents). This assumption is non-obvious and controversial. However, I am using this “entanglement realist” framework all along the post, and think the takeaways would be the same if I was adopting an “anti-realist” view. This is the reason why I don’t wanna bother thinking too much about this “entanglement (anti-)realism” thing. It doesn’t seem useful. Nonetheless, please let me know if you think my framework leads me to conclusions that are peculiar to it, such that they would be more questionable.
- Note that, although we took an example with perfect copies here, two agents do not need to have entangled decisions in absolutely every possible situation, in order to be (relevantly) decision-entangled. We only care about the decision they make in the PD presented here, so they could as well be imperfect copies and make unentangled decisions in other situations.
- Unless specified otherwise, I assume decision-entanglement with regard to one decision to be something binary (on a given problem, the decisions of two agents are entangled or they aren’t; no in between), for the sake of simplicity.
- ^
As demonstrated in the Appendix, what “significant” exactly means depends on the payoffs of the game. This applies to every time I use that term in this post.
- ^
By “compatible”, I mostly mean something like “similar”, although it’s sort of arbitrary what counts as “similar” or not (e.g., Alice and Bob could have two DTs that seem widely different from our perspective, although they’re compatible in the sense that they both allow or don’t allow for superrationality).
- ^
Thanks to Sylvester Kollin for suggesting to me to clearly differentiate between decision and epistemic algorithms in such games.
- ^
John and Arif are epistemically entangled iff 1) in the particular situation they’re in, their epistemic algos output similar results, given similar inputs, and 2) in the particular situation they’re in, they can’t unilaterally modify their epistemic algos.
- ^
- ^
Thanks to Caspar Oesterheld for informing me that the infinite recursion I was gesturing at was known as “common knowledge” in game theory.
I guess we talked about this a bunch last year, but since the post has come up again...
I still don’t understand why it’s necessary to talk about epistemic algorithms and their entanglement as opposed to just talking about the beliefs that you happen to have (as would be normal in decision and game theory).
Say Alice has epistemic algorithm A with inputs x that gives rise to beliefs b and Bob has a completely different [ETA: epistemic] algorithm A’ with completely different inputs x’ that happens to give rise to beliefs b as well. Alice and Bob both use decision algorithm D to make decisions. Part of b is the belief that Alice and Bob have the same beliefs and the same decision algorithm. It seems that Alice and Bob should cooperate. (If D is EDT/FDT/..., they will cooperate.) So it seems that the whole A,x,A’,x’ stuff just doesn’t matter for what they should do. It only matters what their beliefs are. My sense from the post and past discussions is that you disagree with this perspective and that I don’t understand why.
(Of course, you can talk about how in practice, arriving at the right kind of b will typically require having similar A, A’ and similar x, x’.)
(Of course, you need to have some requirement to the effect that Alice can’t modify her beliefs in such a way that she defects but that she doesn’t (non-causally) make it much more likely that Bob also defects. But I view this as an assumption about decision-theoretic rather than epistemic entanglement: I don’t see why an epistemic algorithm (in the usual sense of the word) would make such self-modifications.)
Oh nice, thanks for this! I think I now see much more clearly why we’re both confused about what the other thinks.
(I’ll respond using my definitions/framing which you don’t share, so you might find this confusing, but hopefully, you’ll understand what I mean and agree although you would frame/explain things very differently.)
Say Bob is CooperateBot. Alice may believe she’s decision-entangled with them, in which case she (subjectively) should cooperate, but that doesn’t mean that their decisions are logically dependent (i.e., that her belief is warranted). If Alice changes her decision and defects, Bob’s decision remains the same. So unless Alice is also a CooperateBot, her belief b (“my decision and Bob’s are logically dependent / entangled such that I must cooperate”) is wrong. There is no decision-entanglement. Just “coincidental” mutual cooperation. You can still argue that Alice should cooperate given that she believes b of course, but b is false. If only she could realize that, she would stop naively cooperating and get a higher payoff.
It matters what their beliefs are to know what they will do, but two agents believing their decisions are logically dependent doesn’t magically create logical dependency.
If I play a one-shot PD against you and we both believe we should cooperate, that doesn’t mean that we necessarily both defect in a counterfactual scenario where one of us believes they should defect (i.e., that doesn’t mean there is decision-entanglement / logical dependency, i.e., that doesn’t mean that our belief that we should cooperate is warranted, i.e., that doesn’t mean that we’re not two suckers cooperating for wrong reasons while we could be exploiting the other and avoid being exploited). And whether we necessarily both defect in a counterfactual scenario where one of us believes they should defect (i.e., whether we are decision-entangled) depends on how we came to our beliefs that our decisions are logically dependent and that we must cooperate (as illustrated—in a certain way—in my above figures).
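A tiny toy illustration of this point (mine): against CooperateBot, changing Alice’s decision changes nothing on Bob’s side, whereas against a copy of her decision procedure it does.

```python
# CooperateBot's move never depends on Alice's; a copy of Alice's decision
# procedure mirrors whatever she decides.

def cooperate_bot(alices_move):
    return "C"  # ignores Alice entirely

def copy_of_alice(alices_move):
    return alices_move  # mirrors Alice's decision

for alices_move in ["C", "D"]:
    print(alices_move, cooperate_bot(alices_move), copy_of_alice(alices_move))
# Against CooperateBot, Alice switching to D turns (C, C) into (D, C):
# no logical dependency, so her belief in decision-entanglement was false.
# Against her copy, switching to D turns (C, C) into (D, D): the dependency is real.
```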
After reading that, I’m really starting to think that we (at least mostly) agree but that we just use incompatible framings/definitions to explain things.
Fwiw, while I see how my framing can seem unnecessarily confusing, I think yours is usually used/interpreted oversimplistically (by you but also and especially by others) and is therefore extremely conducive to Motte-and-bailey fallacies[1] leading us to widely underestimate the fragility of decision-entanglement. I might be confused though, of course.
Thanks a lot for your comment! I think I understand you much better now and it helped me reclarify things in my mind. :)
E.g., it’s easy to argue that widely different agents may converge on the exact same DT, but not if you include intricacies like the one in your last paragraph.
Nice, I think I followed this post (though how this fits in with questions that matter is mainly only clear to me from earlier discussions).
I think something can’t be both neat and so vague as to use a word like ‘significant’.
In the EDT section of Perfect-copy PD, you replace some p’s with q’s and vice versa, but not all; is there a principled reason for this? Maybe it is just a mistake and it should be U_Alice(p) = 4p - p^2 - p + 1 = 1 + 3p - p^2 and U_Bob(q) = 4q - q^2 - q + 1 = 1 + 3q - q^2.
I am unconvinced of the utility of the concept of compatible decision theories. In my mind I am just thinking of it as ‘entanglement can only happen if both players use decision theories that allow for superrationality’. I am worried your framing would imply that two CDT players are entangled, when I think they are not; they just happen to both always defect.
Also, if decision-entanglement is an objective feature of the world, then I would think it shouldn’t depend on what decision theory I personally hold. I could be a CDTer who happens to have a perfect copy and so be decision-entangled, while still refusing to believe in superrationality.
Sorry I don’t have any helpful high-level comments, I think I don’t understand the general thrust of the research agenda well enough to know what next directions are useful.
Thanks a lot for these comments, Oscar! :)
I forgot to copy-paste a footnote clarifying that “as made explicit in the Appendix, what “significant” exactly means depends on the payoffs of the game”! Fixed. I agree this is still vague, although I guess it has to be, since the payoffs are unspecified?
Also a copy-pasting mistake. Thanks for catching it! :)
This may be an unimportant detail, but—interestingly—I opted for this concept of “compatible DT” precisely because I wanted to imply that two CDT players may be decision-entangled! Say CDT-agent David plays a PD against a perfect copy of himself. Their decisions to defect are entangled, right? Whatever David does, his copy does the same (although David sort of “ignores” that when he makes his decision). David is very unlikely to be decision-entangled with any random CDT agent, however (in that case, the mutual defection is just a “coincidence” and is not due to some dependence between their respective reasoning/choices). I didn’t mean the concept of “decision-entanglement” to pre-assume superrationality. I want CDT-David to agree/admit that he is decision-entangled with his perfect copy. Nonetheless, since he doesn’t buy superrationality, I know that he won’t factor the decision-entanglement into his expected value optimization (he won’t “factor in the possibility that p=q”.) That’s why you need significant credence in both decision-entanglement and superrationality to get cooperation, here. :)
Agreed, but if you’re a CDTer, you can’t be decision-entangled with an EDTer, right? Say you’re both told you’re decision-entangled. What happens? Well, you don’t care, so you still defect while the EDTer cooperates. Different decisions. So… you two weren’t entangled after all. The person who told you you were was mistaken.
So yes, decision-entanglement can’t depend on your DT per se, but doesn’t it have to depend on its “compatibility” with the other’s for there to be any dependence between your algos/choices? How could a CDTer and an EDTer be decision-entangled in a PD?
Not very confident about my answers. Feel free to object. :) And thanks for making me rethink my assumptions/definitions!