Conditions for Superrationality-motivated Cooperation in a one-shot Prisoner’s Dilemma

Summary

It has been argued that, if two very similar agents follow decision theories allowing for superrationality (e.g., EDT and FDT), they would cooperate in a prisoner’s dilemma (PD) (see e.g., Oesterheld 2017). But how similar do they need to be exactly? In what way? This post is an attempt at addressing these questions. This is, I believe, particularly relevant to the work of the Center on Long-Term Risk on acausal reasoning and the foundations of rational agency (see section 7 of their research agenda).

I’d be very interested in critiques/comments/feedback. This is the main reason why I’m posting this here. :)

Normal PD

Consider this traditional PD between two agents:

Alice / Bob     C        D
C               3, 3     0, 5
D               5, 0     1, 1


We can compute the expected payoffs of Alice and Bob ($E_A$ and $E_B$) as a function of $p$ (the probability that Alice plays C) and $q$ (the probability that Bob plays C):

$$E_A(p, q) = 3pq + 0 \cdot p(1-q) + 5(1-p)q + 1 \cdot (1-p)(1-q) = 1 - p + 4q - pq$$

$$E_B(p, q) = 3pq + 5p(1-q) + 0 \cdot (1-p)q + 1 \cdot (1-p)(1-q) = 1 - q + 4p - pq$$

Now, Alice wants to find $p^*$ (the optimal $p$, i.e., the $p$ that will maximize her payoff). Symmetrically, Bob wants to find $q^*$. They do some quick math and find that $p^* = q^* = 0$, i.e., they should both play D. This is the unique Nash equilibrium of this game.
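
If you’d rather check this numerically than by hand, here is a minimal Python sketch (the function names are mine, purely for illustration) that evaluates the formulas above by brute force and confirms that, whatever $q$ Bob picks, Alice’s best response is $p = 0$:

```python
# Payoffs from the table above: (Alice's payoff, Bob's payoff) for each action profile.
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def expected_payoffs(p, q):
    """E_A and E_B when Alice plays C with probability p and Bob plays C with probability q."""
    probs = {("C", "C"): p * q, ("C", "D"): p * (1 - q),
             ("D", "C"): (1 - p) * q, ("D", "D"): (1 - p) * (1 - q)}
    e_a = sum(probs[s] * PAYOFFS[s][0] for s in probs)
    e_b = sum(probs[s] * PAYOFFS[s][1] for s in probs)
    return e_a, e_b

grid = [i / 100 for i in range(101)]  # candidate values for p
for q in (0.0, 0.5, 1.0):
    best_p = max(grid, key=lambda p: expected_payoffs(p, q)[0])
    print(f"q = {q} -> Alice's best p = {best_p}")
# q = 0.0 -> Alice's best p = 0.0
# q = 0.5 -> Alice's best p = 0.0
# q = 1.0 -> Alice's best p = 0.0
```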

Perfect-copy PD

Now, say Alice and Bob are perfect copies. How does it change the game presented above? We still have the same payoffs:

Alice / Bob     C        D
C               3, 3     0, 5
D               5, 0     1, 1

However, this time, $p = q$. Whatever one does, that’s evidence that the other does the exact same. They are decision-entangled[1].

What does that mean for the payoff functions of Alice and Bob? Well, decision theorists disagree. Let’s see what the two most popular decision theories (CDT and EDT) say, according to my (naive?) understanding:

  • EDT: “Alice should replace $q$ with $p$ in her formula (they are equal anyway) and then maximize over $p$. Symmetrically, Bob should replace $p$ with $q$ in his.”

  • CDT: “Alice should hold $q$ fixed. Same for Bob and $p$. They should behave as if they could change their action unilaterally through some kind of magic.” Therefore, CDT computes the dominant strategy from the original payoffs, ignoring the fact that $p = q$.

For CDT, $p^* = q^* = 0$, just like in the normal PD above. For EDT, however, we now get $p^* = q^* = 1$ (Alice and Bob should both cooperate). EDT is one of the decision theories that allow for superrationality: cooperation via entangled decision-making (Hofstadter 1983), or basically “factoring in the possibility that $p = q$”, as I understand it. So the difference between the Normal PD and the Perfect-copy PD matters only if both players have at least some credence in superrationality.
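
Here is the same kind of brute-force check for the perfect-copy case (using the closed-form $E_A$ derived earlier; again, just an illustrative sketch of mine): EDT-style reasoning maximizes $E_A(p, p)$ and lands on $p^* = 1$, while CDT-style reasoning holds $q$ fixed and lands on $p^* = 0$ no matter what $q$ is.

```python
def e_alice(p, q):
    """Alice's expected payoff from the table above: E_A(p, q) = 1 - p + 4q - pq."""
    return 1 - p + 4 * q - p * q

grid = [i / 100 for i in range(101)]  # candidate values for p

# EDT in the Perfect-copy PD: p and q move together, so maximize E_A(p, p).
p_star_edt = max(grid, key=lambda p: e_alice(p, p))

# CDT: hold q fixed (0.5 is arbitrary; any fixed q gives the same answer) and maximize over p.
p_star_cdt = max(grid, key=lambda p: e_alice(p, 0.5))

print(p_star_edt)  # 1.0 -> cooperate
print(p_star_cdt)  # 0.0 -> defect
```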

Formalizing the Conditions for Superrationality-motivated Cooperation in a one-shot PD

Given the above, we can hypothesize that Alice will (superrationally) cooperate with Bob in a one-shot PD iff:

  1. She has a significant[2] credence in the possibility that they’re playing a Perfect-copy PD rather than a Normal PD (i.e., that they are decision-entangled), and

  2. She has a significant credence in superrationality, such that she takes into account this decision-entanglement when she does the math. (This is assuming Alice might have decision-theoretic uncertainty.)

We then get those two neat conditions for cooperation:

  1. Significant credence in decision-entanglement

  2. Significant credence in superrationality
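
To give a rough sense of what “significant” means in condition 1 (the Appendix does this properly), here is a back-of-the-envelope version under one extra assumption of mine: a non-entangled Bob simply plays the Nash equilibrium D. If Alice fully buys superrationality and assigns credence $r$ to being decision-entangled with Bob, then

$$\mathbb{E}[\text{payoff} \mid C] = 3r + 0 \cdot (1 - r) = 3r, \qquad \mathbb{E}[\text{payoff} \mid D] = 1 \cdot r + 1 \cdot (1 - r) = 1,$$

so, with these payoffs, she cooperates iff $3r > 1$, i.e., iff $r > 1/3$. Change the payoffs and the threshold moves, which is footnote [2]’s point.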

But what makes two agents decision-entangled?

Conditions for decision-entanglement

How does/​should Alice form her credence in decision-entanglement? What are the required elements for two agents to have entangled decisions in a particular game?

First of all, you obviously need them to have compatible decision theories (DTs)[3]. Here’s (I think) a somewhat representative instance of what happens if you don’t:

Now, replace Hilary with an EDT player, such that the compatible-DTs condition is met. Does that mean the players have entangled decisions? No! Here’s an example proving that this doesn’t suffice:

Although they both follow EDT, their beliefs regarding decision-entanglement diverge. In addition to “I believe we have compatible DTs”, Arif thinks there are other requirements for decision-entanglement, which are not met here.

To identify what those requirements are, it is important to clarify what produces the players’ beliefs: their epistemic algorithms[4] (which themselves take some pieces of evidence as inputs).


It then becomes clear what the requirements are, besides “I believe we have compatible DTs”, for Arif to believe there is decision-entanglement:

  • “I believe we have entangled epistemic algorithms (or that there is epistemic-entanglement[5], for short)”, and

  • “I believe we have been exposed to compatible pieces of evidence.”

Since rational Arif doesn’t believe he’s decision-entangled with John, that means he must think that at least one of the two latter statements is false.[6]


Now, what is the evidence John and Arif should be looking for?

First, they want to compare their DTs to see if they’re compatible, as well as their epistemics to see if they’re entangled.

Then, if they have compatible DTs and entangled epistemics, they also need common knowledge of that fact. This means that they need to somehow check whether they have been exposed to compatible evidence regarding those two things, then check that they have been exposed to compatible evidence regarding their exposure to evidence, and so on ad infinitum.[7] If they can’t verify all of this, they may end up with non-entangled beliefs and, therefore, non-entangled decisions.

So here is how, I tentatively think, one-shot-PD players should reason:

Recall our conditions for superrationality-motivated cooperation in a one-shot PD:

  1. Significant credence in decision-entanglement

  2. Significant credence in superrationality

Assuming God doesn’t tell Alice whether her counterpart is decision-entangled with her, Alice would have a significant credence regarding #1 iff she has:

  • Significant credence in compatible DTs

  • Significant credence in epistemic-entanglement

  • Significant credence in the possibility that they have been exposed to some compatible pieces of evidence

Therefore, (again, assuming God doesn’t tell her whether her counterpart is decision-entangled with her) Alice would cooperate iff she has:

  1. Significant credence in decision-entanglement

    1.1. Significant credence in compatible DTs

    1.2. Significant credence in epistemic-entanglement

    1.3. Significant credence in the possibility that they have been exposed to some compatible pieces of evidence

  2. Significant credence in superrationality
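
If it helps to see the whole checklist as one function, here is a deliberately crude sketch. It treats the three sub-credences as independent and combines them, and the credence in superrationality, by simple multiplication (both are modelling shortcuts of mine, not something this post argues for), and it reuses the $r > 1/3$ threshold derived above for these particular payoffs:

```python
def credence_in_decision_entanglement(cred_compatible_dts, cred_epistemic_entanglement,
                                      cred_compatible_evidence):
    """Crude combination of credences 1.1, 1.2 and 1.3, treating them as independent."""
    return cred_compatible_dts * cred_epistemic_entanglement * cred_compatible_evidence

def should_cooperate(cred_entanglement, cred_superrationality, threshold=1 / 3):
    """Toy rule: cooperate iff the combined credence clears the payoff-dependent
    threshold (1/3 for the 3/5/1/0 payoffs used in this post)."""
    return cred_entanglement * cred_superrationality > threshold

r = credence_in_decision_entanglement(0.9, 0.8, 0.8)   # 0.576
print(should_cooperate(r, cred_superrationality=0.9))  # True: 0.518... > 1/3
```

The point is not the numbers but the structure: each factor corresponds to one of the conditions above, and any one of them can drag the overall credence below the threshold on its own.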

Remaining open questions

  • In our Normal PD and Perfect-copy PD games, we took two extreme examples where the credences were maximally low and maximally high, respectively. But what if Alice has uncertain beliefs when it comes to these conditions? What should she do?

    • For what it’s worth, the Appendix addresses the case where Alice is uncertain about #1 (without specifying credences about 1.1, 1.2, 1.3, though).

  • Alice now knows (thanks to me; you’re welcome, Alice) that, in order to estimate the probability that she’s decision-entangled with Bob, she should factor in the probability of i) Bob also being superrational, ii) her and Bob being epistemically entangled, and iii) her and Bob having been exposed to compatible pieces of evidence. Coming up with a credence regarding i) doesn’t seem insuperable: the distinction between DTs that allow for superrationality and those that don’t is pretty clear. Coming up with credences regarding ii) and iii), however, seems much more challenging. How would she do that? Where should she even look? What about the infinite recursion when looking for relevant pieces of evidence?

Acknowledgments

Thanks to Sylvester Kollin and Nicolas Macé for fruitful discussions, as well as for benevolently teaching me some of the maths/​game theory I used (mainly in the Appendix).

Thanks to Caspar Oesterheld, Johannes Treutlein, Lukas Gloor, Matīss Apinis, and Antonin Broi for very helpful feedback, suggestions, and discussions. Credits to Johannes Treutlein and Oscar Delaney for spotting a few crucial math and/​or notation errors in earlier drafts.

Most of the work put into this post has been funded by CERI (now ERA) through their summer research fellowship. I’ve also benefited quite a lot from being welcome to work from the office of the Center on Long-Term Risk. I’m grateful to those two organizations, to their respective teams, as well as to all their summer research fellows with whom I had a very nice and productive time.

All assumptions/​claims are my own. No organization or individual other than me is responsible for my potential inaccuracies, mistakes, or omissions.

Appendix: What if Alice is uncertain whether she and Bob are decision-entangled?

  1. ^

    A few clarifications on this notion of decision-entanglement and my use of it:

    - I am, here, assuming that the presence of decision-entanglement is an objective fact about the world, i.e., that there is something that does (or doesn’t) make the decisions of two agents entangled, and that it is not up to our interpretation (this doesn’t mean that decision-entanglement doesn’t heavily rely on the subjectivity of the two agents). This assumption is non-obvious and controversial. However, I am using this “entanglement realist” framework all along the post, and think the takeaways would be the same if I was adopting an “anti-realist” view. This is the reason why I don’t wanna bother thinking too much about this “entanglement (anti-)realism” thing. It doesn’t seem useful. Nonetheless, please let me know if you think my framework leads me to conclusions that are peculiar to it, such that they would be more questionable.

    - Note that, although we took an example with perfect copies here, two agents do not need to have entangled decisions in absolutely every possible situation in order to be (relevantly) decision-entangled. We only care about the decisions they make in the PD presented here, so they could just as well be imperfect copies that make unentangled decisions in other situations.

    - Unless specified otherwise, I assume decision-entanglement with regard to one decision to be something binary (on a given problem, the decisions of two agents are entangled or they aren’t; no in between), for the sake of simplicity.

  2. ^

    As demonstrated in the Appendix, what “significant” exactly means depends on the payoffs of the game. This applies to every time I use that term in this post.

  3. ^

    By “compatible”, I mostly mean something like “similar”, although it’s sort of arbitrary what counts as “similar” or not (e.g., Alice and Bob could have two DTs that seem widely different from our perspective, although they’re compatible in the sense that they both allow or don’t allow for superrationality).

  4. ^

    Thanks to Sylvester Kollin for suggesting to me to clearly differentiate between decision and epistemic algorithms in such games.

  5. ^

    John and Arif are epistemically entangled iff 1) in the particular situation they’re in, their epistemic algos output similar results, given similar inputs, and 2) in the particular situation they’re in, they can’t unilaterally modify their epistemic algos.

  6. ^

    Here’s an example of what happens when the only condition not met is the one regarding epistemic-entanglement. Here is one where only the one regarding compatible evidence is not met.

  7. ^

    Thanks to Caspar Oesterheld for informing me that the infinite recursion I was gesturing at was known as “common knowledge” in game theory.