My claim is that the underlying process is irrelevant.
OK, then I disagree with your claim. If A’s decision-making process is very different from B’s, then A would be wrong to say “If I choose to cooperate, then B will also choose to cooperate.” There’s no reason that A should believe that; it’s just not true. Why would it be? But that logic is critical to your argument.
And if it’s not true, then A gets more utility by defecting.
Either one output is better than another, in which case all rational agents will output that decision regardless of the process they used to arrive at it, or two outputs are tied for best in which case all rational agents would calculate them as being tied and output indifference.
Being in a prisoner’s dilemma with someone whose decision-making process I know to be very similar to my own is a different situation from being in a prisoner’s dilemma with someone whose decision-making process is unknown to me in any detail but probably extremely different from mine. You can’t just say “either C is better than D or C is worse than D” in the absence of that auxiliary information, right? It changes the situation. In one case, C is better, and in the other case, D is better.
The situation is symmetric. If C is better for one player, it’s better for the other. If D is better for one player, it’s better for the other. And we know from construction that C-C is better for both than D-D, so that’s what a rational agent will pick.
All that matters is the output, not the process that generates it. If one agent is always rational, and the other agent is rational on Tuesdays and irrational on all other days, it’s still better to cooperate on Tuesdays.
The rational choice for player A depends on whether (R_CC × P(B cooperates | A cooperates) + R_CD × P(B defects | A cooperates)) is larger or smaller than (R_DC × P(B cooperates | A defects) + R_DD × P(B defects | A defects)). (“R” for “Reward”.) Right?
So then one extreme end of the spectrum is that A and B are two instantiations of the exact same decision-making algorithm, a.k.a. perfect identical twins, and therefore P(B cooperates | A defects) = P(B defects | A cooperates) = 0.
The opposite extreme end of the spectrum is that A and B are running wildly different decision-making algorithms with nothing in common at all, and therefore P(B cooperates | A defects) = P(B cooperates | A cooperates) = P(B cooperates) and ditto for P(B defects).
In the former situation, it is rational for A to cooperate, and also rational for B to cooperate. In the latter situation, it is rational for A to defect, and also rational for B to defect. Do you agree? For example, in the latter case, if you compare A to an agent A’ with minimally-modified source code such that A’ cooperates instead, then B still defects, and thus A’ does worse than A. So you can’t say that A’ is being “rational” and A is not—A is doing better than A’ here.
(The latter counterfactual is not an A’ and B’ who both cooperate. Again, A and B are wildly different decision-making algorithms, spacelike separated. When I modify A into A’, there is no reason to think that someone-very-much-like-me is simultaneously modifying B into B’. B is still B.)
In between the former and the latter situations, there are situations where the algorithms A and B are not byte-for-byte identical but do have something in common, such that the output of algorithm A provides more than zero but less than definitive evidence about the output of algorithm B. Then it might or might not be rational to cooperate, depending on the strength of this evidence and the exact payoffs.
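To put illustrative numbers on that, here’s a minimal sketch (my own illustration, not anything from your post): the payoff values below just follow the usual ordering R_DC > R_CC > R_DD > R_CD, and “rho” is a made-up knob for how strongly B’s output tracks A’s.

```python
# Sketch of the expected-utility comparison above, with assumed payoffs
# R_DC=5 > R_CC=3 > R_DD=1 > R_CD=0 and a correlation knob "rho":
#   rho = 1.0 -> perfect twins:  P(B coop | A coop) = 1, P(B coop | A defect) = 0
#   rho = 0.0 -> no correlation: P(B coop | anything) = p_c

def expected_utilities(rho, p_c=0.5, R_CC=3, R_CD=0, R_DC=5, R_DD=1):
    """Return (EU of cooperating, EU of defecting) for player A."""
    p_coop_if_coop = rho * 1.0 + (1 - rho) * p_c    # P(B cooperates | A cooperates)
    p_coop_if_defect = rho * 0.0 + (1 - rho) * p_c  # P(B cooperates | A defects)
    eu_coop = R_CC * p_coop_if_coop + R_CD * (1 - p_coop_if_coop)
    eu_defect = R_DC * p_coop_if_defect + R_DD * (1 - p_coop_if_defect)
    return eu_coop, eu_defect

for rho in (1.0, 0.5, 0.0):
    c, d = expected_utilities(rho)
    print(f"rho={rho}: EU(C)={c:.2f}, EU(D)={d:.2f} -> {'cooperate' if c > d else 'defect'}")
```

With these particular numbers, rho = 1 favors cooperating, rho = 0 favors defecting, and the in-between region can tip either way depending on rho, p_c, and the exact payoffs.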
Hmm, maybe it will help if I make it very concrete. You, Isaac, will try to program a rational agent (call it A_I), and I, Steve, will try to program my own rational agent (call it A_S). As it happens, I’m going to copy your entire source code because I’m lazy, but then I’ll add in a special case that says: my A_S will defect when in a prisoner’s dilemma with your A_I.
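(If it helps, here’s a purely hypothetical sketch of that construction; “a_i_decide” is just a stand-in for whatever your agent’s decision procedure actually is, not a claim about how a rational agent works.)

```python
def a_i_decide(opponent_description, payoffs):
    """Placeholder for Isaac's agent A_I: some decision procedure that
    (per the post title) cooperates with other rational agents."""
    raise NotImplementedError("whatever A_I actually does")

def a_s_decide(opponent_description, payoffs):
    """Steve's agent A_S: a verbatim copy of A_I, plus one special case."""
    if opponent_description == "A_I":  # the bolted-on module
        return "defect"
    return a_i_decide(opponent_description, payoffs)  # otherwise identical to A_I
```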
Now let’s consider different cases:
You screwed up; your agent A_I is not in fact rational. I assume you don’t put much credence here; you think that you know what rational agents are.
Your agent A_I is a rational agent, and so is my agent A_S. OK, now suppose there’s a prisoner’s dilemma with your agent A_I and my agent A_S. Then your A_I will cooperate because that’s presumably how you programmed it: as you say in the post title, “rational agents cooperate in the prisoner’s dilemma”. And my A_S is going to defect because, recall, I put that in as a special case in the source code. So my agent is doing strictly better than yours. Specifically: My agent and your agent take the same actions in all possible circumstances except for this particular A_I-and-A_S prisoner’s dilemma, where my agent gets the biggest prize and yours is a sucker. So then I might ask: Are you sure your agent A_I is rational? Shouldn’t rationality be about systematized winning?
Your agent A_I is a rational agent, but my agent A_S is not in fact a rational agent. If that’s your belief, then my question for you is: On what grounds? Please point to a specific situation where my agent A_S is taking an “irrational” action.
Hmm, yeah, there’s definitely been a miscommunication somewhere. I agree with everything you said up until the cases at the end (except potentially your formula at the beginning; I wasn’t sure what “R_CC” denotes).
You screwed up; your agent A_I is not in fact rational. If this is intended to be a realistic hypothetical, this is where almost all of my credence would be. Nobody knows how to formally define rational behavior in a useful way (i.e. not AIXI); many smart people have been working on it for years, and I certainly don’t think I’d be more likely to succeed myself. (I don’t understand the relevance of this bullet point, though, since clearly the point of your thought experiment is to discuss actually-rational agents.)
Your agent A_I is a rational agent, and so is my agent A_S. N/A; your agent isn’t rational.
Your agent A_I is a rational agent, but my agent A_S is not in fact a rational agent. Yes, this is my belief. Your agent is irrational because it’s choosing to defect against mine, which causes mine to defect against it, which results in a lower payoff than if it cooperated.
Sorry, R_CC is the Reward / payoff to A if A Cooperates and if B also Cooperates, etc.
Nobody knows how to formally define rational behavior in a useful way
OK sure, let’s also imagine that you have access to a Jupiter-brain superintelligent oracle and can ask it for advice.
Your agent is irrational because it’s choosing to defect against mine, which causes mine to defect against it
How does that “causes” work?
My agent has a module in the source code, which I specifically added when I was writing the code, that says “if I, A_S, am in a prisoner’s dilemma with A_I specifically, then output ‘defect’”.
Your agent has no such module.
How did my inserting this module change the behavior of your A_I? Is your A_I reading the source code of my A_S or something? (If the agents are reading each other’s source code, that’s a very important ingredient to the scenario, and needs to be emphasized!!)
More generally, I understand that your A_I follows the rule “always cooperate if you’re in a prisoner’s dilemma with another rational agent”. Right? But the rule is not just “always cooperate”, right? For example, if a rational agent is in a prisoner’s dilemma against cooperate-bot (= the simple, not rational, agent that always cooperates no matter what), and if the rational agent knows for sure that the other party to the prisoner’s dilemma is definitely cooperate-bot, then the rational agent is obviously going to defect, right?
And therefore, A_I needs to figure out whether the other party to its prisoner’s dilemma is or is not “a rational agent”. How does it do that? Shouldn’t it be uncertain in many practical situations?
And if two rational agents are each uncertain about whether the other party to the prisoner’s dilemma is “a rational agent”, versus another kind of agent (e.g. cooperate-bot), isn’t it possible for them both to defect?
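To put made-up numbers on that last question (same illustrative payoffs as before; I’m modelling the “rational agent” possibility as an opponent whose choice ends up matching mine, and ignoring the recursion where that opponent runs this same calculation about me):

```python
# Opponent is cooperate-bot with probability p, or a "mirror" whose choice
# ends up matching mine with probability 1 - p. Payoffs as before: 5 > 3 > 1 > 0.

def best_move(p, R_CC=3, R_CD=0, R_DC=5, R_DD=1):
    eu_coop = p * R_CC + (1 - p) * R_CC    # both opponent types cooperate back
    eu_defect = p * R_DC + (1 - p) * R_DD  # cooperate-bot still cooperates; mirror defects
    return ("cooperate" if eu_coop > eu_defect else "defect"), eu_coop, eu_defect

for p in (0.3, 0.7):
    print(p, best_move(p))
```

With these numbers, once my credence that the other party is a cooperate-bot exceeds 1/2, defecting wins; and if we both reason that way about each other, we both defect.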
It’s specified in the premise of the problem that both players have access to the other player’s description: their source code, neural map, decision theory, whatever. My agent considers the behavior of your agent, sees that your agent is going to defect against mine no matter what mine does, and defects as well. (It would also defect if your additional module said “always cooperate with IsaacBot”, or “if playing against IsaacBot, flip a coin”, or anything else that breaks the correlation.)
“Always cooperate with other rational agents” is not the definition of being rational, it’s a consequence of being rational. If a rational agent is playing against an irrational agent, it will do whatever maximizes its utility: cooperate if the irrational agent’s behavior is nonetheless correlated with its own, and otherwise defect.
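Concretely, here’s a toy version of what my agent is doing (just a sketch: “opponent_policy” stands in for whatever prediction my agent can extract from your agent’s source code, and it glosses over the self-reference issues that come up when both agents try to simulate each other):

```python
# Illustrative payoffs: (my move, their move) -> my payoff.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def choose(opponent_policy):
    """Pick the move that does best given how the opponent responds to it."""
    return max("CD", key=lambda my_move: PAYOFF[(my_move, opponent_policy(my_move))])

print(choose(lambda my_move: "D"))      # opponent defects no matter what -> I defect
print(choose(lambda my_move: my_move))  # opponent's choice tracks mine   -> I cooperate
```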
OK cool. If the title had been “LDT agents cooperate with other LDT agents in the prisoner’s dilemma if they can see, trust, and fully understand each other’s source code; and therefore it’s irrational to be anything but an LDT agent if that kind of situation might arise” … then I wouldn’t have objected. That’s a bit verbose though I admit :) (If I had seen that title, my reaction would have been “That might or might not be true; seems plausible but maybe needs caveats, whatever, it’s beyond my expertise”, whereas with the current title my immediate reaction was “That’s wrong!”.)
I think I was put off because:
The part where the agents see each other’s source code (and trust it, and can reason omnisciently about it) is omitted from the title and very easy to miss IMO even when reading the text [this is sometimes called “open-source prisoner’s dilemma”—it has a special name because it’s not the thing that people are usually talking about when they talk about “prisoner’s dilemmas”];
Relatedly, I think “your opponent is constitutionally similar to you and therefore your decisions are correlated” and “your opponent can directly see and understand your source code and vice-versa” are two different reasons that an agent might cooperate in the prisoner’s dilemma, and your post and comments seem to exclusively talk about the former but now it turns out that we’re actually relying on the latter;
I think everyone agrees that the rational move is to defect against a CDT agent, and your title “rational agents cooperate in the prisoner’s dilemma” omits who the opponent is;
Even if you try to fix that by adding “…with each other” to the title, I think that doesn’t really help, because there’s a kind of circularity: the way you define “rational agents” (which I think is controversial, or at least part of the thing you’re arguing for) determines who the prisoner’s dilemma opponent is, which in turn determines what the rational move is; yet your current title seems to be making an argument about what it implies to be a rational agent, so you wind up effectively presupposing the answer in a confusing way.
See Nate Soares’s “Decision theory does not imply that we get to have nice things” for a sense of how the details really matter and can easily go awry with regard to “reading & understanding each other’s source code”.