Hmm, yeah, there’s definitely been a miscommunication somewhere. I agree with everything you said up until the cases at the end (except potentially your formula at the beginning; I wasn’t sure what “R_CC” denotes).
You screwed up; your agent AI is not in fact rational. If this is intended to be a realistic hypothetical, this is where almost all of my credence would be. Nobody knows how to formally define rational behavior in a useful way (i.e. not AIXI), many smart people have been working on it for years, and I certainly don’t think I’d be more likely to succeed myself. (I don’t understand the relevance of this bullet point though, since clearly the point of your thought experiment is to discuss actually-rational agents.)
Your agent AI is a rational agent, and so is my agent AS. N/A, your agent isn’t rational.
Your agent AI is a rational agent, but my agent AS is not in fact a rational agent. Yes, this is my belief. Your agent is irrational because it’s choosing to defect against mine, which causes mine to defect against it, which results in a lower payoff than if it cooperated.
Sorry, R_CC is the reward (payoff) to A if A Cooperates and B also Cooperates, and so on for the other subscripts.
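For concreteness, here’s a minimal sketch of that notation with placeholder numbers of my own (they’re not from the post, just the usual prisoner’s-dilemma ordering R_DC > R_CC > R_DD > R_CD):

```python
# Placeholder payoff table for player A (illustrative numbers only, not from the post).
# R[(a_move, b_move)] = payoff to A when A plays a_move and B plays b_move.
R = {
    ("C", "C"): 3,  # R_CC: both cooperate
    ("C", "D"): 0,  # R_CD: A cooperates, B defects
    ("D", "C"): 5,  # R_DC: A defects, B cooperates
    ("D", "D"): 1,  # R_DD: both defect
}

# The ordering that makes this a prisoner's dilemma in the first place:
assert R[("D", "C")] > R[("C", "C")] > R[("D", "D")] > R[("C", "D")]
```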
Nobody knows how to formally define rational behavior in a useful way
OK sure, let’s also imagine that you have access to a Jupiter-brain superintelligent oracle and can ask it for advice.
Your agent is irrational because it’s choosing to defect against mine, which causes mine to defect against it
How does that “causes” work?
My agent has a module in its source code, which I specifically added when writing it, that says “if I, AS, am in a prisoner’s dilemma with AI specifically, then output ‘defect’” (sketched below).
Your agent has no such module.
How did my inserting this module change the behavior of your AI? Is your AI reading the source code of my AS or something? (If the agents are reading each other’s source code, that’s a very important ingredient to the scenario, and needs to be emphasized!!)
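Just to be concrete about what I mean by that module, here’s a purely illustrative sketch of my agent AS (the names and structure are my own, not anything real):

```python
# Purely illustrative sketch of my agent AS (names and structure are my own invention).
def agent_as_move(opponent_id: str) -> str:
    # The module I added by hand: if the opponent is AI specifically, defect, full stop.
    if opponent_id == "AI":
        return "D"
    # ...AS's ordinary decision procedure for every other opponent would go here...
    return "C"  # placeholder
```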
More generally, I understand that your AI follows the rule “always cooperate if you’re in a prisoner’s dilemma with another rational agent”. Right? But the rule is not just “always cooperate”, right? For example, if a rational agent is in a prisoner’s dilemma against cooperate-bot (= the simple, non-rational agent that always cooperates no matter what), and if the rational agent knows for sure that the other party to the prisoner’s dilemma is cooperate-bot, then the rational agent is obviously going to defect, right?
And therefore, AI needs to figure out whether the other party to its prisoner’s dilemma is or is not “a rational agent”. How does it do that? Shouldn’t it often be uncertain about that in practice?
And if two rational agents are each uncertain about whether the other party to the prisoner’s dilemma is “a rational agent”, versus another kind of agent (e.g. cooperate-bot), isn’t it possible for them both to defect? (A toy illustration follows below.)
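Here’s a toy version of that worry, using my placeholder payoffs from above and my own simplified model (the opponent is either cooperate-bot, or an agent whose move ends up mirroring mine):

```python
# Toy model (mine, purely illustrative): the opponent is cooperate-bot with probability p,
# or a "mirror" agent (whose move ends up matching mine) with probability 1 - p.
R = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}  # same placeholder payoffs as above

def best_move(p: float) -> str:
    ev_cooperate = R[("C", "C")]                              # either opponent type cooperates back
    ev_defect = p * R[("D", "C")] + (1 - p) * R[("D", "D")]   # exploit cooperate-bot; mirror defects back
    return "D" if ev_defect > ev_cooperate else "C"

# With these numbers, defecting wins once p > (R_CC - R_DD) / (R_DC - R_DD) = 0.5,
# so two such agents that each put enough weight on "maybe cooperate-bot" would both defect.
print(best_move(0.6))  # -> "D"
```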
It’s specified in the premise of the problem that both players have access to the other player’s description; their source code, neural map, decision theory, whatever. My agent considers the behavior of your agent, sees that your agent is going to defect against mine no matter what mine does, and defects as well. (It would also defect if your additional module said “always cooperate with IsaacBot”, or “if playing against IsaacBot, flip a coin”, or anything else that breaks the correlation.)
“Always cooperate with other rational agents” is not the definition of being rational; it’s a consequence of being rational. If a rational agent is playing against an irrational agent, it will do whatever maximizes its utility: cooperate if the irrational agent’s behavior is nonetheless correlated with its own, and otherwise defect.
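A crude sketch of the procedure I’m describing (my paraphrase only; the hypothetical `simulate` oracle hand-waves away the genuinely hard part of reasoning about another agent’s source code):

```python
# Crude sketch of my agent AI's open-source decision procedure (my paraphrase, not real code).
# `simulate(opponent_source, my_move)` is a hypothetical oracle returning the opponent's move
# given that it can see my source commits me to `my_move`; building it is the hard part.
def agent_ai_move(opponent_source: str, simulate) -> str:
    their_move_if_i_cooperate = simulate(opponent_source, "C")
    their_move_if_i_defect = simulate(opponent_source, "D")

    if their_move_if_i_cooperate == "C" and their_move_if_i_defect == "D":
        # Their move is correlated with mine, so cooperating buys their cooperation.
        return "C"
    # Their move doesn't depend on mine (always-defect, always-cooperate, coin flip, ...),
    # so nothing is gained by cooperating: defect.
    return "D"

# Example: against an opponent that defects no matter what (like AS with its hard-coded module):
always_defect = lambda source, my_move: "D"
print(agent_ai_move("<AS source>", always_defect))  # -> "D"
```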
OK cool. If the title had been “LDT agents cooperate with other LDT agents in the prisoner’s dilemma if they can see, trust, and fully understand each other’s source code; and therefore it’s irrational to be anything but an LDT agent if that kind of situation might arise” … then I wouldn’t have objected. That’s a bit verbose though I admit :) (If I had seen that title, my reaction would have been “That might or might not be true; seems plausible but maybe needs caveats, whatever, it’s beyond my expertise”, whereas with the current title my immediate reaction was “That’s wrong!”.)
I think I was put off because:
The part where the agents see each other’s source code (and trust it, and can reason omnisciently about it) is omitted from the title and very easy to miss IMO even when reading the text [this is sometimes called “open-source prisoner’s dilemma”—it has a special name because it’s not the thing that people are usually talking about when they talk about “prisoner’s dilemmas”];
Relatedly, I think “your opponent is constitutionally similar to you and therefore your decisions are correlated” and “your opponent can directly see and understand your source code and vice-versa” are two different reasons that an agent might cooperate in the prisoner’s dilemma, and your post and comments seem to exclusively talk about the former but now it turns out that we’re actually relying on the latter;
I think everyone agrees that the rational move is to defect against a CDT agent, and your title “rational agents cooperate in the prisoner’s dilemma” omits who the opponent is;
Even if you try to fix that by adding “…with each other” to the title, I don’t think that really helps, because there’s a kind of circularity: the way you define “rational agents” (which I think is controversial, or at least part of the thing you’re arguing for) determines who the prisoner’s dilemma opponent is, which in turn determines what the rational move is; yet your current title seems to be making an argument about what it implies to be a rational agent, so you wind up effectively presupposing the answer in a confusing way.
See Nate Soares’s “Decision theory does not imply that we get to have nice things” for a sense of how the details really matter and can easily go awry with regard to “reading & understanding each other’s source code”.