Assuming the existence of an interpreter that can run the opponent’s code (which you seem to be assuming as well), there’s not a lot of logic here to formalize.
The dragons only arise when you have two non-trivial processes operating against each other. The issue with your formalized logic arises when it recurses: when the two agents are playing an “I know what you’re thinking.” “I know that you know what I’m thinking.” “I know that you know that I know what you’re thinking.” game. As long as one of the two agents uses a trivialized process, you can avoid this situation.
By a trivialized process, I mean one that is not self-referential.
In formalized pseudocode, where f(x, y) returns “Cooperate” or “Defect”, x is the code to be tested (your opponent’s code), and y is the code to test with:
y: Cooperate();
main: if (f(x, y) == Cooperate) { Cooperate(); } else { Defect(); }
That would be the whole of the simple reflective agent; the vindictive agent would look thus:
boolean hasFailed = false;
y: Cooperate();
main: if (f(x, y) == Cooperate and !hasFailed) { Cooperate(); } else { Defect(); }
onHasCooperatedWithDefector: hasFailed = true;
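For concreteness, here is a minimal runnable sketch of both agents in Python. It assumes each agent is modeled as a callable that receives its opponent (another such callable) and returns “C” or “D”; the names cooperate_bot, defect_bot, reflective_agent, and make_vindictive_agent are illustrative, not part of the pseudocode above.

```python
def cooperate_bot(opponent):
    # Trivial, non-self-referential agent: always cooperates.
    return "C"


def defect_bot(opponent):
    # Trivial, non-self-referential agent: always defects.
    return "D"


def f(x, y):
    """Run the code under test, x, against the test agent y."""
    return x(y)


def reflective_agent(opponent):
    # Cooperate iff the opponent cooperates with CooperateBot.
    return "C" if f(opponent, cooperate_bot) == "C" else "D"


def make_vindictive_agent():
    # Same rule, plus a hasFailed flag; the surrounding tournament is assumed
    # to call the hook below whenever this agent cooperates with a defector,
    # after which it defects forever.
    state = {"has_failed": False}

    def vindictive_agent(opponent):
        if f(opponent, cooperate_bot) == "C" and not state["has_failed"]:
            return "C"
        return "D"

    def on_has_cooperated_with_defector():
        state["has_failed"] = True

    vindictive_agent.on_has_cooperated_with_defector = on_has_cooperated_with_defector
    return vindictive_agent


if __name__ == "__main__":
    print(reflective_agent(cooperate_bot))     # "C": CooperateBot is the signal
    print(reflective_agent(defect_bot))        # "D"
    print(reflective_agent(reflective_agent))  # "C": it cooperates with itself
```

The self-play call terminates because the inner simulation only ever runs the opponent against cooperate_bot, which never inspects anyone.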
It’s relatively simple to see that it cooperates against a CooperateBot, which it uses as its signal; agents which cooperate with CooperateBot are considered “Trustworthy.” Because it cooperates with CooperateBot, and cooperating with CooperateBot is exactly the test it applies to its opponent, it will cooperate with itself.
Now, you may say that it is throwing away free points by cooperating with CooperateBot. This is true in one sense, but in another it avoids extensive computation and makes its behavior predictable; the lost points are the cost paid for those advantages. If computation costs utility, the relative cheapness of this behavior is important.
The penalty it pays for this behavior is that it cannot simultaneously guarantee that its opponent will defect against a DefectBot; that would introduce an inconsistency. This -appears- to be the issue with your original agent. To potentially resolve that issue, a somewhat better mechanism might be to compare the opponent’s behavior against the next move a Tit For Tat agent would make against a cooperating bot and against a defecting bot, which would reward vindictive but trustworthy agents. (A necessary but not sufficient element to avoid inconsistency is that the agent being compared against is not reflective at all, that is, doesn’t examine its opponent’s source code.) However, I am having trouble proving that that behavior wouldn’t also be inconsistent.
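As a sketch of one possible reading of that mechanism (an assumption on my part, not a worked-out proposal): test the opponent against both CooperateBot and DefectBot, and pass it only if its moves match what Tit For Tat would play next after seeing a cooperation and after seeing a defection. This reuses f, cooperate_bot, and defect_bot from the sketch above; tft_comparing_agent is an illustrative name.

```python
def tft_comparing_agent(opponent):
    # Pass only opponents whose moves match Tit For Tat's next move:
    # cooperate after seeing cooperation, defect after seeing defection.
    matches_tft = (
        f(opponent, cooperate_bot) == "C"
        and f(opponent, defect_bot) == "D"
    )
    return "C" if matches_tft else "D"
```

Note that this particular naive version defects against CooperateBot (which fails the second check), and therefore fails its own first check and defects against itself, which is one concrete way the consistency worry can show up.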
ETA: Why do I claim that self-referential (non-trivial) agents aren’t consistent when opposing each other? For the following reason: Agent A runs f(x, this) on Agent B; Agent B runs f(x, this) on Agent A. This becomes f(f(f(f(f(f(f(..., this), that is, infinite recursion. Masquerade solves this by trying out a variety of different “masks”, which makes the reflective agent a subset of Masquerade using only a single mask. The issue is that Masquerade is designed to deal with other reflective agents; the two agents would mutually defect if Masquerade defects against CooperateBot in the original formulation, which is to say, the addition of masks can make Masquerade -less effective- depending upon the composition of its opponents. Whatever signal different reflective agents use, Masquerade is likely to break it, given a sufficiently large number of masks.
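To make the recursion concrete, here is a small sketch of two naively reflective agents that simulate each other directly, rather than against a fixed non-reflective test agent; the names are illustrative. The mutual simulation never bottoms out:

```python
import sys


def naive_reflector_a(opponent):
    # "I cooperate iff you cooperate with me": simulates the opponent
    # against itself rather than against a trivial test agent.
    return "C" if opponent(naive_reflector_a) == "C" else "D"


def naive_reflector_b(opponent):
    return "C" if opponent(naive_reflector_b) == "C" else "D"


if __name__ == "__main__":
    sys.setrecursionlimit(100)  # keep the inevitable blow-up small
    try:
        naive_reflector_a(naive_reflector_b)
    except RecursionError:
        print("f(f(f(..., this))): the mutual simulation never terminates")
```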
Now, you may say that it is throwing away free points by cooperating with CooperateBot.
Indeed, I do say this. I’m looking to formalize an agent that does the obviously correct things against CooperateBot, DefectBot, FairBot, and itself, without being exploitable by any opponent (i.e. the opponent cannot do better than a certain Nash equilibrium unless the agent does better too). Anything weaker than that simply doesn’t interest me.
One reason to care about performance against CooperateBot is that playing correctly against constant strategies is equivalent to winning at one-player games. Rational agents should win, especially if there are no other agents in the picture!
“Always defect” meets your criteria. You missed a criterion: that it wouldn’t miss out on a cooperation it could achieve if its strategy were different.
Your agent will have to consider not only its current opponent, but every opponent currently in the game. (If Masquerade played in a larger game full of my reflective agents, it would consistently -lose-, because it would choose a mask it thinks they will cooperate against, and they will find the mask it would use against CooperateBot and believe it would defect against them.)
Therefore, in a game full of my reflective agents and one CooperateBot, your agent should choose to cooperate with CooperateBot, because otherwise my reflective agents will always defect against it.
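A small round-robin sketch of that population argument, reusing f, cooperate_bot, and reflective_agent from the earlier sketch. The payoff values (5/3/1/0) and the population size are illustrative assumptions, and coop_with_cbot and defect_vs_cbot are stand-ins for your agent with and without cooperation against CooperateBot.

```python
# Assumed standard Prisoner's Dilemma payoffs for the row player.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def score(agent, population):
    """Total payoff of `agent` playing one round against each member of the population."""
    return sum(PAYOFF[(agent(other), other(agent))] for other in population)


def coop_with_cbot(opponent):
    # Behaves like the reflective agent; in particular it cooperates with
    # CooperateBot, so the reflective agents read it as trustworthy.
    return reflective_agent(opponent)


def defect_vs_cbot(opponent):
    # Grabs the "free" points against CooperateBot (an identity check stands
    # in here for recognising CooperateBot's source code)...
    if opponent is cooperate_bot:
        return "D"
    return reflective_agent(opponent)


if __name__ == "__main__":
    population = [reflective_agent] * 10 + [cooperate_bot]
    print(score(coop_with_cbot, population))   # 33: mutual cooperation throughout
    print(score(defect_vs_cbot, population))   # 5: every reflective agent defects against it
```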
I said that it needed to cooperate with FairBot and with itself (and not for gerrymandered reasons the way that CliqueBots do).
You missed a criterion: that it wouldn’t miss out on a cooperation it could achieve if its strategy were different.
This is impossible to guarantee without thereby being exploitable. Say that there are agents which cooperate iff their opponent cooperates with DefectBot.
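As a sketch of the kind of agent meant here (perverse_agent is an illustrative name, and f and defect_bot are reused from the earlier sketch): to earn its cooperation you must yourself cooperate with DefectBot, i.e. hand free points to a pure defector, which is the exploitability trade-off in question.

```python
def perverse_agent(opponent):
    # Cooperates iff its opponent cooperates with DefectBot.
    return "C" if f(opponent, defect_bot) == "C" else "D"
```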
Part of the trick is figuring out exactly what kind of optimality we’re looking for, and I don’t have a good definition, but I tend to think of agents like the ones you defined as “not really trying to win according to their stipulated payoff matrix”, and so I’m less worried about optimizing my strategy for them. But those sorts of considerations do factor into UDT thinking, which adds entire other layers of confusion.