Remember how I said that my idea made qualitative sense until I tried to show rigorously that it would do the right thing?
That strategy isn’t identical to mine; it’s similar to “Not Quite Timeless Decision Theory”, except less well formalized. Consider the first two Problems in that post, and whether they apply to that strategy too.
That’s what I thought you were referring to here.
It’s less formalized for a reason, though; the strategy isn’t meant to be perfect. It asserts a particular proposition and attempts to defend it. (Hence the use of the term “Rationalization” as opposed to prediction.) It also does not engage in recursive logic; it evaluates only what its opponent will do against a specific strategy, which is “Always cooperate.” If the opponent will cooperate against that strategy, it is assumed to be safe. If the opponent defects against an always-cooperator, it defects as well.
This rationalization engine doesn’t seek to do better than all other agents; it only seeks to do as well as every agent it comes up against. There is a specific counterstrategy which defeats it: cooperate with always-cooperators, but defect against opponents who cooperate if you cooperate with always-cooperators. However, introduce a “Falsified prediction” mechanism, by which it thereafter always defects against opponents it falsely predicted would cooperate, and it becomes an effective blackmail agent against every other agent in iterated games; that counterstrategy then becomes more expensive than simply cooperating.
It has the advantage of being extremely predictable; it will predict itself to cooperate against an always-cooperator, and because it doesn’t recurse, as long as the predictor P is remotely good at prediction, P will predict it to one-box in Newcomb’s Problem.
(Fundamentally the issues raised are whether an agent can be stupid enough to be predictable while being smart enough to accurately predict. The trick to this is that you can’t be afraid of getting the wrong answer; it is deliberately less formalized, and deliberately does not attempt to prove its solutions are ideal.)
ETA: The purpose of this agent is to exploit signaling; it observes what other agents would do to a helpless agent, suspecting that other agents may be nastier than it is, and that it is similarly helpless. It then behaves based on that information. It is in effect a reflective agent; it does to the other agent what the other agent would do to a helpless agent. It would calculate using the same probabilities other agents would, as well; against the 99%-cooperate timeless agent mentioned by Vaniver, it has a 99% chance of cooperating itself. (Although if it predicts wrong it then begins always defecting; such strategies would be risky against the reflective vindictive agent.)
Since the impasse only comes up when we try and rigorously show how our agents behave (using tricky ideas like Löbian proofs of cooperation), I’m looking for something more than qualitative philosophical ideas. It turns out that a lot of the philosophical reasoning I was using (about why these agents would achieve mutual cooperation) made use of outside-the-system conclusions that the agents cannot rely on without self-contradiction, but I didn’t notice this until I tried to be more rigorous. The same applies to what you’re suggesting.
Note also that by de-formalizing how your agent works, you make it harder to verify that it actually does what it is philosophically “supposed” to do.
I’m glad you’re thinking about this sort of decision theory too, but I’m letting you know that here there be dragons, and you don’t see them until you formalize things.
Assuming the existence of an interpreter that can run the opponent’s code (which you seem to be assuming as well), there’s not a lot of logic here to formalize.
The dragons only arise when you have two non-trivial processes operating against each other. The issue with your formalized logic arises when it recurses: when the two agents are playing an “I know what you’re thinking.” “I know that you know what I’m thinking.” “I know that you know that I know what you’re thinking.” game. As long as one of the two agents uses a trivialized process, you can avoid this situation.
By a trivialized process, I mean one that is not self-referential.
In formalized pseudocode, where f(x, y) returns “Cooperate” or “Defect”, x is the code to be tested (your opponent’s code), and y is the code to test with:
y: Cooperate(); // the probe: an always-cooperator
main: if (f(x, y) == Cooperate) { Cooperate(); } else { Defect(); }
That would be the whole of the simple reflective agent; the vindictive agent would look like this:
boolean hasFailed = false;
y: Cooperate();
main: if (f(x, y) == Cooperate && !hasFailed) { Cooperate(); } else { Defect(); }
onHasCooperatedWithDefector: hasFailed = true; // fires when this agent cooperated but the opponent defected (a falsified prediction)
It’s relatively simple to see that it cooperates against a CooperateBot, which it uses as its signal; agents which cooperate with CooperateBot are considered “Trustworthy.” Because it cooperates with CooperateBot, it will cooperate with itself.
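To make that concrete, here is a minimal, runnable sketch of the simple reflective agent in Python (my translation of the pseudocode above, not the original formulation). An agent is modeled as a function that takes its opponent’s code, here the opponent function itself, and returns “C” or “D”:

    # Minimal Python sketch (assumed translation) of the simple reflective agent.
    def cooperate_bot(opponent):
        return "C"

    def defect_bot(opponent):
        return "D"

    def reflective_agent(opponent):
        # Probe the opponent against an always-cooperator (the y above);
        # cooperate exactly when the opponent would cooperate with it.
        return "C" if opponent(cooperate_bot) == "C" else "D"

    print(reflective_agent(cooperate_bot))     # C: CooperateBot passes the probe
    print(reflective_agent(defect_bot))        # D: DefectBot fails the probe
    print(reflective_agent(reflective_agent))  # C: it passes its own probe, so it cooperates with itself

The self-play call terminates because the probe handed to the opponent is always the non-reflective cooperate_bot, never the reflective agent itself.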
Now, you may say that it is throwing away free points by cooperating with CooperateBot. This is true in some sense, but in another, it is avoiding extensive computation by using this methodology, and making its behavior predictable; these lost points are the cost paid for these advantages. If computation costs utility, the relative cheapness of this behavior is important.
The penalty it pays for this behavior is that it cannot simultaneously guarantee that its opponent will defect against a DefectBot; that would introduce an inconsistency. This -appears- to be the issue with your original agent. To potentially resolve that issue, a somewhat better mechanism might be comparing behavior against the next iteration of moves made by a Tit For Tat agent against a cooperative and a defecting bot, which would reward vindictive but trustworthy agents. (A necessary but not sufficient element to avoid inconsistency is that the agent being compared against is not reflective at all—that is, doesn’t examine its opponent’s source code.) However, I am having trouble proving out that that behavior wouldn’t also be inconsistent.
ETA: Why do I claim that self-referential (non-trivial) agents aren’t consistent when opposing each other? For the following reason: Agent A runs f(x, this) on Agent B; Agent B runs f(x, this) on Agent A. This becomes f(f(f(f(..., this), that is, infinite recursion. Masquerade solves this by trying out a variety of different “masks”, which makes the reflective agent a subset of Masquerade, using only a single mask. The issue is that Masquerade is designed to deal with other reflective agents; the two agents would mutually defect if Masquerade defects against CooperateBot in the original formulation, which is to say, the addition of masks can make Masquerade -less effective- depending upon the composition of its opponents. Whatever signal different reflective agents use, Masquerade is likely to break, given a sufficiently large number of masks.
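For contrast, in the same toy Python model as above (the name is mine, for illustration only), a probe that is itself self-referential never bottoms out:

    # Each agent simulates the other simulating it, with no base case.
    def naive_reflective(opponent):
        # probes the opponent with this very agent instead of with CooperateBot
        return "C" if opponent(naive_reflective) == "C" else "D"

    # naive_reflective(naive_reflective) recurses forever (Python raises
    # RecursionError); this is the f(f(f(..., this) regress described above.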
Now, you may say that it is throwing away free points by cooperating with CooperateBot.
Indeed, I do say this. I’m looking to formalize an agent that does the obviously correct things against CooperateBot, DefectBot, FairBot, and itself, without being exploitable by any opponent (i.e. the opponent cannot do better than a certain Nash equilibrium unless the agent does better too). Anything weaker than that simply doesn’t interest me.
One reason to care about performance against CooperateBot is that playing correctly against constant strategies is equivalent to winning at one-player games. Rational agents should win, especially if there are no other agents in the picture!
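Spelled out as a small table, on my reading (the CooperateBot and DefectBot entries follow from the one-player-game point; cooperation with FairBot and with itself is stated explicitly below):

    # One reading (mine) of "the obviously correct things" as a lookup table;
    # the bot names refer to the agents discussed in this thread, not to code here.
    desired_behavior = {
        "CooperateBot": "D",  # take the free points against an unconditional cooperator
        "DefectBot":    "D",  # never cooperate with a sure defector
        "FairBot":      "C",  # reach mutual cooperation with the provability-based cooperator
        "itself":       "C",  # cooperate in self-play
    }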
“Always defect” meets your criteria. You missed a criterion: that it wouldn’t miss out on a cooperation it could achieve if its strategy were different.
Your agent will have to consider, not only its current opponent, but every opponent currently in the game. (If Masquerade played in a larger game full of my reflective agents, it would consistently -lose-, because it would choose a mask it thinks they will cooperate against, and they will find the mask it would use against CooperateBot and believe it would defect against them.)
Therefore, in a game full of my reflective agents and one CooperateBot, your agent should choose to cooperate with CooperateBot, because otherwise my reflective agents will always defect against it.
I said that it needed to cooperate with FairBot and with itself (and not for gerrymandered reasons the way that CliqueBots do).
You missed a criterion: that it wouldn’t miss out on a cooperation it could achieve if its strategy were different.
This is impossible to guarantee without thereby being exploitable. Say that there are agents which cooperate iff their opponent cooperates with DefectBot.
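In the toy model sketched above, such an agent might look like this (the name is purely illustrative):

    def defect_bot(opponent):
        return "D"

    # Cooperates if and only if its opponent would cooperate with DefectBot;
    # winning its cooperation therefore requires being exploitable by DefectBot.
    def defector_rewarding_bot(opponent):
        return "C" if opponent(defect_bot) == "C" else "D"

Any agent that never passes up an achievable cooperation would have to cooperate with DefectBot to earn this opponent’s cooperation, which is exactly the exploitability being ruled out.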
Part of the trick is figuring out exactly what kind of optimality we’re looking for, and I don’t have a good definition, but I tend to think of agents like the ones you defined as “not really trying to win according to their stipulated payoff matrix”, and so I’m less worried about optimizing my strategy for them. But that sort of consideration does factor into UDT thinking, which adds entire other layers of confusion.