Your strategy should be to defect unless you have good enough evidence that cooperating will cause Omega to also cooperate.
Omega will predict this, and will give you good enough evidence. Whether or not this actually leads to Omega cooperating depends on the strength of evidence given. Money placed in escrow would be enough.
Of course, if you can’t receive any communication from Omega before making your decision, Omega is going to defect on you (since Omega’s decisions can’t affect what you do). This is still assuming it’s the only Prisoner’s Dilemma Omega plays in your light cone, however.
Unless otherwise stated, the game-theoretic Prisoner’s Dilemma takes place as the only event in the hypothetical universe; in this example, prior communication and credible precommitment are not permitted.
Instead of generating a strategy to use against a copy of yourself, consider what the best strategy would be against an optimal player who knows what your strategy is.
TDT permits acting as though one had precommitted, the result being that one never wishes one’s opportunities to precommit had been different. Consider a perfectly reasoning person and a perfectly reasoning Omega with the added ability to know what the person’s move will be before making its own move (not mind-reading, only knowing the move).
If the human knows TDT is optimal, then he knows Omega will use it; if the human knows that TDT would make the true but non-credible precommitment, then the human knows that Omega has chosen the optimal precommitment.
If the ideal silent precommitment strategy is the diagonal, then we get C-C as the result. If any other precommitment is ideal, then Omega would do no worse than 3 points using it against a perfectly rational non-precog.
If the human is a cooperate-bot, then the ideal strategy is to defect. Therefore committing to the diagonal is suboptimal, because it results in 3 instead of 5 points in that one case. However, the human here is either going to cooperate or defect without regard to Omega’s actual strategy (only the ideal strategy), meaning that the human is choosing between 0 and 1 if the ideal strategy is defectbot.
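To make that concrete, here is a rough sketch of how each possible precommitment scores for Omega, both against a human who best-responds to the known precommitment and against a cooperate-bot. The payoff values (3/3 for mutual cooperation, 5/0 for unilateral defection, 1/1 for mutual defection) are my reading of the numbers used in this thread, and the strategy names and helper function are just mine for illustration.

```python
# Payoffs as (human, omega) for (human_move, omega_move); values assumed
# from the numbers quoted in this thread, not taken from the OP directly.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# Omega's four possible precommitments: what it plays given the human's move.
PRECOMMITMENTS = {
    "cooperatebot":   {"C": "C", "D": "C"},
    "defectbot":      {"C": "D", "D": "D"},
    "diagonal (qpq)": {"C": "C", "D": "D"},
    "reverse qpq":    {"C": "D", "D": "C"},
}

def best_response(policy):
    """The human move that maximizes the human's payoff against a known policy."""
    return max("CD", key=lambda h: PAYOFF[(h, policy[h])][0])

for name, policy in PRECOMMITMENTS.items():
    # Case 1: the human best-responds to the known precommitment.
    h = best_response(policy)
    omega_vs_rational = PAYOFF[(h, policy[h])][1]
    # Case 2: the human is a cooperate-bot and cooperates no matter what.
    omega_vs_coopbot = PAYOFF[("C", policy["C"])][1]
    print(f"{name:14s} vs rational human: {omega_vs_rational}  vs cooperate-bot: {omega_vs_coopbot}")

# The diagonal gets Omega 3 against the rational human and 3 against the
# cooperate-bot, while defectbot gets 1 and 5 respectively: the trade-off above.
```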
Either there’s a potential precommitment that I haven’t considered, TDT is not optimal in this context, or I’ve missed something else. Evidence that I’ve missed something would be really nice.
Either you haven’t read this, and are not talking about TDT as I know it, or I don’t understand you at all.
… My line of thought is unchanged if Omega simply learns your decision after you decide but before Omega decides. The game is no longer symmetrical.
Currently, I have concluded that if it is best for the second player to cooperate when the first player has cooperated, then it is best for the first player to cooperate against a rational opponent (3 instead of 1). However, it is better for the second player to cooperate with probability only slightly greater than 1/3, and that still provides a higher expected result to the first player than defecting.
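To check that threshold, a quick sketch with the same assumed payoff values (3/3 for mutual cooperation, 0/5 for unilateral defection, 1/1 for mutual defection); the probability p is just an illustrative parameter:

```python
# If player 2 cooperates with probability p after seeing player 1 cooperate
# (and always defects after a defection), then, with the payoffs assumed above:
#   player 1's expected score for cooperating: 3*p + 0*(1 - p) = 3p
#   player 1's score for defecting:            1
#   player 2's expected score when player 1 cooperates: 3*p + 5*(1 - p) = 5 - 2p
# Player 1 prefers cooperating whenever 3p > 1, i.e. p > 1/3, and player 2's
# expected score rises as p falls toward 1/3.
for p in (1.0, 0.5, 0.4, 1 / 3):
    print(f"p={p:.3f}  p1 cooperate: {3 * p:.2f}  p1 defect: 1  "
          f"p2 given p1 cooperates: {5 - 2 * p:.2f}")
```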
If, given the choice between 3 points and 5 points, the second player takes the 5 (that is, defects after the first player cooperates), then it is best for the first player to defect (1 instead of 0).
In the end, the first player has 2 possible strategies and the second player has 4, for a total of 8 possibilities, enumerated in the sketch below.
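The enumeration, again with the payoff values assumed from this thread; the strategy labels match the ones used below:

```python
# (player 1, player 2) scores for each pair of moves; values assumed from this thread.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

# Player 2's four strategies: the move played given player 1's move.
PLAYER2_STRATEGIES = {
    "cooperatebot":         {"C": "C", "D": "C"},
    "defectbot":            {"C": "D", "D": "D"},
    "quid pro quo":         {"C": "C", "D": "D"},
    "reverse quid pro quo": {"C": "D", "D": "C"},
}

for p1_move in ("C", "D"):
    for name, policy in PLAYER2_STRATEGIES.items():
        p1_score, p2_score = PAYOFF[(p1_move, policy[p1_move])]
        print(f"player 1 {p1_move} vs {name:20s} -> {p1_score}/{p2_score}")
```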
My problem is that if quid pro quo {c:C;d:D} is the optimum strategy, two optimal players end up cooperating. But quid pro quo is weakly dominated by defectbot {c:D;d:D}: it never scores more and sometimes scores less. However, if defectbot is the best strategy for player 2, then the best strategy for player 1 is to defect; if quid pro quo is the best strategy for player 2, then the best strategy for player 1 is to cooperate.
I have trouble understanding how the optimal strategy can be dominated by a competing strategy.
If quid pro quo is optimal, then optimal players score 3 points each.
But if quid pro quo is the optimal strategy, then an optimal player 1 cooperates, and defectbot scores more than quid pro quo against that player 1; in that case defectbot, not quid pro quo, is optimal for player 2, the optimal player 1 strategy is to defect, and optimal players score 1 point each.
Please stop using the words “rational” and “optimal”, and give me some sign that you’ve read the linked post on counterfactuals rather than asking counterfactual questions whose assumptions you refuse to spell out.
The only difficult question here concerns the imbalance in knowledge between Omega and a human, per comment by shminux. Because of this, I don’t actually know what TDT does here (much less ‘rationality’).
Assumptions: The game uses the payout matrix described in the OP, and the second player learns of the first player’s move before making his move. Both players know that both players are trying to win and will not use a strategy that does not serve that goal.
My conclusion is that both players defect. My problem is that it would be better for player 2 if player 2 did not have the option to defect if player 1 cooperated.
I’ve thrown out cooperatebot and reverse quid pro quo as candidates for best strategy.
FYI: I’m using this as my reference, and this hinges on reflective inconsistency. I can’t find a reflectively consistent strategy even with only two options available. (Note that defectbot consistently equals or outperforms quid pro quo in all cases.)
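To spell out that parenthetical with the same assumed payoffs: holding player 1’s move fixed, defectbot never scores less than quid pro quo, yet once player 1 best-responds to a known player 2 strategy, quid pro quo comes out ahead. A minimal sketch:

```python
# Same assumed payoffs as above: (player 1, player 2) scores per pair of moves.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

QPQ = {"C": "C", "D": "D"}        # quid pro quo: copy player 1's move
DEFECTBOT = {"C": "D", "D": "D"}  # defectbot: always defect

# Point-by-point: for each fixed player 1 move, defectbot does at least as well.
for move in ("C", "D"):
    qpq_score = PAYOFF[(move, QPQ[move])][1]
    bot_score = PAYOFF[(move, DEFECTBOT[move])][1]
    print(f"player 1 plays {move}: quid pro quo {qpq_score}, defectbot {bot_score}")
    assert bot_score >= qpq_score

# But once player 1 best-responds to a known player 2 strategy, the ranking flips.
def best_response(policy):
    return max("CD", key=lambda m: PAYOFF[(m, policy[m])][0])

for name, policy in (("quid pro quo", QPQ), ("defectbot", DEFECTBOT)):
    m = best_response(policy)
    print(f"{name}: player 1 plays {m}, player 2 scores {PAYOFF[(m, policy[m])][1]}")
# quid pro quo ends at 3 points, defectbot at 1: the apparent inconsistency above.
```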
Again, you don’t sound like you’ve read this post here. Let’s say that, in fact, “it would be better for player 2 if player 2 did not have the option to defect if player 1 cooperated”—though I’m not at all sure of that, when player 2 is Omega—and let’s say Omega uses TDT. Then it will ask counterfactual questions about what “would” happen if Omega’s own abstract decision procedure gave various answers. Because of the nature of the counterfactuals, these will screen off any actions by player 1 that depend on said answers, even ‘known’ actions.
You’re postulating away the hard part, namely the question of whether the human player’s actions depend on Omega’s real thought processes or if Omega can just fool us!
Which strategy is best does not depend on what any given agent decides the ideal strategy is.
I’m assuming only that both the human player and Omega are capable of considering a total of six strategies for a simple payoff matrix and determining which ones are best. In particular, I’m calling Löb’shit on the line of thought “If I can prove that it is best to cooperate, other actors will concur that it is best to cooperate” when used as part of the proof that cooperation is best.
I’m using TDT instead of CDT because I don’t want precommitment to become necessary or beneficial, and CDT has trouble explaining why to one-box when the boxes are transparent.