I’m having trouble comprehending what running one of these bots would consist of. Suppose MyBot is put against OpponentBot. So MyBot is going to run a simulation of OpponentBot, and that instance of OpponentBot is going to run a simulation of MyBot, and that instance is going to run a simulation of OpponentBot … how does this process terminate? My head hurts.
I would guess that in that case, you run out of time and get penalized. Which suggests a strategy: Program your own bot to wait until the last possible millisecond before transmitting its choice.
What do you mean, “in that case”? My issue is that I don’t see how two bots can both simulate each other. I don’t see what I presented as a special case, but the general case.
What do you mean, “in that case”? My issue is that I don’t see how two bots can both simulate each other. I don’t see what I presented as a special case, but the general case.
Just because your bot has the option of simulating its opponent doesn’t mean that you would necessarily program it to do so. Evidently there is a time limit in the game so if your bot went overboard simulating it would end up running out of time and being penalized a lot of the time.
For example one possible strategy is “transparent tit for tat,” i.e. program your bot not to simulate anything at all but to simply play tit for tat. With the hope that other bots will simulate yours and figure out that the best strategy is repeated cooperation.
Transparent tit-for-tat is a decent enough strategy, but it is easily beaten 1-on-1 by “cooperate on every turn except the very last one”, and against many other simple bots by “TFT but defect on the last turn”.
That being said, when the game is 100 turns long, it’s pretty hard to do significantly better than Tit-for-Tat—the aforementioned strategies only beat TFT with a score of 302-297.
I think that’s probably a good reason to shorten the round length from 100 down to, say, 10 or 20 - otherwise the variance from a single RandomBot would drown out that score difference.
Transparent tit-for-tat is a decent enough strategy, but it is easily beaten 1-on-1 by “cooperate on every turn except the very last one”, and against many other simple bots by “TFT but defect on the last turn”.
I agree, I was simply giving an example of a strategy which would not lead to the sort of infinite regress envisioned by ThisSpaceAvailable.
I do think it’s worth noting that “TFT but defect on the last turn” would be beaten 1-on-1 by “TFT but defect on the last two turns.” And both of them would be beaten 1-on-1 by “TFT but defect on the last 50 turns.”
In fact, in a 1-on-1 game, it seems to me the best strategy would seem to be to always defect and hope the other fellow is stupid enough to cooperate at least once. Actually it seems pretty much pointless to talk about 1-on-1 prisoner’s dilemma, because it turns things back into a zero sum game.
If you think about it as win/lose, then yes, of course. However, it’s still instructive to ask the question “what’s the best response strategy”? In other words, “what strategy will maximise my utility against this given, fixed opponent strategy?”.
In that sense, the best response to TFT is indeed “TFT but defect on the last turn”, because “TFT but defect on the last two turns” does strictly worse when playing against TFT.
Fortunately, you can indeed do significantly better than TFT in this game!
In that sense, the best response to TFT is indeed “TFT but defect on the last turn”, because “TFT but defect on the last two turns” does strictly worse when playing against TFT.
But what if you are playing against “TFT but defect on the last turn”? In that case, the best strategy is TFT-2. And if the other fellow is playing TFT-2, the best strategy is TFT-3. And so on. Agreed?
Fortunately, you can indeed do significantly better than TFT in this game!
Seems to me it depends on what strategies the other contestants are playing.
Oh, yes, of course; the best response to TFT-1 is clearly TFT-2, and so on.
As for how well strategies do, while it’s clear that it will depend on the strategies of other contestants and in that sense there cannot be a “best strategy”, I think one can do better—for example, if there’s a Nash Equilibrium strategy that isn’t simply (Defect, Defect).
At a bare minimum, you can improve upon TFT by also making sure it defects against CooperateBots, and doesn’t wait until the second turn to defect against DefectBots. Of course, there may indeed be JusticeBots out there who punish you for defecting against CooperateBots...
At a bare minimum, you can improve upon TFT by also making sure it defects against CooperateBots,
That’s assuming that the new algorithm can correctly identify its opponents. Also, if other algorithms correctly sense that yours is opportunistic, they might change their strategy to your detriment.
In that case the game becomes significantly less interesting, because there’s far less you can do to improve on TFT and similar strategies, because TFT is a Nash Equilibrium. Granted, you can still include provisions to detect against RandomBots, CooperateBots, and DefectBots, but the finitely repeated PD is significantly more interesting.
“Interesting” is in the eye of the beholder. Not knowing the exact number of rounds adds complexity. I doubt that adding complexity means “there’s far less you can do to improve”.
(1) Saying “complexity” is not an argument. (2) The unknown horizon case is actually strictly less complex, because now each round is the same w.r.t. the future, whereas for the finite horizon case each round has a distinctly different future. (3) You haven’t countered the point that TFT is a Nash Equilibrium in the infinite-horizon case—that makes it fairly trivial and obvious. (4) http://lesswrong.com/lw/to/the_truly_iterated_prisoners_dilemma/
(1) It’s a about the same kind of argument as saying “less interesting” :-P
(2) No, each round is not the same because the horizon is not unknown, it’s just that instead of a fixed number you have a probability distribution to deal with.
(3) We are not talking about the infinite-horizon case.
My main point is that the NE for the fixed-length IPD is “always defect”, but the fact that we really ought to be able to do much better than “always defect” is what makes that case particularly interesting.
(2 & 3) Sorry; I misunderstood. was thinking about the infinite-horizon case. If you have a probability distribution over possible lengths of the game then the problem is indeed more complex, but I don’t see that much benefit to it—it really doesn’t change things all that much.
In particular, if you still have a limit on the number of rounds, then the same reasoning by backwards induction still applies (i.e .the optimal strategy is to always defect), and so the same interesting aspect of the problem is still there.
Similarly, the optimal counter-strategy to TFT stays mostly the same. It will simply be “TFT, but always defect starting from round N” where N is some number that will depend on the specific probabilities in question.
The interesting aspect of the problem is still the part that comes from the finite limit, regardless of whether it’s fixed or has some kind of distribution over a finite number of possibilities.
then the same reasoning by backwards induction still applies (i.e .the optimal strategy is to always defect)
Not so fast.
Once a prisoner condemned to death was brought before the king to set the date of the execution. But the king was in a good mood, having just had a tasty breakfast, and so he said: “You will be executed next week, but to spare you the agony of counting down the hours of your life, I promise you that you will not know the day of your execution until the jailers come to take you to the gallows”.
The prisoner was brought back to his cell where he thought for a while and then exclaimed: “But I cannot be executed if the king is to keep his word! I cannot be executed on Sunday because if I’m alive on Saturday I’ll know the day of my execution and that breaks the king’s promise. And I cannot be executed on Saturday because I know I cannot be executed on Sunday so if Friday comes around and I’m still alive, I’ll know I have to be executed on Saturday. Etc., etc. This perfect reasoning by backwards induction says I cannot be executed during any day. And since the king always keeps his word, I’m safe!”. Much relieved, he lay down on his cot whistling merrily.
And so, when on Tuesday the guards came for him, he was very surprised. The king kept his word.
Sorry; I meant to say the “optimal” strategy is to defect. I don’t agree with the backwards induction; my point was that that argument is precisely what makes the problem interesting.
EDIT: By the way, I’m pretty sure I’ve come up with a strategy that is a Nash equilibrium (in IPD + simulations) and always cooperates with itself, so I very strongly agree that always defecting is not optimal.
Each bot can run a simulation of the other if the circumstances under which they run their opponents are not the same.
For example, JusticeBot does indeed run a simulation of its opponent, but this is not a problem because JusticeBot simulates what the opponent will do against CooperateBot, not against JusticeBot.
Similarly, an agent could run a simulation of its opponent against itself, but with a different history.
If you run out of time, you defect. If your opponent simulates you without a time limit, and you take long enough to cause ver to run out of time, ve’ll defect, which isn’t what you want.
But there’s a function that lets you run a function for up to a specified number of microseconds, and tells you what it returns or whether it times out.
If you run out of time, you defect. If your opponent simulates you without a time limit, and you take long enough to cause [him] to run out of time, [he’ll] defect, which isn’t what you want.
I’m not sure about that—the original post says this:
the penalty for not outputting Cooperate or Defect within the time limit has been reduced.
So presumably most bots will be set up to make a choice within the time limit.
But there’s a function that lets you run a function for up to a specified number of microseconds, and tells you what it returns or whether it times out.
Right, so the practical effect of the strategy I proposed would be to deny opponents knowledge. Of course, one can envision situations where you want to be transparent.
I would guess that in that case, you run out of time and get penalized. Which suggests a strategy: Program your own bot to wait until the last possible millisecond before transmitting its choice.
What do you mean, “in that case”? My issue is that I don’t see how two bots can both simulate each other. I don’t see what I presented as a special case, but the general case.
Just because your bot has the option of simulating its opponent doesn’t mean that you would necessarily program it to do so. Evidently there is a time limit in the game so if your bot went overboard simulating it would end up running out of time and being penalized a lot of the time.
For example one possible strategy is “transparent tit for tat,” i.e. program your bot not to simulate anything at all but to simply play tit for tat. With the hope that other bots will simulate yours and figure out that the best strategy is repeated cooperation.
Transparent tit-for-tat is a decent enough strategy, but it is easily beaten 1-on-1 by “cooperate on every turn except the very last one”, and against many other simple bots by “TFT but defect on the last turn”.
That being said, when the game is 100 turns long, it’s pretty hard to do significantly better than Tit-for-Tat—the aforementioned strategies only beat TFT with a score of 302-297.
I think that’s probably a good reason to shorten the round length from 100 down to, say, 10 or 20 - otherwise the variance from a single RandomBot would drown out that score difference.
I agree, I was simply giving an example of a strategy which would not lead to the sort of infinite regress envisioned by ThisSpaceAvailable.
I do think it’s worth noting that “TFT but defect on the last turn” would be beaten 1-on-1 by “TFT but defect on the last two turns.” And both of them would be beaten 1-on-1 by “TFT but defect on the last 50 turns.”
In fact, in a 1-on-1 game, it seems to me the best strategy would seem to be to always defect and hope the other fellow is stupid enough to cooperate at least once. Actually it seems pretty much pointless to talk about 1-on-1 prisoner’s dilemma, because it turns things back into a zero sum game.
If you think about it as win/lose, then yes, of course. However, it’s still instructive to ask the question “what’s the best response strategy”? In other words, “what strategy will maximise my utility against this given, fixed opponent strategy?”.
In that sense, the best response to TFT is indeed “TFT but defect on the last turn”, because “TFT but defect on the last two turns” does strictly worse when playing against TFT.
Fortunately, you can indeed do significantly better than TFT in this game!
But what if you are playing against “TFT but defect on the last turn”? In that case, the best strategy is TFT-2. And if the other fellow is playing TFT-2, the best strategy is TFT-3. And so on. Agreed?
Seems to me it depends on what strategies the other contestants are playing.
Oh, yes, of course; the best response to TFT-1 is clearly TFT-2, and so on.
As for how well strategies do, while it’s clear that it will depend on the strategies of other contestants and in that sense there cannot be a “best strategy”, I think one can do better—for example, if there’s a Nash Equilibrium strategy that isn’t simply (Defect, Defect).
At a bare minimum, you can improve upon TFT by also making sure it defects against CooperateBots, and doesn’t wait until the second turn to defect against DefectBots. Of course, there may indeed be JusticeBots out there who punish you for defecting against CooperateBots...
That’s assuming that the new algorithm can correctly identify its opponents. Also, if other algorithms correctly sense that yours is opportunistic, they might change their strategy to your detriment.
That’s fixable by not letting the bot know how many round are there other than, say, more than 20 and less than a hundred.
In that case the game becomes significantly less interesting, because there’s far less you can do to improve on TFT and similar strategies, because TFT is a Nash Equilibrium. Granted, you can still include provisions to detect against RandomBots, CooperateBots, and DefectBots, but the finitely repeated PD is significantly more interesting.
“Interesting” is in the eye of the beholder. Not knowing the exact number of rounds adds complexity. I doubt that adding complexity means “there’s far less you can do to improve”.
(1) Saying “complexity” is not an argument.
(2) The unknown horizon case is actually strictly less complex, because now each round is the same w.r.t. the future, whereas for the finite horizon case each round has a distinctly different future.
(3) You haven’t countered the point that TFT is a Nash Equilibrium in the infinite-horizon case—that makes it fairly trivial and obvious.
(4) http://lesswrong.com/lw/to/the_truly_iterated_prisoners_dilemma/
(1) It’s a about the same kind of argument as saying “less interesting” :-P
(2) No, each round is not the same because the horizon is not unknown, it’s just that instead of a fixed number you have a probability distribution to deal with.
(3) We are not talking about the infinite-horizon case.
(4) Yes, and?
My main point is that the NE for the fixed-length IPD is “always defect”, but the fact that we really ought to be able to do much better than “always defect” is what makes that case particularly interesting.
(2 & 3) Sorry; I misunderstood. was thinking about the infinite-horizon case. If you have a probability distribution over possible lengths of the game then the problem is indeed more complex, but I don’t see that much benefit to it—it really doesn’t change things all that much.
In particular, if you still have a limit on the number of rounds, then the same reasoning by backwards induction still applies (i.e .the optimal strategy is to always defect), and so the same interesting aspect of the problem is still there.
Similarly, the optimal counter-strategy to TFT stays mostly the same. It will simply be “TFT, but always defect starting from round N” where N is some number that will depend on the specific probabilities in question.
The interesting aspect of the problem is still the part that comes from the finite limit, regardless of whether it’s fixed or has some kind of distribution over a finite number of possibilities.
Not so fast.
Once a prisoner condemned to death was brought before the king to set the date of the execution. But the king was in a good mood, having just had a tasty breakfast, and so he said: “You will be executed next week, but to spare you the agony of counting down the hours of your life, I promise you that you will not know the day of your execution until the jailers come to take you to the gallows”.
The prisoner was brought back to his cell where he thought for a while and then exclaimed: “But I cannot be executed if the king is to keep his word! I cannot be executed on Sunday because if I’m alive on Saturday I’ll know the day of my execution and that breaks the king’s promise. And I cannot be executed on Saturday because I know I cannot be executed on Sunday so if Friday comes around and I’m still alive, I’ll know I have to be executed on Saturday. Etc., etc. This perfect reasoning by backwards induction says I cannot be executed during any day. And since the king always keeps his word, I’m safe!”. Much relieved, he lay down on his cot whistling merrily.
And so, when on Tuesday the guards came for him, he was very surprised. The king kept his word.
Sorry; I meant to say the “optimal” strategy is to defect. I don’t agree with the backwards induction; my point was that that argument is precisely what makes the problem interesting.
EDIT: By the way, I’m pretty sure I’ve come up with a strategy that is a Nash equilibrium (in IPD + simulations) and always cooperates with itself, so I very strongly agree that always defecting is not optimal.
Each bot can run a simulation of the other if the circumstances under which they run their opponents are not the same.
For example, JusticeBot does indeed run a simulation of its opponent, but this is not a problem because JusticeBot simulates what the opponent will do against CooperateBot, not against JusticeBot.
Similarly, an agent could run a simulation of its opponent against itself, but with a different history.
But that different history would have to result in not running a further simulation.
Not immediately, but yes, the recursion would eventually need to hit a base case.
If you run out of time, you defect. If your opponent simulates you without a time limit, and you take long enough to cause ver to run out of time, ve’ll defect, which isn’t what you want.
But there’s a function that lets you run a function for up to a specified number of microseconds, and tells you what it returns or whether it times out.
I’m not sure about that—the original post says this:
So presumably most bots will be set up to make a choice within the time limit.
Right, so the practical effect of the strategy I proposed would be to deny opponents knowledge. Of course, one can envision situations where you want to be transparent.