Game Theory of the Immortals
I’m sure many others have put much more thought into this sort of thing—at the moment, I’m too lazy to look for it, but if anyone has a link, I’d love to check it out.
Anyway, I ran into some musings on game theory for immortal agents, and I thought they were interesting enough to talk about.
Cooperation in games like the iterated Prisoner’s Dilemma is partly dependent on the probability of encountering the other player again. Axelrod (1981) gives the payoff for a sequence of ‘cooperate’s as R/(1-p) where R is the payoff for cooperating, and p is a discount parameter that he takes as the probability of the players meeting again (and recognizing each other, etc.). If you assume that both players continue playing for eternity in a randomly mixing, finite group of other players, then the probability of encountering the other player again approaches 1, and the payoff for an extended period of cooperation approaches infinity.
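Here's a minimal sketch in Python (my code, not Axelrod's; the numbers are made up) of the geometric series behind that formula. Note that for any fixed p < 1 the sum stays finite at R/(1-p); it only blows up as p approaches 1:

```python
# The discounted value of an unbroken run of mutual cooperation:
# R + R*p + R*p**2 + ... -> R/(1-p). R and p are Axelrod's symbols
# from the paragraph above; everything else here is illustrative.

def cooperation_payoff(R: float, p: float, rounds: int) -> float:
    """Partial sum of the discounted cooperation stream."""
    return sum(R * p**t for t in range(rounds))

R, p = 3.0, 0.9
print(cooperation_payoff(R, p, rounds=10_000))  # ~30.0
print(R / (1 - p))                              # closed form: 30.0
```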
So, take a group of rational, immortal agents, in a prisoner’s dilemma game. Should we expect them to cooperate?
I realize there is no optimal strategy without reference to the other players’ strategies, and that the universe is not actually infinite in time, so this is not a perfect model on at least two counts, but I wanted to look at the simple case before adding complexities.
The discount factor can mess things up—you’ll meet someone again, but after how long?
I’m not sure I see your point. My reasoning was that if you meet the same person on average every thousand games in an infinite series of games, you’ll end up meeting them an infinite number of times. Am I confusing the sample space with the event space?
If you have a strong discount factor, then even if you meet the same person infinitely often, your gain is still bounded above (the discounted payoffs sum as a geometric series) and can be much smaller than the gain from winning your current round.
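To make that concrete, a minimal sketch with invented numbers: if you only meet a given opponent every k rounds, each successive meeting is discounted by a further factor of p^k, so even infinitely many meetings sum to a pittance:

```python
# Meetings with a particular opponent happen at rounds k, 2k, 3k, ...,
# so the discounted value of cooperating with them forever is
# R*p**k + R*p**(2k) + ... = R * p**k / (1 - p**k), which is bounded.
# R, p, k, and the one-shot temptation value below are all made up.

def discounted_meetings(R: float, p: float, k: int, meetings: int) -> float:
    return sum(R * p**(k * m) for m in range(1, meetings + 1))

R, p, k = 3.0, 0.99, 1000        # meet the same player every 1000 rounds
future = discounted_meetings(R, p, k, meetings=10_000)
print(future)                    # ~0.00013: negligible
print(R * p**k / (1 - p**k))     # closed form agrees
print(5.0 > future)              # a one-shot defection gain of 5 dominates
```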
*face-palm* Ah yes. Thanks.
How can R/(1-p) diminish when R and p are constant? Are you discounting future games as worth less than this game, and is that consistent with the scoring of iterated prisoner’s dilemma?
Yes, that’s what discounting does. If you have a discounted iterated PD, you have to do something like that. And if R/(1-p) is smaller than the gain from profiteering in your current interaction, you’ll profiteer in your current interaction.
Is that consistent with the scoring of the iterated prisoner’s dilemma, or is it a different game? The goal of abstract games is to maximize one’s score at the end of the game (or, in infinite games, to maximize the average score per round over infinite time).
The expected score of a discounting defector with per-round discount fraction p, versus a cooperate-then-reciprocate player in the [3,4;1,2] payoff matrix, would be 4 + sum_{r=1}^{n} 2p^r after the first n+1 rounds. The expected score of a cooperate-then-reciprocate player against the same opponent would be 3 + sum_{r=1}^{n} 3p^r.
A quick estimate says that for a p of 0.5 the two scores are the same over infinite time (sum_{r=1}^{∞} p^r = 1, so both come to 6).
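A quick numeric check of those two formulas (the code is mine; the formulas are from the comment above):

```python
# At p = 0.5, sum_{r=1..inf} p**r = 1, so the defector's score tends
# to 4 + 2*1 = 6 and the reciprocator's to 3 + 3*1 = 6: a tie.

def score_if_defecting(p: float, n: int) -> float:
    """Defect against a cooperate-then-reciprocate player: 4, then 2/round."""
    return 4 + sum(2 * p**r for r in range(1, n + 1))

def score_if_reciprocating(p: float, n: int) -> float:
    """Reciprocate against the same opponent: 3 every round."""
    return 3 + sum(3 * p**r for r in range(1, n + 1))

print(score_if_defecting(0.5, 10_000))      # -> 6.0
print(score_if_reciprocating(0.5, 10_000))  # -> 6.0
```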
It is, for the reasons you suggest, a different game.
If there’s some way they might not meet again (e.g. kill the other player, imprison the other player), or there’s strong compound interest on advantages (the inverse of the discount factor Stuart Armstrong mentions) they might want to defect. Or if they have imperfect memories and might forget about the encounter by the time the next one happens.
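For the compound-interest case, a rough sketch with numbers I’ve invented: if a one-shot defection bonus B compounds at per-round rate i such that (1+i)·p ≥ 1, its discounted value never fades, and eventually dwarfs the bounded cooperation stream R/(1-p):

```python
# Hypothetical numbers: bonus B = 2 invested at 6% per round, discount
# p = 0.95, cooperation payoff R = 3. Since (1+i)*p = 1.007 > 1, the
# discounted value of the compounding bonus grows without bound, while
# all future cooperation is worth at most R/(1-p) = 60.

B, i, p, R = 2.0, 0.06, 0.95, 3.0

for t in (0, 250, 500):
    discounted = B * (1 + i) ** t * p ** t
    print(t, round(discounted, 1))   # 2.0, ~11.4, ~65.4: crosses 60
print(R / (1 - p))                   # 60.0
```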
Or if you have an imperfect memory and you think they don’t...
If you have an imperfect memory and you think they don’t, wouldn’t you want to pre-commit to attempting co-operation with any immortal entities you face, given they are very likely to remember you, even if you don’t remember them? This is of course assuming that most or all other immortal entities you’re likely to face in the Dilemma do in fact have perfect memories.
If you can’t remember, and they can work that out, then they can defect on you every time and get more points, at no penalty other than making you less and less optimistic about cooperation with rarely-encountered entities.
That could eventually cut into their profits, but it becomes a tragedy of the commons, with you being the commons.
You’re right.
In this case, suppose immortals have perfect memories and would eventually work out that you don’t, and that you’re an immortal who can’t remember whether you’ve played a particular opponent before (but can vaguely remember how often you get defected on vs. co-operated with by the entire field). What do you think your optimal strategy would be?
It’s pretty complicated! I think you’d need to write down equations to figure it out properly, and it would be very non-trivial. That said, assuming there are only games and no communication, you probably want to start off cooperating or randomising between cooperation and defection (depending on pay-offs), and then shift to more and more defection over time until you always defect in the end. Meanwhile, the immortals with memories would probably want to start off mostly cooperating but sometimes defecting, to figure out who, if anyone, doesn’t have memories. (Disclaimer: while this sounds like a plausible set of equilibrium strategies, there may be more complicated equilibria that I didn’t think of, or some other weird cases.)
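To make that concrete, here’s a crude toy simulation of those strategies. Every payoff number and detection mechanism in it is my own assumption, not something from the thread:

```python
import random

# Standard PD payoffs (my choice): T=5, R=3, P=1, S=0. Memory players
# play tit-for-tat, occasionally probe with a defection, flag an
# opponent as memoryless if it cooperates right after being defected
# on (a memory player would have retaliated), and exploit flagged
# opponents forever after.

T, R, P, S = 5, 3, 1, 0
PAY = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}

def memoryless_payoff(coop_prob_fn, pool=100, rounds=20_000,
                      probe=0.02, seed=0):
    """Average per-round payoff of player 0, who cannot tell individual
    opponents apart, in a pool of perfect-memory players."""
    rng = random.Random(seed)
    flagged = [False] * pool        # opponent i has identified player 0
    they_defected = [False] * pool  # opponent i defected last meeting
    my_last = ["C"] * pool          # player 0's last move vs i (only *they* remember)
    total = 0
    for t in range(rounds):
        i = rng.randrange(1, pool)
        me = "C" if rng.random() < coop_prob_fn(t) else "D"
        if flagged[i]:
            them = "D"                                   # exploit
        elif my_last[i] == "D":
            them = "D"                                   # tit-for-tat payback
        else:
            them = "D" if rng.random() < probe else "C"  # occasional probe
        if they_defected[i] and my_last[i] == "C" and me == "C":
            flagged[i] = True   # no retaliation after being defected on
        they_defected[i] = (them == "D")
        my_last[i] = me
        total += PAY[(me, them)]
    return total / rounds

print(memoryless_payoff(lambda t: 1.0))         # always cooperate
print(memoryless_payoff(lambda t: 0.0))         # always defect
print(memoryless_payoff(lambda t: 0.999 ** t))  # cooperate, then fade to defection
```

The intent is that the fade-out policy collects the early cooperation payoffs and then stops feeding the exploiters, but which of the three actually comes out ahead depends delicately on the probe rate, decay speed, pool size, and horizon — which is some evidence for the “very non-trivial” caveat above.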