Then this game actually seems to be about getting strategy data out of reputation data. The zeroth-level strategy is to defect every time, because that’s the Nash equilibrium. An example of a first-level strategy is to cooperate against players with reputation higher than some cutoff and defect against everyone else, because you want to cooperate against cooperators to outlast the slackers. A second-level strategy models the other players modeling you, so you might, for example, cooperate against players with reputation higher than some function of your own reputation.
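To make the levels concrete, here’s a minimal Python sketch; the interface (reputations as numbers on a common scale, True meaning cooperate) and the particular numbers are my own assumptions, not anything from the rules.

```python
# Sketch of the three strategy levels described above.
# Reputations are assumed to be numbers on a common scale;
# True means cooperate, False means defect.

def zeroth_level(my_rep, their_rep):
    # Always defect: the one-shot Nash equilibrium.
    return False

def first_level(my_rep, their_rep, cutoff=0.5):
    # Cooperate only with players whose reputation clears a fixed cutoff,
    # so you cooperate with cooperators and let the slackers starve.
    return their_rep > cutoff

def second_level(my_rep, their_rep):
    # Model the other players modeling you: the cutoff becomes a function
    # of your own reputation (the 0.8 factor is purely illustrative).
    return their_rep > 0.8 * my_rep
```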
So I think the way to go in an idealized version of this game is to model other players as simple third-level players, try to predict whether they’ll cooperate with you based on reputation and the past reputation matrices, and pick a strategy that maximizes food over future rounds, which means keeping a reputation where the backlash from slacking has yet to outweigh the food gained from it.
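Roughly, and with all the particulars made up (the logistic predictor, the payoff table, and the future-reputation term left as a comment), the idealized player looks something like:

```python
import math

# Toy predictor: estimate P(opponent cooperates with me) from their current
# reputation and how it has been trending across past reputation matrices.
def predict_cooperation(their_rep, their_past_reps):
    trend = (their_rep - their_past_reps[-1]) if their_past_reps else 0.0
    score = 4.0 * (their_rep + trend - 0.5)   # arbitrary scaling around 0.5
    return 1.0 / (1.0 + math.exp(-score))

# Pick the move with the higher expected food this round. A real strategy
# would also charge the defect branch for the future reputation backlash.
def choose_move(p_coop, payoff):
    # payoff[(my_move, their_move)] -> food; True = cooperate. Values assumed.
    ev_coop = p_coop * payoff[(True, True)] + (1 - p_coop) * payoff[(True, False)]
    ev_defect = p_coop * payoff[(False, True)] + (1 - p_coop) * payoff[(False, False)]
    return ev_coop >= ev_defect
```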
But there are some non-idealities. As the game goes on, reputation varies more slowly, so you can actually start playing tit-for-tat strategies, and so can other people. Identifiability, and the fact that most of the slackers have been eliminated, increase the costs of defecting. Of course, you can test whether people have identified you by occasionally defecting against players of unknown strategy and seeing if they tit-for-tat you back. If they don’t, then you can model them as third-level players and not have to treat them as equals.
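The probing test could be as simple as this sketch (the 10% probe rate and the bookkeeping sets are placeholders of mine):

```python
import random

# Occasionally defect against a player whose strategy we haven't classified,
# purely as a probe, and remember that we did.
def maybe_probe(opponent_id, classified, probed_last_round, default_move,
                probe_rate=0.1):
    if opponent_id not in classified and random.random() < probe_rate:
        probed_last_round.add(opponent_id)
        return False          # defect as the test
    return default_move

# Next round, check whether a probed player hit back. If they didn't,
# treat them as a simple non-retaliating player rather than as an equal.
def classify_after_probe(opponent_id, their_move, probed_last_round,
                         classified, non_retaliators):
    if opponent_id in probed_last_round:
        probed_last_round.discard(opponent_id)
        classified.add(opponent_id)
        if their_move:        # they cooperated even after being defected on
            non_retaliators.add(opponent_id)
```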
Dealing with the random factor m is a lot like the iterated prisoner’s dilemma: you cooperate the first time or two, then go to tit for tat. Which is to say, you start out being more likely to cooperate when reaching m will require cooperation. But if other players don’t reciprocate enough to make it worthwhile, you just ignore m. It’s good for cleverer players if the goals are reached, though, because it keeps the slackers and exploitable players in the game, which is a source of relative advantage for cleverer players.
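As a sketch of that heuristic (the thresholds and the way m enters are my guesses, not the actual rules):

```python
# How strongly the group goal m should pull you toward cooperation this round.
def cooperation_pull_from_m(round_number, reciprocation_rate,
                            m_requires_cooperation,
                            early_rounds=2, min_reciprocation=0.5):
    if not m_requires_cooperation:
        return 0.0             # the goal will be reached anyway; play normally
    if round_number < early_rounds:
        return 1.0             # cooperate the first time or two
    if reciprocation_rate < min_reciprocation:
        return 0.0             # others aren't reciprocating: ignore m from here on
    return reciprocation_rate  # otherwise keep pulling toward the goal
```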
What do you mean by “third-level players” here? You say a “second-level strategy” already models other players, but
you can model them as third-level players and not have to treat them as equals.
… makes me think you maybe meant first-level players?
Anyway, I would model most of my opponents as playing what you call a first-level strategy, with some varying parameters (and extra randomness). And possibly include as part of the population whichever more exotic strategies can systematically beat (various mixes of) those.
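Concretely, that opponent model could be a population of cutoff players with jittered parameters and a bit of noise (all numbers made up), which candidate strategies could then be tested against:

```python
import random

# Sample one opponent: a first-level (cutoff) player with a randomly drawn
# cutoff and a small chance of acting out of character each round.
def sample_first_level_opponent(rng=random):
    cutoff = rng.uniform(0.3, 0.8)
    noise = rng.uniform(0.0, 0.15)

    def play(my_rep_as_they_see_it):
        move = my_rep_as_they_see_it > cutoff
        if rng.random() < noise:
            move = not move
        return move

    return play

# A small population to test candidate strategies against.
population = [sample_first_level_opponent() for _ in range(20)]
```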