Biggest thing that stood out to me watching this was that while the AI’s tactics seemed quite good, its game theory seemed quite poor—e.g. it wasn’t sufficiently vindictive if you betrayed it, which made it vulnerable to exploitation by a human aware of that fact.
I am doubtful about this. I am unsure whether Cicero would score higher if it were more vindictive, so I am hesitant to call its game theory poor. A good analogy: I am hesitant to call AlphaGo’s endgame moves poor even if they look 100% poor, because I am not sure whether AlphaGo would win more games if it played a more human-like endgame.
In the video, the human wins precisely because they exploit this fact about the AI.
I’m an author on the paper. This is an interesting topic that I think we approached in roughly the right way. For context, some of my teammates and I did earlier research on AI for poker, so that concern for exploitability certainly carried over to our work on Diplomacy.
The setting that the human plays in the video (one human vs 6 known Cicero agents) is not the setting that we intended the agent to play in, and it is not the setting in which we evaluated the agent. That’s simply a demonstration to get a sense of how the bot plays. If you want to evaluate the bot’s exploitability and game theory, it should be done in the setting we intended for evaluation.
The setting we intended the bot to play in is games where all players are anonymous, and there is a large pool of possible players. That means players don’t necessarily know which player is a bot, or whether there is a bot in that specific game at all. In that case, it’s reasonable for the human players to assume all other players might engage in retaliatory behavior, so the agent gets the benefit of a tit-for-tat reputation without having to actually demonstrate it.
The assumption that players are anonymous is explicitly accounted for in the algorithm. It’s the reason why we assume there is a common knowledge distribution over our lambda parameters for piKL while in fact we actually play according to a single low lambda. If you were to change that assumption, perhaps by having all players know that a specific player is a bot at the start of the game, then you should change the common knowledge distribution over lambda parameters to be that the bot will play according to the lambda it actually intends to play. In that case the agent will behave differently. Specifically, it will play a much more mixed, less exploitable policy.
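For readers unfamiliar with the role of lambda here, a rough sketch may help. It assumes the usual KL-regularized form, in which the policy is proportional to an anchor (human-imitation) policy times exp(value / lambda). The function name `pikl_policy` and the toy numbers are made up for illustration; this is not Cicero's code, and it shows only a single lambda, not the common knowledge distribution over lambdas described above.

```python
import math

def pikl_policy(anchor, values, lam):
    """Toy KL-regularized policy: pi(a) proportional to anchor(a) * exp(values(a) / lam).

    anchor: positive probabilities from a (hypothetical) human-imitation policy
    values: estimated value of each action
    lam:    regularization strength; larger values stay closer to the anchor
    """
    # Work in log space and subtract the max for numerical stability.
    logits = [math.log(p) + v / lam for p, v in zip(anchor, values)]
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Illustrative numbers only: three actions, the anchor prefers action 0,
# while the value estimates prefer action 2.
anchor = [0.7, 0.2, 0.1]
values = [0.0, 0.5, 1.0]

low = pikl_policy(anchor, values, lam=0.1)    # weakly regularized
high = pikl_policy(anchor, values, lam=10.0)  # strongly regularized
```

With a low lambda the toy policy puts almost all of its weight on the highest-value action; with a high lambda it stays close to the anchor.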
It sounds like Cicero competes to win against other players who are trying to satisfy other human goals ingrained by evolution. Does not seem very fair.
Do we know to what extent top-rated players actually try to win in this anonymized, no-stakes setting, as opposed to trying to signal qualities that we evolved to want to signal in a non-anonymized ancestral environment?
Why is your gain of function research deserving of NIH funding?