evhub comments on Meta AI announces Cicero: Human-Level Diplomacy play (with dialogue)

evhub 22 Nov 2022 23:31 UTC
14 points
6
The biggest thing that stood out to me watching this was that while the AI's tactics seemed quite good, its game theory seemed quite poor: e.g. it wasn't sufficiently vindictive if you betrayed it, which made it vulnerable to exploitation by a human aware of that fact.
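This toy sketch illustrates the exploitability point. It uses the iterated prisoner's dilemma as a stand-in, which is an assumption: Diplomacy is a seven-player game with negotiation, not a two-player PD, and none of this is Cicero's code. The payoffs (T=5, R=3, P=1, S=0) are the textbook values, and the strategy names are hypothetical. The sketch shows that a player who knows the opponent never retaliates gains by betraying it, while the same betrayal backfires against a tit-for-tat opponent.

```python
# Toy iterated prisoner's dilemma: an analogy for "not vindictive enough",
# NOT a model of Diplomacy or of Cicero. Textbook payoffs assumed: T=5, R=3, P=1, S=0.
PAYOFF = {  # (my_move, their_move) -> my payoff; "C" = cooperate, "D" = defect
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}


def always_cooperate(opponent_history):
    """Hypothetical 'forgiving' bot: never retaliates, however often it is betrayed."""
    return "C"


def tit_for_tat(opponent_history):
    """Cooperates first, then copies the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def always_defect(opponent_history):
    """The exploiter: betrays every round."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Return the total payoffs (a, b) over a fixed number of rounds."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_b)  # each player sees only the other's past moves
        move_b = strategy_b(hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


print(play(always_defect, always_cooperate))  # betraying the forgiving bot
print(play(always_defect, tit_for_tat))       # betraying a retaliator
print(play(always_cooperate, tit_for_tat))    # cooperating with a retaliator
```

Over 10 rounds, betraying the never-retaliating bot earns 50 points against 30 for mutual cooperation. Betraying tit-for-tat earns only 14. A human who knows which kind of opponent they face can therefore pick the exploiting strategy.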
- sanxiyn 23 Nov 2022 7:06 UTC
  1 point
  −2
  I am doubtful about this. I am unsure whether Cicero would score higher if it were more vindictive, so I am hesitant to call its game theory poor. A good analogy: I am hesitant to call AlphaGo's endgame moves poor even if they look 100% poor, because I am not sure AlphaGo would win more games if it played a more human-like endgame.
  - evhub 23 Nov 2022 8:29 UTC
    4 points
    2
    In the video, the human wins precisely because they exploit this fact about the AI.
    - Noam Brown 23 Nov 2022 13:18 UTC
      48 points
      12
      I’m an author on the paper. This is an interesting topic that I think we approached in roughly the right way. For context, some of my teammates and I did earlier research on AI for poker, so that concern for exploitability certainly carried over to our work on Diplomacy.
      The setting the human plays in during the video (one human vs. 6 known Cicero agents) is not the setting we intended the agent to play in, and it is not the setting in which we evaluate the agent. It's simply a demonstration to get a sense of how the bot plays. If you want to evaluate the bot's exploitability and game theory, that should be done in the setting we intended for evaluation.
      The setting we intended the bot to play in is games where all players are anonymous, and there is a large pool of possible players. That means players don’t necessarily know which player is a bot, or whether there is a bot in that specific game at all. In that case, it’s reasonable for the human players to assume all other players might engage in retaliatory behavior, so the agent gets the benefit of a tit-for-tat reputation without having to actually demonstrate it.
      The assumption that players are anonymous is explicitly accounted for in the algorithm. It's the reason we assume there is a common-knowledge distribution over our lambda parameters for piKL, while in fact we play according to a single low lambda. If you were to change that assumption, perhaps by having all players know at the start of the game that a specific player is a bot, then you should change the common-knowledge distribution over lambda parameters to reflect the lambda the bot actually intends to play. In that case the agent will behave differently: specifically, it will play a much more mixed, less exploitable policy.
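This sketch is a rough illustration of what the lambda parameter controls. It is not Cicero's planner. piKL regularizes the agent's policy toward an anchor policy imitating human play, with a KL penalty weighted by lambda. The single-step objective E_pi[Q] - lambda * KL(pi || anchor) has the closed-form maximizer pi(a) proportional to anchor(a) * exp(Q(a) / lambda). The Q-values and anchor probabilities below are made up for illustration. The sketch does not model the common-knowledge distribution over lambda or the equilibrium reasoning described above.

```python
import math


def kl_regularized_policy(q_values, anchor_policy, lam):
    """Return pi maximizing sum_a pi(a) * Q(a) - lam * KL(pi || anchor).

    Closed form: pi(a) proportional to anchor(a) * exp(Q(a) / lam).
    Assumes every anchor probability is > 0 and lam > 0.
    """
    # Work in log space and subtract the max logit for numerical stability.
    logits = [math.log(p) + q / lam for p, q in zip(anchor_policy, q_values)]
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


q = [1.0, 0.5, 0.0]       # hypothetical action values
anchor = [0.2, 0.5, 0.3]  # hypothetical human-imitation probabilities
print(kl_regularized_policy(q, anchor, lam=0.1))    # low lambda: greedy on Q
print(kl_regularized_policy(q, anchor, lam=100.0))  # high lambda: close to anchor
```

With a low lambda such as 0.1, the policy puts about 98% of its mass on the highest-value action. With lambda = 100, it stays within about 0.01 of the human-like anchor.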
      - dolery 23 Nov 2022 21:19 UTC
        −3 points
        −7
        It sounds like Cicero competes to win against other players who are also trying to satisfy other human goals ingrained by evolution. That does not seem very fair.
        Do we know to what extent top-rated players actually try to win in this anonymized, no-stakes setting, as opposed to trying to signal qualities that we evolved to want to signal in a non-anonymized ancestral environment?
      - Tomás B. 23 Nov 2022 14:42 UTC
        −14 points
        2
        Why is your gain-of-function research deserving of NIH funding?