The problem: Winning Diplomacy games against humans.
The action space: You control an empire in early twentieth-century Europe. You write orders for your armies and fleets to move, and for your supply centers to build more armies and fleets. Crucially, you can also exchange text messages with the other empires. When a timer runs out, all orders are carried out simultaneously.
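To make that concrete, here is a rough sketch of what such a game interface might look like. Everything here is invented for illustration: `DiplomacyEnv`, `submit_orders`, `send_message`, and `resolve` are hypothetical names, not any real library's API.

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    # power -> list of order strings, e.g. {"FRANCE": ["A PAR - BUR"]}
    orders: dict[str, list[str]] = field(default_factory=dict)
    # (sender, recipient, message text) triples exchanged before the deadline
    messages: list[tuple[str, str, str]] = field(default_factory=list)

class DiplomacyEnv:
    """Hypothetical environment: collect orders and press, then resolve
    everything at once when the timer runs out."""

    def __init__(self):
        self.pending = Turn()

    def send_message(self, sender: str, recipient: str, text: str) -> None:
        # Press is free-form text; it has no direct mechanical effect.
        self.pending.messages.append((sender, recipient, text))

    def submit_orders(self, power: str, orders: list[str]) -> None:
        # e.g. env.submit_orders("FRANCE", ["A PAR - BUR", "F BRE - MAO"])
        self.pending.orders[power] = orders

    def resolve(self) -> Turn:
        # All powers' orders are adjudicated simultaneously; adjudication
        # logic itself is omitted here.
        resolved, self.pending = self.pending, Turn()
        return resolved
```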
The reward: 0 for losing, 1 for winning outright by yourself, and some fractional reward for a shared victory (a draw split among the surviving players).
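One simple version of that fractional reward would split a draw equally among the players sharing it. The function below is just one plausible encoding, not a standard scoring rule:

```python
def reward(outcome: str, n_sharing: int = 1) -> float:
    """Toy reward: 0 for a loss, 1 for a solo win, an equal share for a draw.
    Equal splitting is an assumption; other scoring schemes exist."""
    if outcome == "loss":
        return 0.0
    if outcome == "solo":
        return 1.0
    if outcome == "draw":
        return 1.0 / n_sharing  # e.g. a three-way draw pays 1/3 each
    raise ValueError(f"unknown outcome: {outcome}")
```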
I’m no AI scientist, so I might be totally wrong, but I suspect that you could fine-tune a language model like GPT-2 on a corpus of online diplomacy game chat logs, and then use that model somehow as a component of the RL agent that you train to play the game.
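For what it's worth, the fine-tuning half of that pipeline is fairly standard. Here is a minimal sketch using Hugging Face's transformers library, assuming a hypothetical file `press_logs.txt` with one chat message per line; the hyperparameters are placeholders, not tuned values.

```python
from transformers import (GPT2LMHeadModel, GPT2TokenizerFast,
                          Trainer, TrainingArguments,
                          DataCollatorForLanguageModeling, TextDataset)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# "press_logs.txt" is a hypothetical corpus of Diplomacy chat messages.
# TextDataset is deprecated in newer transformers releases but keeps the
# example short; the `datasets` library is the modern replacement.
train_data = TextDataset(tokenizer=tokenizer,
                         file_path="press_logs.txt",
                         block_size=128)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-diplomacy",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    data_collator=collator,
    train_dataset=train_data,
)
trainer.train()
```

The fine-tuned model could then serve as the message-generating component of the agent, with the RL policy deciding when to send messages and what orders to submit; how exactly to wire those two pieces together is the open question.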
I'm not sure whether this would be good or bad for the world; I just thought it would be interesting. I sure would love to see it, haha.
You might be interested to learn about some recently announced work on training agents with reinforcement learning to play “no-press” Diplomacy (the variant without messaging between players):
https://arxiv.org/abs/1909.02128
https://arxiv.org/abs/2006.04635