As far as I can tell, the AI has no specialized architecture for deciding about its future strategies or giving semantic meaning to its words. Its outputting the string “I will keep Gal a DMZ” does not have the semantic meaning of committing to keep troops out of Gal. It’s just the phrase that the players most likely to win would use in that board state, given its internal strategy.
Just as chess grandmasters were outperformed by a simple search tree when chess was supposed to be the peak of human intelligence, I think this will have the same disenchanting effect on the game of Diplomacy. Humans are not decision-theoretic geniuses; just saying whatever people want to hear while playing optimally for yourself is sufficient to win. There may be a level of play at which decision theory and commitments become relevant, but humans just aren’t that good.
That said, I think this is actually a good reason to update towards freaking out. It’s happened quite a few times now that ‘naive’ big milestones have been hit unexpectedly soon “without any major innovations or new techniques”: chess, Go, StarCraft, Dota, GPT-3, DALL-E, and now Diplomacy. It’s starting to look like humans are less complicated than we thought: more like a bunch of current-level AI architectures squished together in the same brain (with some capacity to train new ones in deployment) than like a powerful, generally applicable intelligence. Or a room full of toddlers with superpowers, to use the CFAR phrase. While this doesn’t increase our estimates of the rate of AI development, it does suggest that the goalpost for superhuman intellectual performance in all areas is closer than we might otherwise have thought.
> As far as I can tell, the AI has no specialized architecture for deciding about its future strategies or giving semantic meaning to its words. Its outputting the string “I will keep Gal a DMZ” does not have the semantic meaning of committing to keep troops out of Gal. It’s just the phrase that the players most likely to win would use in that board state, given its internal strategy.
This is incorrect; they use “honest” intentions to learn a message → intention model, then use that model to annotate all the other messages with intentions, which they then use to train the intention → message map. So the model has a strong bias toward being honest in its intention → message map. (The authors even note that one issue with the model is its tendency to spill too many of its plans to its enemies!)
The reason an honest intention → message map doesn’t lead to a fully honest agent is that the search procedure that goes from messages + history → intention can “change its mind” about what the best intention is after a message has already been sent.
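To make that recipe concrete, here is a toy, runnable sketch of the pipeline with lookup-table “models” standing in for the real classifier and generator (all names and data invented; this shows the shape of the recipe, not Meta’s actual code):

```python
from collections import Counter, defaultdict

# Dialogue corpus: (message, orders the speaker actually submitted that turn).
corpus = [
    ("I will keep Gal a DMZ",       "hold_out_of_Gal"),
    ("I will keep Gal a DMZ",       "move_into_Gal"),   # a lie
    ("Let's both demilitarize Gal", "hold_out_of_Gal"),
    ("I'm going for Rumania",       "move_to_Rum"),
]

# 1. Keep only "honest" pairs, i.e. messages whose stated plan matched the
#    orders actually played. (Hand-labelled here; the real pipeline infers it.)
honest = [pair for pair in corpus
          if pair != ("I will keep Gal a DMZ", "move_into_Gal")]

# 2. "Train" message -> intention: majority intention per message.
votes = defaultdict(Counter)
for msg, orders in honest:
    votes[msg][orders] += 1
msg_to_intent = {msg: c.most_common(1)[0][0] for msg, c in votes.items()}

# 3. Annotate the *whole* corpus, lies included, with inferred intentions,
#    so the lying message above gets relabelled with the honest intention.
annotated = [(msg_to_intent[msg], msg) for msg, _ in corpus]

# 4. "Train" intention -> message on the annotated data. Every message is now
#    paired with an honest reading of it, biasing generation toward honesty.
intent_to_msg = {intent: msg for intent, msg in annotated}

# At play time the planner picks the intention and the generator verbalizes
# it; dishonesty appears only if a later planning pass picks a new intention.
print(intent_to_msg["hold_out_of_Gal"])  # -> "Let's both demilitarize Gal"
```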
> Just as chess grandmasters were outperformed by a simple search tree when chess was supposed to be the peak of human intelligence, I think this will have the same disenchanting effect on the game of Diplomacy.
This is correct; every time an AI system reaches a milestone earlier than expected, that is simultaneously an update upward on AI progress being faster than expected and an update downward on the difficulty of the milestone.
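As a toy illustration of the double update (all numbers made up), a single “milestone fell early” observation moves both quantities at once:

```python
# Hypotheses: progress rate in {slow, fast} x milestone difficulty in {easy, hard}.
priors = {
    ("slow", "easy"): 0.25, ("slow", "hard"): 0.25,
    ("fast", "easy"): 0.25, ("fast", "hard"): 0.25,
}
# Assumed P(milestone falls this early | hypothesis) -- invented numbers.
likelihood = {
    ("slow", "easy"): 0.30, ("slow", "hard"): 0.02,
    ("fast", "easy"): 0.70, ("fast", "hard"): 0.20,
}

# Bayes: posterior proportional to prior * likelihood.
unnorm = {h: priors[h] * likelihood[h] for h in priors}
z = sum(unnorm.values())
posterior = {h: p / z for h, p in unnorm.items()}

p_fast = sum(p for (rate, _), p in posterior.items() if rate == "fast")
p_hard = sum(p for (_, diff), p in posterior.items() if diff == "hard")
print(f"P(progress fast): 0.50 -> {p_fast:.2f}")   # ~0.74: update upward
print(f"P(milestone hard): 0.50 -> {p_hard:.2f}")  # ~0.18: update downward
```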
I’d like to push back on “AI has beaten StarCraft”. AlphaStar didn’t see the game through the interface we see; it saw an interface with the exact positions of all of its units and could issue any command directly. That’s far from the mouse-and-keyboard that humans are limited to, and in StarCraft that’s a big limitation. When the AI can read the game state from the pixels and send mouse and keyboard inputs, then I’ll be impressed.
I think this is true of the original version of AlphaStar, but they have since trained a new version on camera inputs and with stronger limitations on APM (22 actions per 5 seconds). (Maybe you’d still want some kind of noise applied to the inputs, but I think the current setup is much closer to human-like playing conditions.) See: https://www.deepmind.com/blog/alphastar-grandmaster-level-in-starcraft-ii-using-multi-agent-reinforcement-learning
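For concreteness, a cap like 22 actions per 5 seconds is just a sliding-window rate limit; a minimal sketch of one such limiter (hypothetical, not DeepMind’s actual code):

```python
import time
from collections import deque

class ApmLimiter:
    """Sliding-window action cap, e.g. 22 actions per 5 seconds."""

    def __init__(self, max_actions=22, window_s=5.0):
        self.max_actions = max_actions
        self.window_s = window_s
        self.stamps = deque()  # timestamps of recent actions

    def try_act(self):
        """Return True (and record the action) if the cap allows acting now."""
        now = time.monotonic()
        # Evict timestamps that have fallen out of the window.
        while self.stamps and now - self.stamps[0] > self.window_s:
            self.stamps.popleft()
        if len(self.stamps) >= self.max_actions:
            return False  # over the cap; the agent must wait
        self.stamps.append(now)
        return True
```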
Ah, I didn’t know they had upgraded it. I’m much more satisfied that SC2 is solved now.