As far as I can tell, the AI has no specialized architecture for deciding about its future strategies or giving semantic meaning to its words. Its outputting the string “I will keep Gal a DMZ” does not carry the semantic meaning of committing to keep troops out of Gal. It’s just the phrase that players who are most likely to win tend to use in that board state, given its internal strategy.
This is incorrect; they use “honest” intentions to learn a model of message > intention, then use this model to annotate all the other messages with intentions, which they then use to train the intention > message map. So the model has a strong bias toward being honest in its intention > message map. (The authors even note that one issue with the model is its tendency to spill too many of its plans to its enemies!)
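A minimal sketch of that pipeline, assuming toy stand-in models. Cicero’s real components are large neural networks; the names and the lookup-table “training” here are purely illustrative of the data flow, not of the actual implementation:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Example:
    message: str                     # dialogue text a player sent
    intention: Optional[str] = None  # the move they planned; only known
                                     # for the "honest" subset of games


def train_message_to_intention(honest: list[Example]) -> Callable[[str], str]:
    """Stage 1: learn message -> intention from games where the stated
    plan matched the move actually played (the honest subset)."""
    table = {ex.message: ex.intention for ex in honest}
    return lambda msg: table.get(msg, "hold")  # toy lookup stand-in


def annotate(corpus: list[Example],
             annotator: Callable[[str], str]) -> list[Example]:
    """Stage 2: use the learned model to label every remaining message
    with an inferred intention."""
    return [Example(ex.message, annotator(ex.message)) for ex in corpus]


def train_intention_to_message(annotated: list[Example]) -> Callable[[str], str]:
    """Stage 3: train the generator on (intention, message) pairs. Since
    the labels were produced by a model fit to honest data, the generator
    inherits a bias toward saying what the intention actually is."""
    table = {ex.intention: ex.message for ex in annotated}
    return lambda intent: table.get(intent, f"I will {intent}.")
```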
The reason an honest intention > message map doesn’t lead to a fully honest agent is that the search procedure that goes from message + history > intention can “change its mind” about what the best intention is.
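A sketch of why the search step undoes that honesty, again with hypothetical names; the only point is that the intention is re-chosen by estimated value, with no term rewarding consistency with past messages:

```python
from typing import Callable


def choose_intention(candidates: list[str],
                     value: Callable[[str], float]) -> str:
    # Strategic search: re-pick the highest-value intention every turn.
    # Nothing here rewards consistency with messages already sent, so
    # the agent can honestly describe an intention and later abandon it.
    return max(candidates, key=value)
```

The honest generator faithfully reports whatever intention this returns; deception enters whenever the newly chosen intention differs from the one reported last turn.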
Just as chess was disenchanted when grandmasters, once held up as the peak of human intelligence, were outperformed by a simple search tree, I think this will have the same disenchanting effect on the game of Diplomacy.
This is correct; every time AI systems reach a milestone earlier than expected, that is simultaneously an upward update on the speed of AI progress and a downward update on the difficulty of the milestone.