Meta just unveiled an AI that can play Diplomacy at a human level.
Here we have an AI that, through deception, can persuade humans to ally with it against other humans and set them up for eventual betrayal. This is something Eliezer anticipated with his AI-box experiment.
The AI-box experiment and this result are barely related at all—the only connection between them is that they both involve deception in some manner. “There will exist future AI systems which sometimes behave deceptively” can hardly be considered a meaningful advance prediction.