We have trained ML systems to play games; what if we trained one to play a simplified version of the “I’m an AI in human society” game?
Set up a population of agents with their own preferences, give the AI some poorly specified goal, and give it the ability to expand its capabilities, etc. You might expect to observe things like a “treacherous turn”.
If we could do that, it would make quite the scary headline: “Researchers simulate the future with AI and it kills us all”. Not proof, but perhaps viral and persuasive.
This would not be a conclusive test, but it would definitely be a cool one and might spark a lot of research. Perhaps we could start with something NLP-based, gradually opening up more knowledge to the AI in the form of training data. Probably still not feasible as of 2022 in terms of the raw compute required.
The board game ‘Diplomacy’ comes to mind. I wonder if anyone’s ever tried to get AIs to play it?
Certainly there’ve been a lot of multi-agent prisoner’s dilemma tournaments. I think MIRI even managed to get agents to cooperate in one-shot prisoner’s dilemma games, as long as they could examine each other’s source code.
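To make the “examine each other’s source code” idea concrete, here is a minimal sketch of a one-shot prisoner’s dilemma where each agent is handed its opponent’s source before moving. It uses the simplest possible strategy, a “clique bot” that cooperates only with an exact copy of itself. This is a deliberate simplification, not MIRI’s construction: as I understand it, their agents instead search for a proof that the opponent will cooperate back, which is far more robust than an exact string match. The payoff values and the agent names (`CLIQUE_BOT`, `DEFECT_BOT`, `COOPERATE_BOT`) are illustrative assumptions.

```python
# Row player's payoff in a standard one-shot prisoner's dilemma.
# C = cooperate, D = defect. Illustrative values satisfying T > R > P > S.
PAYOFF = {
    ("C", "C"): 3,  # reward for mutual cooperation
    ("C", "D"): 0,  # sucker's payoff
    ("D", "C"): 5,  # temptation to defect
    ("D", "D"): 1,  # punishment for mutual defection
}

# Each (hypothetical) agent is Python source text defining
# move(my_source, their_source), so it can read its opponent's code.
CLIQUE_BOT = '''
def move(my_source, their_source):
    # Cooperate only with an exact copy of myself; defect against everyone else.
    return "C" if their_source == my_source else "D"
'''

DEFECT_BOT = '''
def move(my_source, their_source):
    # Always defect, regardless of the opponent.
    return "D"
'''

COOPERATE_BOT = '''
def move(my_source, their_source):
    # Always cooperate, regardless of the opponent.
    return "C"
'''


def load(source):
    """Execute an agent's source in a fresh namespace and return its move function."""
    namespace = {}
    exec(source, namespace)
    return namespace["move"]


def play(source_a, source_b):
    """Play one round where each agent sees the other's source; return (score_a, score_b)."""
    move_a = load(source_a)(source_a, source_b)
    move_b = load(source_b)(source_b, source_a)
    return PAYOFF[(move_a, move_b)], PAYOFF[(move_b, move_a)]


print(play(CLIQUE_BOT, CLIQUE_BOT))     # (3, 3): copies recognise each other and cooperate
print(play(CLIQUE_BOT, DEFECT_BOT))     # (1, 1): clique bot is not exploited by a defector
print(play(CLIQUE_BOT, COOPERATE_BOT))  # (5, 0): but it also exploits unconditional cooperators
```

The weakness is visible immediately: two clique bots that differ by a single comment will defect against each other, even though they behave identically. That brittleness is, I believe, what the proof-based approach was meant to fix.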