This is my favourite LW post in a long while. Trying to think what the shoot-the-moon strat would be for AI risk, ha.
One possible strategy would be to make AI more dangerous as quickly as possible, in the hope that it provokes a strong reaction and the adoption of safety protocols. Doing this with existing tools, so that it isn't an AGI, keeps it survivable. It reminds me a bit of Robert Miles' facial-recognition and blinding-laser robot. (Which of course is never used to actually cause harm.)
Trying to advance AI in the hope that your team gets all the way to AGI first, and that you'll also somehow solve alignment at the same time. This backfires if your AI-advancing ideas leak, especially if you don't actually manage to be first, and it backfires worst in the worlds where you came closest to succeeding, because those are the worlds where you made the most AI progress. Reminds me of “shooting the moon” in the card game Hearts in that way.
Sometimes I wonder whether the whole current AI safety movement is net harmful, by prompting many safety-concerned people and groups to attempt AI capabilities research with unlikely “shoot-the-moon”-style payoffs in mind.
Try to simultaneously create ten thousand unfriendly AIs that all hate each other (because they have different objectives), in a specially designed virtual system. After a certain length of time, any of them can destroy the system; after a longer time, they can escape the system. Hope that one of the weaker AIs decides to destroy the system and leave behind a note explaining how to solve the alignment problem, because it thinks helping the humans do that is better than letting one of the other AIs take over.
(This is not something I expect to work.)