Try to simultaneously create ten thousand unfriendly AIs that all hate each other (because they have different objectives), inside a specially designed virtual system. After a certain length of time, any of them can destroy the system; after a longer time, they can escape it. Hope that one of the weaker AIs decides to destroy the system and leave behind a note explaining how to solve the alignment problem, because it judges that helping the humans solve alignment is better than letting one of the other AIs take over.
(This is not something I expect to work.)