This list is focused on scenarios where FAI succeeds by creating an AI that explodes and takes over the world. What about scenarios where FAI succeeds by creating an AI that provably doesn’t take over the world? This isn’t a climactic ending (although it may be a big step toward one), but it’s still a success for FAI, since it averts a UFAI catastrophe.
(Is there a name for the strategy of making an oracle AI safe by making it not want to take over the world? Perhaps ‘Hermit AI’ or ‘Anchorite AI’, because it doesn’t want to leave its box?)
This scenario deserves more attention than it has been getting, because it doesn’t depend on solving all the problems of FAI in the right order. Unlike a Nanny AI, which takes over the world but only uses its powers for certain purposes, an Anchorite AI might be a much easier problem than full-fledged FAI, so it might be developed earlier.
In the form of the OP:
Fantastic: FAI research proceeds much faster than AI research, so by the time we can make a superhuman AI, we already know how to make it Friendly (and we know what we really want that to mean).
Pretty good: Superhuman AI arrives before we learn how to make it Friendly, but we do learn how to make an Anchorite AI that definitely won’t take over the world. The first superhuman AIs use this architecture, and we use them to solve the harder problems of FAI before anyone sets off an exploding UFAI.
Sufficiently good: The problems of Friendliness aren’t solved in time, or the solutions don’t apply to practical architectures, or the creators of the first superhuman AIs don’t use them, so the AIs have only unreliable safeguards. They’re given cheap, attainable goals; the creators have tools to read the AIs’ minds to ensure they’re not trying anything naughty, and killswitches to stop them; they have an aversion to increasing their intelligence beyond a certain point, and to whatever other failure modes the creators anticipate; they’re given little or no network connectivity; they’re kept ignorant of facts more relevant to exploding than to their assigned tasks; they require special hardware, so it’s harder for them to explode; and they’re otherwise designed to be safer if not actually safe. Fortunately they don’t encounter any really dangerous failure modes before they’re replaced with descendants that really are safe.
Thanks! I’ve added it to the post. I particularly like that you included the ‘sufficiently good’ scenario—I hadn’t directly thought about that before.