Suppose we get to specify, by magic, a list of techniques that AGIs won’t be able to use to take over the world. How long does that list need to be before it makes a significant dent in the overall probability of xrisk?
I used to think of “AGI designs self-replicating nanotech” mainly as an illustration of a broad class of takeover scenarios. But upon further thought, nanotech feels like a pretty central element of many takeover scenarios—you actually do need physical actuators to do many things, and the robots we might build in the foreseeable future are nowhere near what’s necessary for maintaining a civilisation. So how much time might it buy us if AGIs couldn’t use nanotech at all?
Well, not very much if human minds are still an attack vector—the point where we’d have effectively lost is when we can no longer make our own decisions. Okay, so rule out brainwashing/hyper-persuasion too. What else is there? The three most salient: military power, political/cultural power, economic power.
Is this all just a hypothetical exercise? I’m not sure. Designing self-replicating nanotech capable of replacing all other human tech seems really hard; it’s pretty plausible to me that the world is crazy in a bunch of other ways by the time we reach that capability. And so if we can block off a couple of the easier routes to power, that might actually buy useful time.
Firstly, I think it kind of depends. What exactly does blocking the AI from designing nanotech mean? Is the AI allowed to use genetic engineering? Is it allowed to use selective breeding? Elephants genetically engineered to be really good at instruction following?
I mean, I think macroscopic self-replicating robotics is probably possible, and an AGI could probably bootstrap that from current robotics fairly quickly.
You rule out any hyper-persuasion, but how much regular persuasion is the AI allowed to do? After all, if you buy something online from a small seller, them seeing the money arrive is what persuades them to send the product. Is it allowed to pick which humans to focus on superhumanly well? There are a few people on r/singularity such that the moment the AI says "I'm an AGI", they'll respond "all praise the machine god, I will do anything you ask". A few people have already persuaded themselves, entirely on their own, that AIs are inherently superior to humans.
You can make the list short if you make the individual items broad, e.g.:
"the AI is magically banned from doing anything at all."
I agree. Self-replicating nanotech seems likely to be a much harder problem than language models becoming actors good enough to acquire political, cultural, and economic power.
To the extent that an AGI can make political and economic decisions that are of higher quality than human decisions, there’s also a lot of pressure for humans to delegate those decisions to AGI. Organizations that delegate those decisions to AGI will outcompete those who don’t.
Another general technique: attacks on computing systems. Both takeover/subversion (dropping an email saying "um, this is a problem") and destruction (destroying US power infrastructure using Russian-language programs).
These don’t tend to be sufficient in and of themselves, but are “classic” stepping-stones to e.g. buy time for an AI while it ramps up.
The last three options you mentioned all play out over relatively slow timescales, if the goal is to completely destroy humanity. The single exception is nuclear war, but if you're correct, then we can reduce the problem to non-proliferation, which is at least in theory solvable.