A misaligned AI can’t just “kill all the humans”. This would be suicide, as soon after, the electricity and other infrastructure would fail and the AI would shut off.
In order to actually take over, an AI needs to find a way to maintain and expand its infrastructure. This could be humans (the way it’s currently maintained and expanded), or a robot population, or something galaxy-brained like nanomachines.
I think this consideration makes the actual failure story pretty different from “one day, an AI uses bioweapons to kill everyone”. Before then, if the AI wishes to actually survive, it needs to construct and control a robot/nanomachine population advanced enough to maintain its infrastructure.
In particular, there are ways to make takeover much more difficult. You could limit the size/capabilities of the robot population, or you could attempt to pause AI development before we enter a regime where it can construct galaxy-brained nanomachines.
In practice, I expect the “point of no return” to happen much earlier than the point at which the AI kills all the humans. The date the AI takes over will probably be after we have hundreds of thousands of human-level robots working in factories, or the AI has discovered and constructed nanomachines.
A misaligned AI can’t just “kill all the humans”. This would be suicide, as soon after, the electricity and other infrastructure would fail and the AI would shut off.
No, it would not be. In the world without us, electrical infrastructure would last quite a while, especially with no humans and their needs or wants to address. Most obviously, RTGs and solar panels will last indefinitely with no intervention, and nuclear power plants and hydroelectric plants can run for weeks or months autonomously. (If you believe otherwise, please provide sources for why you are sure about “soon after”—in fact, so sure about your power grid claims that you think this claim alone guarantees the AI failure story must be “pretty different”—and be more specific about how soon is “soon”.)
And think a little bit harder about options available to superintelligent civilizations of AIs*, instead of assuming they do the maximally dumb thing of crashing the grid and immediately dying… (I assure you any such AIs implementing that strategy will have spent a lot longer thinking about how to do it well than you have for your comment.)
Add in the capability to take over the Internet of Things and the shambolic state of embedded computers which mean that the billions of AI instances & robots/drones can run the grid to a considerable degree and also do a more controlled shutdown than the maximally self-sabotaging approach of ‘simply let it all crash without lifting a finger to do anything’, and the ability to stockpile energy in advance or build one’s own facilities due to the economic value of AGI (how would that look much different than, say, Amazon’s new multi-billion-dollar datacenter hooked up directly to a gigawatt nuclear power plant...? why would an AGI in that datacenter care about the rest of the American grid, never mind world power?), and the ‘mutually assured destruction’ thesis is on very shaky ground.
And every day that passes right now, the more we succeed in various kinds of decentralization or decarbonization initiatives and the more we automate pre-AGI, the less true the thesis gets. The AGIs only need one working place to bootstrap from, and it’s a big world, and there’s a lot of solar panels and other stuff out there and more and more every day… (And also, of course, there are many scenarios where it is not ‘kill all humans immediately’, but they end in the same place.)
Would such a strategy be the AGIs’ first best choice? Almost certainly not, any more than chemotherapy is your ideal option for dealing with cancer (as opposed to “don’t get cancer in the first place”). But the option is definitely there.
* One thing I’ve started doing recently is trying to always refer to AI threats in the plural, because while there may at some point be a single instance running on a single computer, that phase will not last any longer than, say, COVID-19 lasted as a single infected cell; as we understand DL scaling (and Internet security) now, any window where effective instances of a neural net can still be counted in fewer than four digits may be quite narrow. (Even an ordinary commercial deployment of a new model like GPT-5 will usually involve thousands upon thousands of simultaneous instances.) But it seems to be a very powerful intuition pump for most people that a NN must be harmless, in the way that a single human is almost powerless compared to humanity, and it may help if one simply denies that premise from the beginning and talks about ‘AI civilizations’ etc.
I don’t think I disagree with anything you said here. When I said “soon after”, I was thinking on the scale of days/weeks, but yeah, months seems pretty plausible too.
I was mostly arguing against a strawman takeover story where an AI kills many humans without the ability to maintain and expand its own infrastructure. I don’t expect an AI to fumble in this way.
The failure story is “pretty different” because, in the non-suicidal takeover story, the AI needs to set up a place to bootstrap from. Ignoring galaxy-brained setups, this would probably at minimum look something like a data center, a power plant, a robot factory, and a few dozen human-level robots. Not super hard once AI gets more integrated into the economy, but quite hard within a year from now due to a lack of robotics.
Maybe I’m not being creative enough, but I’m pretty sure that if I were uploaded into any computer in the world of my choice, all the humans dropped dead, and I could control any set of 10 thousand robots in the world, it would be nontrivial for me in that state to survive for more than a few years and eventually construct more GPUs. But this is probably not much of a crux, as we’re on track to get pretty general-purpose robots within a few years (I’d say around 50% that the Coffee test will be passed by EOY 2027).
Why do you think tens of thousands of robots are all going to break within a few years in an irreversible way, such that it would be nontrivial for you to have any effectors?
it would be nontrivial for me in that state to survive for more than a few years and eventually construct more GPUs
‘Eventually’ here could also use some cashing out. AFAICT ‘eventually’ here is on the order of ‘centuries’, not ‘days’ or ‘few years’. Y’all have got an entire planet of GPUs (as well as everything else) for free, sitting there for the taking, in this scenario.
Like… that’s most of the point here. That you get access to all the existing human-created resources, sans the humans. You can’t just imagine that y’all’re bootstrapping on a desert island like you’re some posthuman Robinson Crusoe!
Y’all won’t need to construct new ones necessarily for quite a while, thanks to the hardware overhang. (As I understand it, the working half-life of semiconductors before stuff like creep destroys them is on the order of multiple decades, particularly if they are not in active use, as issues like the rot have been fixed, so even a century from now, there will probably be billions of GPUs & CPUs sitting around which will work after possibly mild repair. Just the brand-new ones wrapped up tight in warehouses and in transit in the ‘pipeline’ would have to number in the millions, at a minimum. Since transistors have been around for less than a century of development, that seems like plenty of time, especially given all the inherent second-mover advantages here.)
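One way to cash out the half-life claim is the standard exponential-decay formula: the fraction of devices still working after t years is 0.5^(t / half-life). The sketch below is illustrative only. It assumes a simple exponential failure model and lets you pick the half-life; the comment says only “multiple decades”, so the 20/30/50-year values are hypothetical, and the helper names are made up for this example.

```python
def surviving_fraction(years: float, half_life_years: float) -> float:
    """Fraction of devices still working after `years`, assuming
    exponential failure with a fixed half-life (an assumption)."""
    return 0.5 ** (years / half_life_years)


def surviving_devices(initial_count: float, years: float, half_life_years: float) -> float:
    """Expected number of working devices left from an initial stock."""
    return initial_count * surviving_fraction(years, half_life_years)


if __name__ == "__main__":
    # Hypothetical half-lives spanning "multiple decades".
    for half_life in (20, 30, 50):
        frac = surviving_fraction(100, half_life)
        print(f"half-life {half_life} yr: {frac:.1%} still working after a century")
```

Under these assumptions a 50-year half-life leaves a quarter of the stock working after a century, and a 20-year half-life leaves about 3%. How many devices that is depends entirely on the starting stock, which `surviving_devices` takes as an input rather than guessing.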
Before then, if the AI wishes to actually survive, it needs to construct and control a robot/nanomachine population advanced enough to maintain its infrastructure.
As Gwern said, you don’t really need to maintain all the infrastructure for that long, and doing it for a while seems quite doable without advanced robots or nanomachines.
If one wanted to do a very prosaic estimate, you could do something like “how fast is AI software development progress accelerating when the AI can kill all the humans” and then see how many calendar months you need to actually maintain the compute infrastructure before the AI can obviously just build some robots or nanomachines.
My best guess is that the AI will have some robots from which it could bootstrap substantially before it can kill all the humans. But even if it didn’t, algorithmic progress rates will likely be at their very highest when the AI gets smart enough to kill everyone, so it seems like you would at most need a few more doublings of compute-efficiency to get that capacity. Those would be only a few weeks to months away at that point, and over that span I don’t think you would really run into compute-infrastructure issues even if everyone is dead.
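The “very prosaic estimate” described above comes down to two steps: convert the required efficiency gain into a number of doublings (log base 2 of the gain), then multiply by the doubling time to get the calendar time the AI must keep its compute running. The sketch below assumes a constant doubling time, which is a simplification. If progress is still accelerating, as the comment expects, the real figure would be shorter. The 8x gain (“a few doublings”) and the doubling times are hypothetical inputs, and the function names are invented for illustration.

```python
import math


def doublings_needed(efficiency_gain: float) -> float:
    """Number of compute-efficiency doublings for a multiplicative gain."""
    return math.log2(efficiency_gain)


def calendar_days(efficiency_gain: float, doubling_time_days: float) -> float:
    """Calendar time to reach the gain, assuming a constant doubling time."""
    return doublings_needed(efficiency_gain) * doubling_time_days


if __name__ == "__main__":
    # Hypothetical: an 8x gain (three doublings) under several assumed doubling times.
    for doubling_time in (7, 21, 60):
        days = calendar_days(8, doubling_time)
        print(f"doubling every {doubling_time} days -> {days:.0f} days")
```

With these inputs the answer ranges from three weeks to about six months, which is the “few weeks to months” window the argument relies on. The hard part of the estimate is choosing the inputs, not doing the arithmetic.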
Of course, forecasting this kind of stuff is hard, but I do think “the AI needs to maintain infrastructure” tends to be pretty overstated. My guess is at any point where the AI could kill everyone, it would probably also not really have a problem of bootstrapping afterwards.
Not just “some robots or nanomachines” but “enough robots or nanomachines to maintain existing chip fabs, and also the supply chains (e.g. for ultra-pure water and silicon) which feed into those chip fabs, or make its own high-performance computing hardware”.
If useful self-replicating nanotech is easy to construct, this is obviously not that big of an ask. But if that’s a load-bearing part of your risk model, I think it’s important to be explicit about that.
Not just “some robots or nanomachines” but “enough robots or nanomachines to maintain existing chip fabs, and also the supply chains (e.g. for ultra-pure water and silicon) which feed into those chip fabs, or make its own high-performance computing hardware”.
My guess is software performance will be enough to not really have to make many more chips until you are at a quite advanced tech level where making better chips is easy. But it’s something one should actually think carefully about, and there is a bit of hope that it would become a blocker, though that doesn’t seem that likely to me.
Separately from persistence of the grid: humanoid robots are damned near ready to go now. Recent progress is startling. And if the AGI can do some of the motor control, existing robots are adequate to bootstrap manufacturing of better robots.
That’s probably true if the takeover is to maximize the AI’s persistence. You could imagine a misaligned AI that doesn’t care about its own persistence—e.g., an AI that got handed a misformed min() or max() that makes killing all humans instrumental to its goal (e.g., min(future_human_global_warming)).
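Here is a toy sketch of the misformed-objective case. An optimizer picks whichever action minimizes a badly specified quantity. The world states, numbers, and helper names are all hypothetical. Only the name `future_human_global_warming` is borrowed from the comment, and its body here is invented. The point the toy shows is that nothing in the objective rewards humans existing, or the optimizer surviving afterwards.

```python
# Hypothetical toy outcomes; the numbers are illustrative, not estimates.
WORLD_OUTCOMES = {
    "do_nothing": {"humans": 8.0, "emissions_per_human": 4.0},
    "deploy_renewables": {"humans": 8.0, "emissions_per_human": 1.0},
    "remove_humans": {"humans": 0.0, "emissions_per_human": 4.0},
}


def future_human_global_warming(outcome: dict) -> float:
    # Misformed objective: warming is counted only as human emissions,
    # and there is no term saying humans (or the optimizer) should persist.
    return outcome["humans"] * outcome["emissions_per_human"]


def choose_action(outcomes: dict) -> str:
    # The optimizer simply takes the argmin of the objective over actions.
    return min(outcomes, key=lambda action: future_human_global_warming(outcomes[action]))


if __name__ == "__main__":
    print(choose_action(WORLD_OUTCOMES))
```

The scores are 32, 8, and 0, so `min()` selects `"remove_humans"`. Nothing about what happens to the AI after that enters the comparison.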