Oops, I meant cellular, and not molecular. I’m going to edit that.
I can come up with a story in which AI takes over the world. I can also come up with a story where obviously it’s cheaper and more effective to disable all of the nuclear weapons than it is to take over the world, so why would the AI do the second thing? I see a path where instrumental convergence leads anything that goes hard enough to want to put all of the atoms on the most predictable path it can dictate. The thing I don’t get is what principle makes anything useful go that hard. Something like (for example, I haven’t actually thought this through) “it is hard to create something with enough agency/creativity to design and implement experiments toward a purpose without also having it notice and try to fix things in the world which are suboptimal for that purpose.”
I can also come up with a story where obviously it’s cheaper and more effective to disable all of the nuclear weapons than it is to take over the world, so why would the AI do the second thing?
Erm… For preventing nuclear war on the scale of decades… I don’t know what you have in mind for how it would disable all the nukes, but a one-off breaking of all the firing mechanisms isn’t going to work. They could just repair/replace that once they discovered the problem. You could imagine some more drastic thing like blowing up the conventional explosives on the missiles so as to utterly ruin them, but in a way that doesn’t trigger the big chain reaction. But my impression is that, if you have a pile of weapons-grade uranium, then it’s reasonably simple to make a bomb out of it, and since uranium is an element, no conventional explosion can eliminate that from the debris. Maybe you can melt it, mix it with other stuff, and make it super-impure?
But even then, the U.S. and Russia probably have stockpiles of weapons-grade uranium. I suspect they could make nukes out of that within a few months. You would have to ruin all the stockpiles too.
And then there’s the possibility of mining more uranium and enriching it; I feel like this would take a few years at most, possibly much less if one threw a bunch of resources into rushing it. Would you ruin all uranium mines in the world somehow?
No, it seems to me that the only ways to reliably rule out nuclear war involve either using overwhelming physical force to prevent people from using or making nukes (like a drone army watching all the uranium stockpiles), or being able to reliably persuade the governments of all nuclear powers in the world to disarm and never make any new nukes. The power to do either of these things seems tantamount to the power to take over the world.
I don’t think you’re being creative enough about solving the problem cheaply, but I also don’t think this particular detail is relevant to my main point. Now that you’ve made me think more about the problem, here are a few more steps toward resolving my confusion:
The idea with instrumental convergence is that smart things with goals predictably go hard at subgoals, like gathering resources and increasing their odds of survival until the goal is complete, which are relevant to almost any goal. As a directionally-correct example of why this could be lethal: humans are smart enough to do gain-of-function research on viruses and to design algorithms that predict protein folding. I see no reason to think something smarter could not (with some in-lab experimentation) design a virus that kills all humans simultaneously at a predetermined time, and if you can do that without affecting any of your other goals more than you think humans might interfere with them, then sure, you kill all the humans because it’s easy and you might as well. You can imagine somehow making an AI that cares about humans enough not to straight up kill all of them, but if humans are a survival threat, we should expect it to find some other creative way to contain us, and that is not a design constraint you should feel good about.
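The “gather resources first, whatever the goal” intuition can be made concrete with a toy planning model (my own sketch, not anything from the literature): suppose an agent can either “gather,” which compounds its resources, or “work,” which converts current resources into progress on its terminal goal. Brute-forcing the best plan shows the optimal opening moves are the same no matter how the goal pays off:

```python
from itertools import product

def progress(plan, payoff_per_resource):
    """Total goal progress from a plan. 'gather' doubles resources;
    'work' converts current resources into goal progress."""
    resources, total = 1, 0.0
    for act in plan:
        if act == "gather":
            resources *= 2
        else:  # "work"
            total += resources * payoff_per_resource
    return total

def best_plan(payoff_per_resource, horizon=5):
    """Brute-force the action sequence maximizing goal progress."""
    return max(product(("gather", "work"), repeat=horizon),
               key=lambda p: progress(p, payoff_per_resource))

# Wildly different goals (payoff rates) all yield the same optimal
# opening: gather resources before doing any goal-directed work.
for rate in (0.1, 1.0, 100.0):
    print(rate, best_plan(rate))
```

Since the payoff rate only rescales every plan’s score, the argmax is identical for any positive goal, which is the instrumental-convergence point in miniature: resource acquisition is useful regardless of what the agent ultimately wants.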
In particular, if you are an algorithm which is willing to kill all humans, it is likely that humans do not want you to run, and so letting humans live is bad for your own survival if you somehow get made before the humans notice you are willing to kill them all. This is not a good sign for humans’ odds of being able to get more than one try to get AI right if most things are concerned with their own survival, even if that concern is only implicit in having any goal whatsoever.
Importantly, none of this requires humans to make a coding error. It only requires a thing with goals and intelligence, and the only apparent way around it is to have the smart thing implicitly care about literally everything that humans care about, to the same relative degrees that humans care about it. It’s not a formal proof, but maybe it’s the beginning of one. Parenthetically, I guess it’s also a good reason to have a lot of military capability before you go looking for aliens, even if you don’t intend to harm any.