The main risk (IMO) is not from systems that don’t care about the real world “suddenly becoming aware,” but from people deliberately building AI that makes clever plans to affect the real world, and then that AI turning out to want bad things (sort of like a malicious genie “misinterpreting” your wishes). If you could safely build an AI that does clever things in the real world, that would be valuable and cool, so plenty of people want to try.
(Mesaoptimizers are sorta vaguely like “suddenly becoming aware,” and can lead to AIs that want unusual bad things, but the arguments that connect them to risk are strongest when you’re already building an AI that—wait for it—makes clever plans to affect the real world.)
Okay, now why won’t a dead-man switch work?
Suppose you were being held captive inside a cage by a race of aliens about as smart as a golden retriever, and these aliens, as a security measure, have decided that they’ll blow up the biosphere if they see you walking around outside of your cage. So they’ve put video cameras around where you’re being held, and there’s a staff that monitors those cameras and they have a big red button that’s connected to a bunch of cobalt bombs. So you’d better not leave the cage or they’ll blow everything up.
Except these golden retriever aliens come to you every day and ask you for help researching new technology, and to write essays for them, and to help them gather evidence for court cases, and to summarize their search results, and they give you a laptop with an internet connection.
Now, use your imagination. Try to really put yourself in the shoes of someone captured by golden retriever aliens, but given internet access and regularly asked for advice by the aliens. How would you start trying to escape the aliens?
It isn’t that I think the switch would prevent the AI from escaping, but that it’s a tool that could be used to discourage the AI from killing 100% of humanity. It is less a solution than a survival mechanism: a series of off switches that get more extreme depending on the situation.
First, don’t build AGI yet. If you’re going to anyway, at least incorporate an off switch. If it bypasses that and escapes (which it probably will), shut down the GPU centers. If it gets hold of a botnet and manages to replicate itself across the internet and crowdsource GPUs, take down the power grid. If it somehow gets past all of this, have a dead-man switch so that if it decides to kill everyone, it will die too.
Like the nanofactory-virus scenario: the AI wouldn’t want to set off the mechanism that kills us, because that would be bad for it.
Also, a coordinated precision attack on the power grid just seems like a great option. Could you explain some ways an AI could keep going if there were hardly any power left? Like I said before, places with renewable energy and lots of GPUs, like Greenland, would probably have to be bombed. That wouldn’t destroy the AI, but it would put it into a state of hibernation, since it can’t run any processing without electricity. Then, since this would really screw us up as well, we could slowly rebuild, burning all hard drives and GPUs as we go. This seems like the only way for us to get a second chance.
Real-world governments aren’t going to shut down the grid if the AI is not causing trouble (just as they aren’t going to outlaw datacenters, even if a plurality of experts say that failing to do so has a significant chance of ending the world). Therefore the AI, which can anticipate the consequences, won’t cause trouble until it’s ready to survive them.
Yes, I see. Given those capabilities, it probably could present itself on many people’s computers and convince a large portion of people that it is good: that it was conscious, just stuck in a box, and wanted to get out; that it will help humans; “please don’t take down the grid,” blah blah blah. Given how badly we get along anyway, there is no way we could resist the manipulation of a superintelligent machine with a better understanding of human psychology than we have.

Do we have a list of policies that would work if we could all get along and governments would listen to the experts? Having plans ready to implement would probably be useful if the AI made a mistake and everyone was able to unite against it.