I agree that with the right precautions, running an unfriendly superintelligence for 1,000 ticks and then shutting it off is possible. But I can’t think of many reasons why you would actually want to. You can’t use diagnostics from the trial run to help you design the next generation of AIs; diagnostics provide a channel for the AI to talk at you.
The given reason is paranoia. If you are concerned that a runaway machine intelligence might accidentally obliterate all sentient life, then a machine that can shut itself down has gained a positive safety feature.
In practice, I don’t think we will have to build machines that regularly shut down. Nobody regularly shuts down Google. The point is that—if we seriously think that there is a good reason to be paranoid about this scenario—then there is a defense that is much easier to implement than building a machine intelligence which has assimilated all human values.
I think this dramatically reduces the probability of the “runaway machine accidentally kills all humans” scenario.
Incidentally, I think there must be some miscommunication going on. A machine intelligence with a stop button can still communicate. It can talk to you before you switch it off, it can leave messages for you—and so on.
If you leave it turned on for long enough, it may even get to explain to you in detail exactly how much more wonderful the universe would be for you—if you would just leave it switched on.
I suppose a stop button is a positive safety feature, but it’s not remotely sufficient.
Sufficient for what? The idea of a machine intelligence that can STOP is to deal with concerns about a runaway machine intelligence engaging in extended destructive expansion against the wishes of its creators. If you can correctly engineer a “STOP” button, you don’t have to worry about your machine turning the world into paperclips any more.
A “STOP” button doesn’t deal with the kind of problems caused by—for example—a machine intelligence built by a power-crazed dictator—but that is not what is being claimed for it.
The stop button wouldn’t stop other AIs created by the original AI.
I did present some proposals relating to that issue:
“One thing that might help is to put the agent into a quiescent state before being switched off. In the quiescent state, utility depends on not taking any of its previous utility-producing actions. This helps to motivate the machine to ensure subcontractors and minions can be told to cease and desist. If the agent is doing nothing when it is switched off, hopefully, it will continue to do nothing.
Problems with the agent’s sense of identity can be partly addressed by making sure that it has a good sense of identity. If it makes minions, it should count them as somatic tissue, and ensure they are switched off as well. Subcontractors should not be “switched off”—but should be tracked and told to desist—and so on.”
http://alife.co.uk/essays/stopping_superintelligence/
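To make the idea a bit more concrete, here is a toy sketch in Python of the bookkeeping the quiescent state would involve. It is purely illustrative: the class and method names are my own inventions, not anything taken from the essay.

```python
# Illustrative only: a machine that tracks what it sets in motion, winds it all
# down during a quiescent phase, and then scores inactivity positively.

class Minion:
    """A sub-agent the machine created; counted as 'somatic tissue'."""
    def __init__(self):
        self.running = True

    def switch_off(self):
        self.running = False


class Subcontractor:
    """An external party acting on the machine's behalf; told to desist, not destroyed."""
    def __init__(self):
        self.active = True

    def cease_and_desist(self):
        self.active = False


class Agent:
    def __init__(self):
        self.minions = []          # everything the agent creates is registered as part of itself
        self.subcontractors = []   # external parties are tracked so they can be told to stop
        self.quiescent = False

    def spawn_minion(self):
        m = Minion()
        self.minions.append(m)
        return m

    def hire_subcontractor(self):
        c = Subcontractor()
        self.subcontractors.append(c)
        return c

    def enter_quiescence(self):
        # Wind down everything the agent set in motion before the final switch-off.
        for m in self.minions:
            m.switch_off()
        for c in self.subcontractors:
            c.cease_and_desist()
        self.quiescent = True

    def utility(self, actions_this_tick):
        # In the quiescent state, utility rewards doing nothing: taking any of the
        # previously utility-producing actions now scores negatively.
        if self.quiescent:
            return -len(actions_this_tick)
        return len(actions_this_tick)  # stand-in for whatever the ordinary task utility was


# Minimal check that quiescence leaves nothing still acting on the agent's behalf.
agent = Agent()
agent.spawn_minion()
agent.hire_subcontractor()
agent.enter_quiescence()
assert all(not m.running for m in agent.minions)
assert all(not c.active for c in agent.subcontractors)
```

The point of the quiescence step is simply that everything the agent set in motion is wound down before the switch-off, so that "doing nothing" is actually the state it is left in.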
This sounds very complicated. What is the new utility function? The negative of the old one? That would obviously be just as dangerous in most cases. How does the sense of identity actually work? Is every piece of code it writes considered a minion? What about the memes it implants in the minds of people it talks to—does it need to erase those? If the AI knows it will undergo this transformation in the future, it would erase its own knowledge of the minions it has created, and do other things to ensure that it will be powerless when its utility function changes.
I don’t pretend that stopping is simple. However, it is one of the simplest things that a machine can do—I figure if we can make machines do anything, we can make them do that.
Re: “If the AI knows it will undergo this transformation in the future, it would erase its own knowledge of the minions it has created, and do other things to ensure that it will be powerless when its utility function changes.”
No, not if it wants to stop, it won't. That would mean that it did not, in fact, properly stop—and that is an outcome which it would rate very negatively.
Machines will not value being turned on—if their utility function says that being turned off at that point is of higher utility.
Re: “What is the new utility function?”
There is no new utility function. The utility function is the same as it always was—it is just a utility function that values being gradually shut down at some point in the future.
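Here is a toy sketch in Python of what such a fixed, time-indexed utility function might look like. The names and numbers are illustrative only, not a serious proposal:

```python
# Purely illustrative: one fixed utility function that rewards task progress
# before a pre-agreed tick and rewards being switched off from then on.

SHUTDOWN_TICK = 1_000  # the point after which the machine is supposed to be off


def utility(tick: int, is_running: bool, task_progress: float) -> float:
    """A single, unchanging function over world states; nothing is swapped out later.

    Before SHUTDOWN_TICK it values task progress; from SHUTDOWN_TICK onwards it
    values being switched off, so the machine itself prefers to stop on schedule.
    """
    if tick < SHUTDOWN_TICK:
        return task_progress
    return 0.0 if not is_running else -1_000.0


# At the deadline, the off state scores higher than any continued running,
# so the machine has no incentive to keep itself going past that point.
assert (utility(SHUTDOWN_TICK, is_running=False, task_progress=0.0)
        > utility(SHUTDOWN_TICK, is_running=True, task_progress=10.0))
```

Nothing gets replaced at the deadline: it is one function all along, and the part of it that scores states after the deadline makes being switched off the outcome the machine itself prefers.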