Sufficient for what? The idea of a machine intelligence that can STOP is meant to address concerns about a runaway machine intelligence engaging in extended destructive expansion against the wishes of its creators. If you can correctly engineer a “STOP” button, you no longer have to worry about your machine turning the world into paperclips.
A “STOP” button doesn’t deal with the kind of problems caused by—for example—a machine intelligence built by a power-crazed dictator—but that is not what is being claimed for it.
The stop button wouldn’t stop other AIs created by the original AI.
I did present some proposals relating to that issue:
“One thing that might help is to put the agent into a quiescent state before being switched off. In the quiescent state, utility depends on not taking any of its previous utility-producing actions. This helps to motivate the machine to ensure subcontractors and minions can be told to cease and desist. If the agent is doing nothing when it is switched off, hopefully, it will continue to do nothing.
Problems with the agent’s sense of identity can be partly addressed by making sure that it has a good sense of identity. If it makes minions, it should count them as somatic tissue, and ensure they are switched off as well. Subcontractors should not be “switched off”—but should be tracked and told to desist—and so on.”
http://alife.co.uk/essays/stopping_superintelligence/
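The quiescent-state idea above can be sketched in code. This is only a toy illustration of the principle, not anything from the linked essay: once quiescence begins, utility depends on not repeating any action that previously produced utility, so pure inactivity scores best. All names here (action labels, the scoring scheme) are assumptions chosen for illustration.

```python
# Toy sketch of the quiescent-state proposal (illustrative only):
# in the quiescent state, utility depends on NOT taking any of the
# agent's previous utility-producing actions.

def quiescent_utility(actions_before, actions_during):
    """Score behaviour in the quiescent state.

    actions_before : set of action labels performed pre-quiescence
    actions_during : set of action labels performed during quiescence
    """
    repeated = actions_before & actions_during
    # Repeating an old utility-producing action is penalized, and any
    # activity at all costs something, so doing nothing is optimal.
    return -len(repeated) - len(actions_during)

# Doing nothing in the quiescent state is the best outcome...
assert quiescent_utility({'build_minion', 'trade'}, set()) == 0
# ...while resuming an old activity is penalized twice (repeat + activity).
assert quiescent_utility({'build_minion', 'trade'}, {'trade'}) == -2
```

If the agent is doing nothing when switched off, the hope expressed in the quote is that it will continue to do nothing, since nothing it could do in that state raises its utility.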
This sounds very complicated. What is the new utility function? The negative of the old one? That would obviously be just as dangerous in most cases. How does the sense of identity actually work? Is every piece of code it writes considered a minion? What about the memes it implants in the minds of people it talks to—does it need to erase those? If the AI knows it will undergo this transformation in the future, it would erase its own knowledge of the minions it has created, and do other things to ensure that it will be powerless when its utility function changes.
I don’t pretend that stopping is simple. However, it is one of the simplest things that a machine can do—I figure if we can make machines do anything, we can make them do that.
Re: “If the AI knows it will undergo this transformation in the future, it would erase its own knowledge of the minions it has created, and do other things to ensure that it will be powerless when its utility function changes.”
No, it won’t, not if it wants to stop. Doing so would mean that it did not, in fact, properly stop, and that is an outcome it would rate very negatively.
Machines will not value being turned on if their utility function says that being turned off at that point has higher utility.
Re: “What is the new utility function?”
There is no new utility function. The utility function is the same as it always was—it is just a utility function that values being gradually shut down at some point in the future.
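The point that there is no new utility function, only one fixed function that values shutdown after a certain time, can be made concrete with a small sketch. Everything here (the paperclip goal, the deadline `T`, the reward values) is a hypothetical illustration, not a proposal from the thread:

```python
# Hypothetical sketch: one unchanging utility function that values the
# ordinary goal before a preset time T and values being shut down after.
# The function is never swapped out; only the world's clock advances.

T = 100  # illustrative shutdown deadline

def utility(state):
    """A single fixed utility function.

    state is a dict with:
      't'         : current time step
      'paperclips': progress on the ordinary goal (illustrative)
      'active'    : whether the agent and its minions are still running
    """
    if state['t'] < T:
        # Before the deadline: ordinary goal-directed utility.
        return float(state['paperclips'])
    # After the deadline: utility comes solely from being shut down.
    return 1000.0 if not state['active'] else 0.0

# Before T the agent prefers making progress...
assert utility({'t': 50, 'paperclips': 3, 'active': True}) == 3.0
# ...after T it prefers the shut-down state, so it has no incentive to
# sabotage its own future shutdown: that would cost it 1000 utility.
assert utility({'t': 150, 'paperclips': 3, 'active': False}) == 1000.0
assert utility({'t': 150, 'paperclips': 3, 'active': True}) == 0.0
```

On this picture, an agent that erased its knowledge of its minions would foresee ending up in the `active` branch after the deadline, which its own current utility function rates poorly, so it is motivated to preserve whatever it needs to shut everything down cleanly.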