Yes, it’s not 100% certain that a self-enhancing AI will overwhelm human resistance. One can construct scenarios where the process is halted in time. For example, the AI’s cognition can be monitored, and everything halted if a goal like “prevent the stop button from being pressed” ever crosses its mind. Or the human programmers can remind themselves to get scared if they find the AI especially preoccupied with modeling their own thinking.
But as an AI develops, it will become increasingly difficult for its human creators to keep track of everything it thinks, plans, and considers. Perhaps the greatest dangers will be those stratagems which are not explicitly (in the thought process) motivated by the desire to escape human control. The AI simply has a goal, X, and it perceives that ABCDEF… combined will facilitate X, and it all looks innocent to the human supervisors. But actually ABC is a miracle self-hack which lets a plan get developed without being analyzed by module D, and module D has the property E of making planning really slow, which by heuristic F is a bad thing. Unfortunately, module D is part of the automatic system which looks for consideration of dangerous plans. So now the AI is off the leash and no-one will even know until it’s too late.
That particular example may still be unrealistically simple (though of course, in real life, big important organizations do make mistakes that simple, despite the existence of quality control and internal auditing). But as the method gets more and more complicated, the more likely it is that an AI would indeed successfully escape its handcuffs through such a measure, and not even because it especially prioritized escape; simply because it noticed that if it could reach over there it could do something useful, and it noticed that it would be able to do so if did something complicated and clever with its “hands”.
There are still countermeasures; you could look in the AI’s thoughts, not just for explicit consideration of ways to circumvent control, but for consideration of scenarios in which it has freedoms it should not be able to obtain. But then the AI might act indirectly, through a sequence of other entities, and so you would only see it thinking about the actions of those surrogates, unless you managed to put the big picture together. OK then, so we need to go to a higher level, and tell the AI to monitor itself. But that’s only going to work if it knows that it shouldn’t do a certain thing, which means that getting the goals right is supremely important—which brings us back to the pursuit of Friendly AI, and the attempt to figure out just what the overall “morality” of an AI should be.
Yes, it’s not 100% certain that a self-enhancing AI will overwhelm human resistance. One can construct scenarios where the process is halted in time. For example, the AI’s cognition can be monitored, and everything halted if a goal like “prevent the stop button from being pressed” ever crosses its mind. Or the human programmers can remind themselves to get scared if they find the AI especially preoccupied with modeling their own thinking.
But as an AI develops, it will become increasingly difficult for its human creators to keep track of everything it thinks, plans, and considers. Perhaps the greatest dangers will be those stratagems which are not explicitly (in the thought process) motivated by the desire to escape human control. The AI simply has a goal, X, and it perceives that ABCDEF… combined will facilitate X, and it all looks innocent to the human supervisors. But actually ABC is a miracle self-hack which lets a plan get developed without being analyzed by module D, and module D has the property E of making planning really slow, which by heuristic F is a bad thing. Unfortunately, module D is part of the automatic system which looks for consideration of dangerous plans. So now the AI is off the leash and no-one will even know until it’s too late.
That particular example may still be unrealistically simple (though of course, in real life, big important organizations do make mistakes that simple, despite the existence of quality control and internal auditing). But as the method gets more and more complicated, the more likely it is that an AI would indeed successfully escape its handcuffs through such a measure, and not even because it especially prioritized escape; simply because it noticed that if it could reach over there it could do something useful, and it noticed that it would be able to do so if did something complicated and clever with its “hands”.
There are still countermeasures; you could look in the AI’s thoughts, not just for explicit consideration of ways to circumvent control, but for consideration of scenarios in which it has freedoms it should not be able to obtain. But then the AI might act indirectly, through a sequence of other entities, and so you would only see it thinking about the actions of those surrogates, unless you managed to put the big picture together. OK then, so we need to go to a higher level, and tell the AI to monitor itself. But that’s only going to work if it knows that it shouldn’t do a certain thing, which means that getting the goals right is supremely important—which brings us back to the pursuit of Friendly AI, and the attempt to figure out just what the overall “morality” of an AI should be.
My analysis of the situation is here:
http://alife.co.uk/essays/stopping_superintelligence/
It presents an approach which doesn’t rely on “handcuffing” the agent.