Those software companies test their products for crashes and loops. There is a word for testing an AI of unknown Friendliness and that word is “suicide”.
The argument—to the extent that I can make sense of it—is that you can’t restrain an super-intelligent machine—since it will simply use its superior brainpower to escape from the constraints.
We successfully restrain intelligent agents all the time—in prisons. The prisoners may be smarter than the guards, and they often outnumber them—and yet still the restraints are usually successuful.
Some of the key observations to my mind are:
You can often restrain one agent with many stupider agents;
The restraining agents do not need to be humans—they can be other machines;
You can often restrain one agent with a totally dumb cage;
Complex systems can often be tested in small pieces (unit testing);
Large systems can often be tested on a smaller scale before deployment;
Systems can often be tested in virtual environments, reducing the cost of failure.
Discarding the standard testing-based methodology would be very silly, IMO.
Indeed, it would sabotage your project to the point that it would almost inevitably be beaten—and there is very little point in aiming to lose.
Are you familiar with the AI-Box experiment? We can restrain human-intelligence level agents in prisons, most of the time. But the question to ask is: how effective was the first prison? Because that’s the equivalent case.
None of the safety measures you propose are safe enough. You’re underestimating the power of a recursively self-improving AI by a factor I can’t begin to estimate—which is kind of the point.
A much stronger argument than all-powerful AIs suddenly escaping (which is still not without merit) is that AI will have an incentive to behave as we expect it to behave, until at some point we no longer control it. It’ll try its best to pass all tests.
My point is that “ai box experiment” communicates orders of magnitude less evidence about the danger of escaping AIs than people like to imply, and there are lots of stronger and simpler self-contained arguments such as the one I gave. (The overall danger is much greater than even that, because these are specific plots with an obvious villain, while reality is more subtle.)
If we have powerful intelligence that needs testing, then we can have powerful guards too.
The AI-Box experiment has human guards. Consequently, it has very low relevance to the actual problem. Programmers don’t build their test harnesses out of human beings.
Safety is usually an economic trade off. You can usually have an lot of it—if you are prepared to pay for it.
Those software companies test their products for crashes and loops. There is a word for testing an AI of unknown Friendliness and that word is “suicide”.
That just seems to be another confusion to me :-(
The argument—to the extent that I can make sense of it—is that you can’t restrain an super-intelligent machine—since it will simply use its superior brainpower to escape from the constraints.
We successfully restrain intelligent agents all the time—in prisons. The prisoners may be smarter than the guards, and they often outnumber them—and yet still the restraints are usually successuful.
Some of the key observations to my mind are:
You can often restrain one agent with many stupider agents;
The restraining agents do not need to be humans—they can be other machines;
You can often restrain one agent with a totally dumb cage;
Complex systems can often be tested in small pieces (unit testing);
Large systems can often be tested on a smaller scale before deployment;
Systems can often be tested in virtual environments, reducing the cost of failure.
Discarding the standard testing-based methodology would be very silly, IMO.
Indeed, it would sabotage your project to the point that it would almost inevitably be beaten—and there is very little point in aiming to lose.
Are you familiar with the AI-Box experiment? We can restrain human-intelligence level agents in prisons, most of the time. But the question to ask is: how effective was the first prison? Because that’s the equivalent case.
None of the safety measures you propose are safe enough. You’re underestimating the power of a recursively self-improving AI by a factor I can’t begin to estimate—which is kind of the point.
A much stronger argument than all-powerful AIs suddenly escaping (which is still not without merit) is that AI will have an incentive to behave as we expect it to behave, until at some point we no longer control it. It’ll try its best to pass all tests.
I suppose I was mentally classifying that kind of behavior as an escape; you’re right that it should be called out as a separate point of failure.
My point is that “ai box experiment” communicates orders of magnitude less evidence about the danger of escaping AIs than people like to imply, and there are lots of stronger and simpler self-contained arguments such as the one I gave. (The overall danger is much greater than even that, because these are specific plots with an obvious villain, while reality is more subtle.)
Ahhh, I see what you’re getting at. Agreed.
For that matter, calling it an “experiment” is quite misleading.
So: while it believes it is under evaluation it does its very best to behave itself?
Can we wire that belief in as a prior with p=1.0?
It won’t be the first prison—or anything like it.
If we have powerful intelligence that needs testing, then we can have powerful guards too.
The AI-Box experiment has human guards. Consequently, it has very low relevance to the actual problem. Programmers don’t build their test harnesses out of human beings.
Safety is usually an economic trade off. You can usually have an lot of it—if you are prepared to pay for it.