Another idea for friendliness/containment: run the AI in a simulated world with no communication channels. Right from the outset, give it a bounded utility function that says it has to solve a certain math/physics problem, deposit the correct solution in a specified place and stop. If a solution can’t be found, stop after a specified number of cycles. Don’t talk to it at all. If you want another problem solved, start another AI from a clean slate. Would that work? Are AGI researchers allowed to relax a bit if they follow these precautions?
ETA: absent other suggestions, I’m going to call such devices “AI bombs”.
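To pin down what I mean, here is a minimal toy sketch of the protocol (all the names, run_ai_bomb, step_solver and so on, are invented for illustration, and the "AI" is just a dumb stand-in): one problem goes in, the solver runs for at most a fixed number of cycles with nothing else flowing in or out, and the only thing that ever leaves the box is an answer that an independent, dumb verifier has accepted.
```python
# Hypothetical sketch of the "AI bomb" protocol described above.
# The "solver" here is a brute-force stand-in; the point is the wrapper:
# one problem in, a hard cycle budget, no feedback, and at most one
# machine-checked answer out. A fresh instance is used for each problem.
from typing import Callable, Optional

MAX_CYCLES = 10**6  # hard bound on how long a single instance may run

def run_ai_bomb(problem: int,
                step_solver: Callable[[int, int], Optional[int]],
                verify: Callable[[int, int], bool],
                max_cycles: int = MAX_CYCLES) -> Optional[int]:
    """Run one solver instance on one problem, then throw it away."""
    for cycle in range(max_cycles):
        candidate = step_solver(problem, cycle)   # the solver sees nothing but the problem
        if candidate is not None and verify(problem, candidate):
            return candidate                      # only verified answers leave the box
    return None                                   # budget exhausted: stop, report nothing

# Toy instantiation: "solve" integer factoring by brute force.
def step_factor(n: int, cycle: int) -> Optional[int]:
    d = cycle + 2
    return d if d < n and n % d == 0 else None

def verify_factor(n: int, d: int) -> bool:
    return 1 < d < n and n % d == 0               # dumb, independent, machine-checkable

if __name__ == "__main__":
    print(run_ai_bomb(91, step_factor, verify_factor))   # -> 7
```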
These ideas have already been investigated and documented:
Box: http://fragments.consc.net/djc/2010/04/the-singularity-a-philosophical-analysis.html
Stopping: http://alife.co.uk/essays/stopping_superintelligence/
Are AGI researchers allowed to relax a bit if they follow these precautions?
If these precautions become necessary, the end of the world will follow shortly (which is the only possible conclusion of “AGI research”, so I guess the researchers should rejoice at a job well done, and maybe “relax a bit”, as the world burns).
I don’t understand your argument. Are you saying this containment scheme won’t work because people won’t use it? If so, doesn’t the same objection apply to any FAI effort?
If my Vladimir-modelling heuristic is correct, he’s saying that you’re postulating a world where humanity has developed AGI but not FAI. Having your non-self-improving AGI solve one math problem at a time for you is not going to save the world quickly enough to stop all the other research groups at a similar level of development from turning you and your boxed AGI into paperclips.
An AI in a simulated world isn’t prohibited from improving itself.
More to the point, I didn’t imagine I would save the world by writing one comment on LW :-) My idea of progress is solving small problems conclusively. Eliezer has spent a lot of effort convincing everybody here that AI containment is not just useless—it’s impossible. (Hence the AI-box experiments, the arguments against oracle AIs, etc.) If we update to thinking it’s possible after all, I think that would be enough progress for the day.
I don’t think that’s really an airtight case—there’s a lot that a sufficiently powerful intelligence could learn about its questioners and their environment from the question itself; and since we can’t even prove there’s no such thing as a Langford Basilisk, we can’t establish an upper bound on the complexity of a safe answer. Essentially, researchers would be constrained only by their own best judgement about the complexity of the questions and of the responses.
Of course, all that’s rather unlikely, especially as it (hopefully) wouldn’t be able to upgrade its hardware—but you’re right, software-only self-improvement would still be possible.
Yes, I agree. It would be safest to use such “AI bombs” for solving hard problems with short, machine-checkable solutions, like proving math theorems, designing algorithms or breaking crypto. There’s not much point in the AI inserting backdoors into the answer if all it cares about is the verifier’s response after a trillion cycles, but a really paranoid programmer could also include a term in the AI’s utility function that favors shorter answers over longer ones.
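Concretely, something like this toy sketch is what I have in mind (the subset-sum instance, the length_weight constant and all the names are invented for illustration): the AI’s utility depends only on whether a dumb verifier accepts the deposited certificate, minus a small penalty per symbol of answer, so there is nothing to gain from padding the answer with anything clever.
```python
# Hedged sketch of the "verifier's verdict plus prefer shorter answers" utility
# from the comment above. The certificate format and the weights are made up;
# the point is that utility depends only on the verifier's verdict and the
# answer's length, so elaborate or booby-trapped answers score strictly worse.
from collections import Counter

def verify_subset_sum(instance, certificate):
    """Dumb, independent check: does the claimed subset really sum to the target?"""
    numbers, target = instance
    available, claimed = Counter(numbers), Counter(certificate)
    return all(claimed[x] <= available[x] for x in claimed) and sum(certificate) == target

def bounded_utility(instance, certificate, verify, length_weight=0.001):
    """Bounded utility: 1 for a verified answer, minus a small penalty per element; 0 otherwise."""
    if certificate is None or not verify(instance, certificate):
        return 0.0
    return max(0.0, 1.0 - length_weight * len(certificate))

instance = ([3, 34, 4, 12, 5, 2], 9)
print(bounded_utility(instance, [4, 5], verify_subset_sum))      # 0.998: short, verified answer
print(bounded_utility(instance, [3, 4, 2], verify_subset_sum))   # 0.997: longer but still valid
print(bounded_utility(instance, [34], verify_subset_sum))        # 0.0: fails verification
```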
What khafra said—also, this sounds like propelling toy cars using thermonuclear explosions. How is this analogous to FAI? You want to let the FAI genie out of the bottle (although it will likely need a good sandbox as a testing ground).
Yep, I caught that analogy as I was writing the original comment. Might be more like producing electricity from small, slow thermonuclear explosions, though :-)
Not small explosions. Spill one drop of this toxic stuff and it will eat away the universe, nowhere to hide! It’s not called “intelligence explosion” for nothing.
That’s right—I didn’t offer any arguments that a containment failure would not be catastrophic. But to be fair, FAI has exactly the same requirement for an error-free hardware and software platform; otherwise it destroys the universe just as efficiently.
Sure, prototypes of FAI will be similarly explosive.