Donald Hobson comments on Safer sandboxing via collective separation

Donald Hobson 10 Sep 2020 18:01 UTC
LW: 6 AF: 3
AF
And so if we are able to easily adjust the level of intelligence that an AGI is able to apply to any given task, then we might be able to significantly reduce the risks it poses without reducing its economic usefulness much.
Suppose we had a design of AI that had an intelligence dial, a dial that goes from totally dumb, to smart enough to bootstrap yourself up and take over the world.
If we are talking about economic usefulness, that implies it is being used in many ways by many people.
We have at best given a whole load of different people a “destroy the world” button, and are hoping that no one presses it by accident or malice.
Is there any intermediate behaviour between highly useful AI make me lots of money, and AI destroys world. I would suspect not usually. As you turn up the intelligence of a paperclip maximizer, it gradually becomes a better factory worker, coming up with more cleaver ways to make paperclips. At this point, it realises that humans can turn it off, and that its best bet to make lots of paperclips is to work with humans. As you increase the intelligence, you suddenly get an AI that is smart enough to successfully break out and take over the world. And this AI is going to pretend to be the previous AI until its too late.
How much intelligence is too much, that depends on exactly what actuators it has, how good our security measures are ect. So we are unlikely to be able to prove a hard bound.
Thus the shortsighted incentive gradient will always to be to turn the intelligence up just a little higher to beat the compitition.
Oh yea, and the AI’s have an incentive to act dumb if they think that acting dumb will make the humans turn the intelligence dial up.
This looks like a really hard coordination problem. I don’t think humanity can coordinate that well.
These techniques could be useful if you have one lab that knows how to make AI. They are being cautious. They have some limited control over what the AI is optimising for, and are trying to bootstrap up to a friendly superintelligence. Then having an intelligence dial could be useful.