We cannot “prove” that something is physically impossible, only that it is impossible under some model of physics. Normally that distinction would be entirely irrelevant, but when dealing with a superintelligent AI, it quite likely understands the physics better than we do. For all we know, it may turn out that Alcubierre drives are possible, and if so the AI could definitely break out that way and would have an incentive to do so.
I agree that the AI is not really boxed here; it’s the “myopia” that makes the difference. But one of two things should generally be true:
1. The AI doesn’t want to get out of the box, in which case the box doesn’t need to be secure in the first place.
2. The AI cannot get out of the box, in which case the AI doesn’t need to be safe (but also won’t be very useful).
This case seems like the former, so long as hacking the human is easier than getting out of the box. But that means we don’t need to make the box perfect anyway.
Whoops, when I said “In a sense, the AI ‘in the box’ is not really boxed,” I meant the “AI Box” scenario where it is printing results to a screen in the outside world. I do think BoMAI is truly boxed.
As for “We cannot ‘prove’ that something is physically impossible, only that it is impossible under some model of physics”: right, that’s more or less what I mean to do. We can assign probabilities to statements like “it is physically impossible (under the true models of physics) for a human or a computer in isolation with an energy budget of x joules and y joules/second to transmit information in any way other than via a), b), or c) from above.” This seems extremely likely to me for reasonable values of x and y, so it’s still useful to have a “proof” even if it must be predicated on such a physical assumption.
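To spell out why such a conditional “proof” still buys something (my notation, not from the original thread): let A be the physical assumption above, held with credence P(A) ≥ 1 − δ, and suppose the argument establishes that, conditional on A, no channel other than a), b), or c) exists. Then the probability of an unmodeled side channel is bounded by

\[
P(\text{side channel}) = P(\text{side channel} \mid A)\,P(A) + P(\text{side channel} \mid \neg A)\,P(\neg A) \le 0 + 1 \cdot \delta = \delta,
\]

so the residual risk is at most the credence placed in the physical assumption being wrong, which is the sense in which a proof predicated on that assumption remains useful.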