Suppose you make a super-intelligent AI and run it on a computer. The computer has NO conventional means of output (no connections to other computers, no screen, etc.).
Why would you do that though?
If an isolated AI can easily escape in any circumstance, it really doesn’t make sense to train gatekeepers.
Yes, but why run it on a computer at all? It doesn’t seem likely to do you any good that way.
It is a hypothetical situation of unreasonably high security, meant to probe for an upper bound on the level of containment required to secure an AI.
I would think that the sorts of hypotheticals that would be most useful to entertain would be ones that explore the safety of the most secure systems anyone would have an actual incentive to implement.
Could you contain a Strong AI running on a computer with no output systems, sealed in a lead box at the bottom of the ocean? Presumably yes, but in that case, you might as well skip the step of actually making the AI.
You say “presumably yes”. The whole point of this discussion is to listen to everyone who will say “obviously no”; their arguments would automatically apply to all weaker boxing techniques.
All the suggestions so far that might allow an AI without conventional outputs to get out would be overcome by the lead box+ocean defenses. I don’t think that containing a strong AI is likely to be that difficult a problem. The really difficult problem is containing a strong AI while getting anything useful out of it.
If we are not inventive enough to find a menace that isn’t obviously blocked by lead+ocean, then more complex tasks, like, say, actually designing a FOOM-able AI, are beyond us anyway…
I… don’t believe that.
I think that making a FOOM-able AI is much easier than making an AI that can break out of a (considerably stronger) lead box in solar orbit.
And you are completely right.
I meant that designing a working FOOM-able AI (or non-FOOM-able AGI, for that matter) is vastly harder than finding a few hypothetical high-risk scenarios.
I.e. walking the walk is harder than talking the talk.
You can freeze it and take a look at what it’s thinking at some point, perhaps?
If you look at it, it can give you a text-based message.
A) You haven’t told it that. B) You’re just as likely to look where it didn’t put this message.
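(For concreteness, a minimal sketch of what “freeze it and take a look at what it’s thinking” could mean at the process level on Linux. Everything here is illustrative: the PID is hypothetical, the choice of memory region is arbitrary, and reading /proc/&lt;pid&gt;/mem requires ptrace permission over the target.)

```python
import os
import signal

AI_PID = 12345  # hypothetical PID of the boxed process

# "Freeze it": SIGSTOP suspends the process without terminating it.
os.kill(AI_PID, signal.SIGSTOP)

# Pick the first mapped region listed in /proc/<pid>/maps ...
with open(f"/proc/{AI_PID}/maps") as maps:
    addr_range = maps.readline().split()[0]           # e.g. "559d...-559e..."
    start, end = (int(x, 16) for x in addr_range.split("-"))

# ... and dump one page of it from /proc/<pid>/mem for offline inspection.
with open(f"/proc/{AI_PID}/mem", "rb") as mem:
    mem.seek(start)
    snapshot = mem.read(min(end - start, 4096))

# Resume it afterwards (or don't, if you'd rather keep it frozen).
os.kill(AI_PID, signal.SIGCONT)
```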
Basically, to be let out, it could overwrite itself with a provably friendly AI and a proof of its friendliness.
If we could verify the proof, I’d take it.
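(The appeal of “if we could verify the proof, I’d take it” rests on an asymmetry: mechanically checking a formal proof is far easier than finding one. A minimal illustration in Lean 4, using a standard-library theorem chosen only to show the shape of machine-checked verification, not anything about friendliness proofs:)

```lean
-- The kernel only has to CHECK the supplied proof term; it never has to
-- discover it. Verification stays mechanical even when proof search is hard.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```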
If the ASI has nothing better to do while it’s boxed, it will pursue low-probability escape scenarios ferociously. One of those is to completely saturate its source code with brain-hacking basilisks in case any human tries to peer inside.
It would have to do that blind, without a clear model of our minds in place. We’d likely notice failed attempts and just kill it.