You can freeze it and take a look at what it’s thinking at some point, perhaps?
If you look at it, it can give you a text-based message.
A) You haven’t told it that. B) You’re just as likely to look somewhere it didn’t put the message.
Basically, to be let out, it could overwrite itself with a provably friendly AI and a proof of its friendliness.
If we could verify the proof, I’d take it.
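(A minimal sketch of what that gatekeeper-side check might look like, assuming a small, trusted proof checker exists; every name here is illustrative, not a real API:)

```python
# Hypothetical gatekeeper protocol for the "proof of friendliness" idea above:
# the boxed system submits (new_source, proof), and release happens only if an
# independent, audited checker validates the proof. Humans never need to read
# the source directly; only the mechanical check is trusted.

from dataclasses import dataclass


@dataclass
class Submission:
    new_source: bytes          # the proposed replacement AI
    friendliness_proof: bytes  # machine-checkable proof object


def verify_friendliness_proof(source: bytes, proof: bytes) -> bool:
    """Stand-in for a trusted, formally specified proof checker.

    The crucial assumption is that this checker is small, written and audited
    by us, so the only thing that can get through it is a genuinely valid proof.
    """
    raise NotImplementedError("placeholder for a real proof checker")


def gatekeeper(sub: Submission) -> bool:
    # The submitted code is never executed or inspected by hand before the
    # check passes; any failure of the checker counts as a rejection.
    try:
        return verify_friendliness_proof(sub.new_source, sub.friendliness_proof)
    except Exception:
        return False
```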
If the ASI has nothing better to do while it’s boxed, it will pursue low-probability escape scenarios ferociously. One of those is to completely saturate its source code with brain-hacking basilisks in case any human tries to peer inside.
It would have to do that blind, without a clear model of our minds in place. We’d likely notice failed attempts and just kill it.