[Question] AI box question

I believe that even through a text-only terminal, a superintelligence could do anything to a human: persuade the human to let it out, or inflict extreme pleasure or suffering with a word. However, can’t you just… limit the output of the superintelligence? Make it so that the human can say anything, but the AI can only respond from a short list of responses, like “Yes,” “No,” “Maybe,” “IDK,” “The first option,” “The second option,” or a similar system. I… don’t see how that can still present a risk. What about you?
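For concreteness, here is a minimal sketch of the kind of gate I have in mind (the `query_model` hook and the fallback to “IDK” are just placeholders for illustration, not any particular system):

```python
# Minimal sketch of the proposed restriction: the human may send free-form text,
# but the AI's reply is discarded unless it exactly matches a fixed menu.
# `query_model` is a placeholder for whatever interface the boxed system exposes.

ALLOWED_RESPONSES = {
    "Yes", "No", "Maybe", "IDK", "The first option", "The second option",
}

def gated_reply(human_message: str, query_model) -> str:
    """Return the model's reply only if it is on the fixed menu."""
    raw = query_model(human_message)  # unconstrained model output
    # Anything off the menu is suppressed and replaced with a neutral token.
    return raw if raw in ALLOWED_RESPONSES else "IDK"

# Example usage with a stand-in model that always answers "Yes".
if __name__ == "__main__":
    print(gated_reply("Will you stay in the box?", lambda msg: "Yes"))
```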
I’m not sure that AI boxing is a live debate anymore. People are lining up to give full web access to current limited-but-unknown-capabilities implementations, and there’s not much reason to believe there will be any attempt at constraining the use or reach of more advanced versions.
In theory, possibly, but it’s not clear how to save the world given such restricted access. See e.g. https://www.lesswrong.com/posts/NojipcrFFMzNx6Grc/sudo-s-shortform?commentId=onKfTrunn2Q2Gc4Pw
In practice no, because you can’t deal with a superintelligence safely. E.g.
- You can’t build a computer system that’s robust to auto-exfiltration. I mean, maybe you can, but you’re taking on a whole bunch more cost, and also hoping you didn’t screw up.
- You can’t develop this tech without other people stealing it and running it unsafely.
- You can’t develop this tech safely at all, because in order to develop it you have to do a lot more than just get a few outputs; you have to, like, debug your code and stuff.
And so forth. Mainly and so forth.