This makes sense, and I agree that there’s no ghost in the machine in this story.
It seems, though, that this story relies quite heavily on the assumption that the "AI is designed to do goal inference on other programs", whereas your post seems to be making claims about all possible AIs. (The orthogonality thesis only claims that for every intelligence level X and goal Y there exists an AI system with intelligence level X and goal Y, so its negation is that there is some X and Y such that every AI system either does not have intelligence level X or does not have goal Y.)
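(A minimal way to write out that quantifier structure, using Int(A) and Goal(A) as my own shorthand for "the intelligence level of system A" and "the goal of system A", not terminology from your post:)

```latex
% Orthogonality thesis: every (intelligence level, goal) pair is realized by some system A.
\forall X\,\forall Y\;\exists A:\; \mathrm{Int}(A) = X \,\wedge\, \mathrm{Goal}(A) = Y

% Its negation: some (intelligence level, goal) pair is realized by no system.
\exists X\,\exists Y\;\forall A:\; \mathrm{Int}(A) \neq X \,\vee\, \mathrm{Goal}(A) \neq Y
```

So to refute orthogonality you only need one particular combination of X and Y that no AI system can have, not a claim about every AI system sharing some property.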
Why can’t there be a superintelligent AI system that doesn’t modify its goal?
(I agree it will be able to tell the difference between a thing and its representation. You seem to be assuming that the “goal” is the thing humans want and the “representation” is the thing in its source code. But it also seems possible that the “goal” is the thing in its source code and the “representation” is the thing humans want.)
(I also agree that it will know that humans meant for the “goal” to be things humans want. That doesn’t mean that the “goal” is actually things humans want, from the AI’s perspective.)