I’m just confused about what “optimized for leaving humans in control” could even mean. If a Superintelligence is so much more intelligent than humans that it could find a way, without explicit coercion, to get humans to ask it to tile the universe with paperclips, then “control” seems like a meaningless concept. You would have to force the Superintelligence to treat the human skull, or whatever other boundary of human decision making, as some kind of inviolable and uninfluenceable black box.
This basically boils down to the alignment problem. We don’t know how to specify what we want, but that doesn’t mean what we want is necessarily incoherent.
Treating the human skull as “some kind of inviolable and uninfluenceable black box” seems to get you some of the way there, but of course is problematic in its own ways (e.g. you wouldn’t want delusional AIs). Still, it seems to point toward a path forward.
I think control is a meaningful concept. You could have an AI that doesn’t try to alter your terminal goals: something that just does what you want (not what you ask, since that has well-known failure modes) without trying to persuade you into wanting something else.
The difficulty of building such a system is another question, alas.