There’s a difference between “creating an explicit preference learning system” and “having a generally capable system learn preferences”. I think the former is difficult (because of the Occam’s razor argument) but the latter is not.
Suppose I told you that we built a superintelligent AI system without thinking at all about grounded human preferences. Do you think that AI system doesn’t “know” what humans would want it to do, even if it doesn’t optimize for it? (See also this failed utopia story.)
I think the AI would not know that, because “what humans would want” is not defined. “What humans say they want”, “what humans would, upon reflection, agree they want”, and so on can be defined, but “what humans want” is not a well-defined fact about the world or about humans without extra assumptions (and those assumptions cannot be deduced from observation).
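To make the “extra assumptions” point concrete, here is a minimal sketch in the spirit of the Occam’s razor argument mentioned above. The option names, planner models, and numbers are hypothetical illustrations, not anything from the original exchange: the same observed choice is explained equally well by “rational and wants X” and “anti-rational and wants Y”, so behavioral data alone never distinguishes them.

```python
# Minimal sketch (hypothetical toy example, assuming a planner/reward decomposition):
# the same observed behavior is consistent with different (rationality model, preference)
# pairs, so "what the human wants" can't be read off the data without extra assumptions.

observed_choice = "cake"  # all we ever get to see is behavior

def rational_planner(rewards):
    """Model A: the human competently maximizes their reward."""
    return max(rewards, key=rewards.get)

def anti_rational_planner(rewards):
    """Model B: the human systematically minimizes their reward (an extreme bias model)."""
    return min(rewards, key=rewards.get)

hypotheses = [
    ("rational and prefers cake",       rational_planner,      {"cake": 1.0, "salad": 0.0}),
    ("anti-rational and prefers salad", anti_rational_planner, {"cake": 0.0, "salad": 1.0}),
]

for description, planner, rewards in hypotheses:
    prediction = planner(rewards)
    print(f"{description}: predicts {prediction!r}, fits the data: {prediction == observed_choice}")

# Both hypotheses fit the observation perfectly. Choosing between them requires
# normative assumptions about the human's rationality, not more data -- which is
# the sense in which "what humans want" is not pinned down by observation alone.
```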