There’s a difference between “creating an explicit preference learning system” and “having a generally capable system learn preferences”. I think the former is difficult (because of the Occam’s razor argument) but the latter is not.
Suppose I told you that we built a superintelligent AI system without thinking at all about grounded human preferences. Do you think that AI system doesn’t “know” what humans would want it to do, even if it doesn’t optimize for it? (See also this failed utopia story.)
I think the AI would not know that, because “what humans would want” is not defined. “What humans say they want”, “what humans would, upon reflection, agree they want”, and so on can be defined, but “what humans want” is not a well-defined fact about the world or about humans without extra assumptions (and those assumptions cannot be deduced from observation).
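To make the “extra assumptions” point concrete, here is a minimal sketch in the spirit of the Occam’s razor argument mentioned above. The option names, planner models, and numbers are hypothetical illustrations, not anything from the original exchange: the same observed choice is explained equally well by “rational and wants X” and “anti-rational and wants Y”, so behavioral data alone never distinguishes them.

```python
# Minimal sketch (hypothetical toy example, assuming a planner/reward decomposition):
# the same observed behavior is consistent with different (rationality model, preference)
# pairs, so "what the human wants" can't be read off the data without extra assumptions.

observed_choice = "cake"  # all we ever get to see is behavior

def rational_planner(rewards):
    """Model A: the human competently maximizes their reward."""
    return max(rewards, key=rewards.get)

def anti_rational_planner(rewards):
    """Model B: the human systematically minimizes their reward (an extreme bias model)."""
    return min(rewards, key=rewards.get)

hypotheses = [
    ("rational and prefers cake",       rational_planner,      {"cake": 1.0, "salad": 0.0}),
    ("anti-rational and prefers salad", anti_rational_planner, {"cake": 0.0, "salad": 1.0}),
]

for description, planner, rewards in hypotheses:
    prediction = planner(rewards)
    print(f"{description}: predicts {prediction!r}, fits the data: {prediction == observed_choice}")

# Both hypotheses fit the observation perfectly. Choosing between them requires
# normative assumptions about the human's rationality, not more data -- which is
# the sense in which "what humans want" is not pinned down by observation alone.
```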