That bit was on the value of approximating the ideal. Having a smart AI and a version of U_H, even an approximate one, can lead to much better outcomes than the default. At least, that's the argument of that section.
(PS: I edited that first sentence to remove the double “for example”).
Oh, I see, so the argument is that, conditional on the idealized synthesis algorithm being a good definition of human preferences, the AI can approximate the synthesis algorithm, and whatever utility function it comes up with and optimizes should not have any human-identifiable problems. That makes sense. Followup questions (a toy sketch of this approximation argument follows the list):
How do you tell the AI system to optimize for “what the idealized synthesis algorithm would do”?
How can we be confident that the idealized synthesis algorithm actually captures what we care about?
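To make the "no human-identifiable problems" step concrete, here's a toy sketch of the approximation argument (all names and numbers are illustrative, nothing here is from the agenda itself): if the learned utility is everywhere within eps of the idealized U_H, then optimizing the approximation loses at most 2*eps of true utility.

```python
import random

# Toy model (illustrative only): ideal_utility stands in for the U_H the
# idealized synthesis algorithm would produce; approx_utility is the AI's
# learned approximation, with pointwise error bounded by EPS.

EPS = 0.05

def ideal_utility(x):
    # Stand-in for the synthesized U_H; in reality this is intractable.
    return -(x - 0.7) ** 2

def approx_utility(x):
    # The approximation: ideal utility plus bounded error.
    return ideal_utility(x) + random.uniform(-EPS, EPS)

candidates = [i / 100 for i in range(101)]
chosen = max(candidates, key=approx_utility)  # what the AI ends up optimizing
best = max(candidates, key=ideal_utility)     # what we actually wanted

# Bounded error implies bounded regret: the approximation's argmax can't be
# worse than the true optimum by more than 2 * EPS.
regret = ideal_utility(best) - ideal_utility(chosen)
assert regret <= 2 * EPS
print(f"regret from optimizing the approximation: {regret:.3f} <= {2 * EPS}")
```

The point is only directional: the tighter the approximation, the less room there is for an outcome the idealized synthesis would flag as bad.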
How do you tell the AI system to optimize for “what the idealized synthesis algorithm would do”?
Here’s a grounding (see 1.2), here’s a definition of what we want, here are valid ways of approximating it :-) Basically at that point, it’s kind of like approximating full Bayesian updating for an exceedingly complex system.
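To unpack that analogy, here's a minimal sketch (illustrative, not part of the agenda): exact Bayesian updating over a rich hypothesis space is intractable, but a Monte Carlo scheme like self-normalized importance sampling still yields usable posterior estimates, and the hope is that an AI can approximate the idealized synthesis in the same spirit.

```python
import math
import random

# Exact posterior computation is intractable in general, so we approximate:
# sample hypotheses from the prior, weight them by likelihood, and estimate
# posterior expectations without ever computing the normalizing constant.

def prior_sample():
    return random.gauss(0.0, 1.0)                  # theta ~ N(0, 1)

def likelihood(theta, datum):
    return math.exp(-0.5 * (datum - theta) ** 2)   # datum ~ N(theta, 1)

data = [0.9, 1.1, 0.8]

samples = [prior_sample() for _ in range(10_000)]
weights = [math.prod(likelihood(t, d) for d in data) for t in samples]
posterior_mean = sum(w * t for w, t in zip(weights, samples)) / sum(weights)

# For this conjugate toy case the exact posterior mean is sum(data) / 4 = 0.7,
# so the estimate should land near it.
print(f"approximate posterior mean: {posterior_mean:.2f}")
```

The approximation never touches the exact posterior, yet its answers converge on it as the sample count grows; the claim in the agenda is analogous, with the idealized synthesis playing the role of the exact posterior.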
How can we be confident that the idealized synthesis algorithm actually captures what we care about?
Much of the rest of the research agenda argues that this is the case; see especially section 2.8.
But for both these points, more research is still needed.