For example, imagine that the AI, for example, extinguished all meaningful human interactions because these can sometimes be painful and the AI knows that we prefer to avoid pain. But it’s clear to us that most people’s partial preferences will not endorse total loneliness as a good outcome; if it’s clear to us, then it’s a fortiori clear to a very intelligent AI; hence the AI will avoid that failure scenario.
I don’t understand this. My understanding is that you are proposing that we build a custom preference inference and synthesis algorithm that’s separate from the AI. This produces a utility function that is then fed into the AI. But if this is the case, then you can’t use the AI’s intelligence to argue that the synthesis algorithm will work well, since they are separate.
Perhaps you do intend for the synthesis algorithm to be part of “the AI”? If so, can you say more about how that works? What assumptions about the AI do you need to be true?
That bit was on the value of approximating the ideal. Having a smart AI and a version of UH, even an approximate one, can lead to much better outcomes than the default—at least, that’s the argument of that section.
(PS: I edited that first sentence to remove the double “for example”).
Oh, I see, so the argument is that conditional on the idealized synthesis algorithm being a good definition of human preferences, the AI can approximate the synthesis algorithm, and whatever utility function it comes up with and optimizes should not have any human-identifiable problems. That makes sense. Followup questions:
How do you tell the AI system to optimize for “what the idealized synthesis algorithm would do”?
How can we be confident that the idealized synthesis algorithm actually captures what we care about?
How do you tell the AI system to optimize for “what the idealized synthesis algorithm would do”?
Here’s a grounding (see 1.2), here’s a definition of what we want, and here are valid ways of approximating it :-) Basically, at that point it’s kind of like approximating full Bayesian updating for an exceedingly complex system.
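To make that analogy concrete, here is a minimal sketch (mine, not the agenda’s): exact Bayesian updating over a very large hypothesis space is replaced by a Monte Carlo approximation, in the same way that the idealized synthesis would be replaced by whatever approximation the AI can actually compute. The toy parity model and all names in it are invented purely for illustration.

```python
# Minimal sketch of the "approximating full Bayesian updating" analogy only.
# Hypothetical toy model, not anything from the research agenda.
import math
import random

random.seed(0)

N_BITS = 20            # a hypothesis is a 20-bit string: ~10^6 hypotheses
data = [1, 1, 0, 1]    # toy observations

def log_likelihood(hypothesis, observations):
    """Toy model: each observation is predicted by the parity of the hypothesis."""
    parity = sum(hypothesis) % 2
    return sum(0.0 if obs == parity else -2.0 for obs in observations)

def sample_hypothesis():
    """Draw a hypothesis uniformly (the prior)."""
    return [random.randint(0, 1) for _ in range(N_BITS)]

# Self-normalized importance sampling with the prior as proposal:
# a cheap approximation to the exact posterior over all hypotheses.
samples = [sample_hypothesis() for _ in range(5000)]
weights = [math.exp(log_likelihood(h, data)) for h in samples]
posterior_parity1 = sum(w for h, w in zip(samples, weights)
                        if sum(h) % 2 == 1) / sum(weights)
print(round(posterior_parity1, 3))  # close to the exact value (~0.98) at a fraction of the cost
```

Any approximation scheme that converges to the same answer would do; the point is only that “optimize what the idealized definition would output” is a well-posed target even when the exact computation is out of reach.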
How can we be confident that the idealized synthesis algorithm actually captures what we care about?
Much of the rest of the research agenda is arguing that this is the case. See especially section 2.8.
But for both these points, more research is still needed.
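Finally, a toy sketch of the structural argument in this thread (again mine, not the agenda’s): if the partial-preference data clearly ranks an outcome last, as the quoted example claims total loneliness would be, then even a crude synthesized approximation of UH will not select it. The outcomes, the noisy-judgement model, and the win-counting “synthesis” are all placeholders for illustration only.

```python
# Toy, hypothetical sketch: clear-cut partial preferences survive even a crude
# synthesis, so a clearly-bad outcome loses to the alternatives.
from collections import defaultdict
import random

random.seed(0)

OUTCOMES = ["rich social life, some pain", "no pain, total isolation", "status quo"]

def sample_partial_preference():
    """One noisy pairwise judgement; most people rank total isolation last."""
    a, b = random.sample(OUTCOMES, 2)
    def score(o):  # the dispositions the judgements are noisily drawn from
        base = {"rich social life, some pain": 2,
                "status quo": 1,
                "no pain, total isolation": 0}
        return base[o] + random.gauss(0, 0.5)
    return (a, b) if score(a) > score(b) else (b, a)

def synthesize(judgements):
    """Crude stand-in for preference synthesis: count pairwise wins."""
    wins = defaultdict(int)
    for winner, _loser in judgements:
        wins[winner] += 1
    return wins

judgements = [sample_partial_preference() for _ in range(1000)]
u_approx = synthesize(judgements)
best = max(OUTCOMES, key=lambda o: u_approx[o])
print(best)  # almost surely not "no pain, total isolation"
```

The argument rests on the clear-cut partial preferences surviving the approximation, which is exactly what the grounding in 1.2 and the arguments of section 2.8 are meant to establish.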