I agree that us humans have a lot of information about human values that we want to be able to put into the AI in its architecture and in the design of its training process. But I don’t see why multi-objective RL is particularly interesting. Do you think we won’t need any other ways of giving the AI information about human values? If so, why? If not, and you just think it’s interesting, what’s an example of something to do with it that I wouldn’t think of as an obvious consequence of breaking your reward signal into several channels?
I’m not suggesting that RL is the only, or even the best, way to develop AGI. But this is the approach being advocated by Silver et al., and given their standing in the research community and the resources available to them at DeepMind, it seems likely that they, and others, will try to develop AGI in this way.
Therefore I think it is essential that a multi-objective approach is taken for there to be any chance that this AGI will actually be aligned with our best interests. If conventional RL based on a scalar reward is used, then
(a) it is very difficult to specify a suitable scalar reward which accounts for all of the many factors required for alignment (so reward misspecification becomes more likely),
(b) it is very difficult, or perhaps impossible, for the RL agent to learn the policy which represents the optimal trade-off between those factors, and
(c) the agent will be unable to learn about rewards other than those currently provided, meaning it will lack flexibility in adapting to changes in values, our own or society’s (illustrated in the sketch below).
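To make point (c) concrete, here is a minimal sketch in Python. The tabular setup, the objective count, and the weight vectors are purely illustrative placeholders of my own, not anything from Silver et al. or a specific multi-objective RL algorithm. The idea it shows: because the agent keeps a vector of value estimates, one per reward channel, the preference weights applied at decision time can be changed later without discarding what has been learned, whereas a scalar-reward agent bakes the original trade-off into its value function.

```python
import numpy as np

# Toy tabular sketch: Q-learning applied channel-wise to a *vector* reward
# (e.g. [task progress, safety, resource use]) rather than a single scalar.
N_STATES, N_ACTIONS, N_OBJECTIVES = 10, 4, 3
ALPHA, GAMMA = 0.1, 0.95

# Q[s, a] is a vector: one value estimate per objective, kept disaggregated.
Q = np.zeros((N_STATES, N_ACTIONS, N_OBJECTIVES))

def act(s, w):
    """Greedy action under the current preference weights w (linear scalarisation)."""
    return int(np.argmax(Q[s] @ w))

def update(s, a, reward_vec, s_next, w):
    """Q-learning update applied per channel; w is only used to pick the
    bootstrap action, so the stored values remain per-objective."""
    a_next = act(s_next, w)
    Q[s, a] += ALPHA * (reward_vec + GAMMA * Q[s_next, a_next] - Q[s, a])

# If values shift (ours or society's), only the weights need to change:
w_old = np.array([1.0, 0.5, 0.1])
w_new = np.array([0.5, 1.0, 0.1])   # the safety channel now weighted more heavily
update(s=0, a=2, reward_vec=np.array([1.0, -0.2, 0.0]), s_next=1, w=w_old)
print(act(0, w_old), act(0, w_new))  # the policy can respond to w_new immediately;
# a scalar-reward agent would have collapsed the channels and need retraining.
```

Real multi-objective RL algorithms handle the interaction between the learned values and the scalarisation (and non-linear utility functions) far more carefully than this; the point here is only that keeping the reward channels separate is what buys the flexibility described in (c).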
The multi-objective maximum expected utility (MOMEU) model is a general framework and can be used in conjunction with other approaches to aligning AGI. For example, if we encode an ethical system as a rule-base, then the output of those rules can be used to derive one element of the vector utility provided to the multi-objective agent. Nor are we constrained to a single set of ethics: we could implement several different frameworks, treat each as a separate objective, and, where the frameworks disagree, have the agent aim to find the best compromise between those objectives.
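As a rough illustration of that structure (the framework names, the stub rule-bases, and the maximin compromise rule below are my own placeholders, not anything specified by the MOMEU formulation), here is how a couple of rule-based ethical systems could each supply one element of the vector utility, with the agent then picking the action that best trades them off:

```python
from typing import Callable, Dict, List
import numpy as np

# Hypothetical rule-bases: each ethical framework scores a candidate action.
# In a real system these would be substantial rule engines; here they are stubs.
def utilitarian_score(action: str) -> float:
    return {"divert": 0.8, "do_nothing": 0.2}.get(action, 0.0)

def deontological_score(action: str) -> float:
    return {"divert": 0.3, "do_nothing": 0.9}.get(action, 0.0)

FRAMEWORKS: Dict[str, Callable[[str], float]] = {
    "utilitarian": utilitarian_score,
    "deontological": deontological_score,
}

def vector_utility(action: str) -> np.ndarray:
    """One element per ethical framework: the vector utility given to the agent."""
    return np.array([score(action) for score in FRAMEWORKS.values()])

def best_compromise(actions: List[str]) -> str:
    """One possible compromise rule: pick the action whose *worst* framework
    score is highest (maximin)."""
    return max(actions, key=lambda a: vector_utility(a).min())

print(best_compromise(["divert", "do_nothing"]))  # -> "divert" in this toy case
# (its worst-case score of 0.3 beats "do_nothing"'s worst-case score of 0.2)
```

Maximin is only one choice of compromise; weighted sums, thresholds on particular objectives, or lexicographic orderings slot into the same `best_compromise` role without changing anything else.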
While I didn’t touch on it in this post, other desirable aspects of beneficial AI (such as fairness) can also be naturally represented and implemented within a multi-objective framework.