Existing multi-objective reinforcement learning (MORL) algorithms do not account for objectives that arise from players with differing beliefs. Concretely, consider two players with different beliefs and utility functions who may cooperate to build a machine that takes actions on their behalf. A representation is needed for how much the machine’s policy will prioritize each player’s interests over time. Assuming the players have reached common knowledge of their situation, this paper derives a recursion that any Pareto optimal policy must satisfy. Two qualitative observations can be made from the recursion: the machine must (1) use each player’s own beliefs in evaluating how well an action will serve that player’s utility function, and (2) shift the relative priority it assigns to each player’s expected utilities over time, by a factor proportional to how well that player’s beliefs predict the machine’s inputs. Observation (2) represents a substantial divergence from naive linear utility aggregation (as in Harsanyi’s utilitarian theorem, and existing MORL algorithms), which is shown here to be inadequate for Pareto optimal sequential decision-making on behalf of players with different beliefs.
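For intuition about observations (1) and (2), here is a minimal toy sketch. It is not code from the paper, just an illustration under the assumption that the priority weights get reweighted multiplicatively by each player's likelihood of the observed data:

```python
import numpy as np

# Toy sketch (illustration only, not the paper's algorithm): two players, each
# with their own beliefs about the next observation and their own utility over
# the machine's actions. The machine keeps a priority weight per player, picks
# actions by maximizing the weighted sum of each player's expected utility
# computed under that player's OWN beliefs (observation 1), and after each
# observation multiplies each weight by the likelihood that player assigned to
# what actually happened, Bayes-style (observation 2).

def choose_action(weights, expected_utils):
    """expected_utils[i, a] = player i's expected utility of action a,
    taken under player i's own beliefs."""
    scores = weights @ expected_utils          # one score per action
    return int(np.argmax(scores))

def update_weights(weights, likelihoods):
    """Shift priority toward the player whose beliefs better predicted
    the observation that actually arrived."""
    new = weights * likelihoods
    return new / new.sum()

weights = np.array([0.5, 0.5])            # initially negotiated priorities
expected_utils = np.array([[1.0, 0.2],    # player 0's expected utilities
                           [0.1, 0.8]])   # player 1's expected utilities
likelihoods = np.array([0.8, 0.3])        # each player's P(actual observation)

print(choose_action(weights, expected_utils))   # -> 0
print(update_weights(weights, likelihoods))     # -> [0.727..., 0.272...]
```

The weighted-sum step is ordinary linear aggregation; the departure from Harsanyi-style aggregation is that the weights themselves drift toward whichever player's beliefs keep predicting the machine's inputs.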
Have you thought at all about what merged utility function two AIs would agree on? I doubt it would be of the form λU1+(1−λ)U2.
Critch wrote a related paper:
Toward negotiable reinforcement learning: shifting priorities in Pareto optimal sequential decision-making
Stuart Armstrong wrote a post arguing for merged utility functions of this form (plus a tie-breaker), but there are things the argument doesn’t take into account, like different priors and logical uncertainty, that make it unclear what the actual form of the merged utility function would be (or whether the merged AI would even be doing expected utility maximization). I’m curious what your own reason for doubting it is.
One utility function might turn out much easier to optimize than the other, in which case the harder-to-optimize one will be ignored completely. Random events might influence which utility function is harder to optimize, so one can’t necessarily tune λ in advance to try to take this into account.
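To make that concrete, here's a toy illustration (numbers made up): suppose the machine splits one unit of effort r between the two objectives, and U1 happens to respond to effort far more strongly than U2.

```python
import numpy as np

# Toy model: effort r in [0, 1] goes to U1, the remainder to U2.
# U1 turns out to be much easier to optimize (100x the payoff per unit effort).
def u1(r):
    return 100.0 * r

def u2(r):
    return 1.0 * (1.0 - r)

def best_split(lam, grid=np.linspace(0.0, 1.0, 1001)):
    """Effort split maximizing the fixed linear aggregate lam*U1 + (1-lam)*U2."""
    return grid[np.argmax(lam * u1(grid) + (1.0 - lam) * u2(grid))]

for lam in (0.5, 0.05, 0.005):
    print(lam, best_split(lam))
# -> 0.5   1.0
#    0.05  1.0
#    0.005 0.0
# Unless lam falls below ~1/101, all effort goes to U1 and U2 is ignored
# entirely; and which utility ends up being the "easy" one can depend on
# random events, so there's no safe way to pick lam in advance.
```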
One of the reasons was the problem of positive affine scaling preserving behavior, but I see Stuart addresses that.
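For anyone following along, the problem being referred to is (as I understand it) that rescaling one player's utility function, which doesn't change that player's behavior at all, silently rescales their weight in the mixture:

```latex
% Maximizing U_1 and maximizing a U_1 + b (with a > 0) produce identical behavior, yet
\lambda (a U_1 + b) + (1 - \lambda) U_2
  = \lambda a\, U_1 + (1 - \lambda) U_2 + \lambda b ,
% which, up to the irrelevant constant \lambda b, is the mixture with effective weights
% proportional to (\lambda a, 1 - \lambda) rather than (\lambda, 1 - \lambda).
% So a given \lambda only pins down a trade-off once a scale for each U_i is fixed.
```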
And actually, some of the reasons for thinking there would be more complicated mixing are going away as I think about it more.
EDIT: yeah if they had the same priors and did unbounded reasoning, I wouldn’t be surprised anymore if there exists a λ that they would agree to.