Stuart Armstrong wrote a post that argued for merged utility functions of this form (plus a tie-breaker), but there are definitely things the argument doesn't take into account, like different priors and logical uncertainty, that make it unclear what the actual form of the merged utility function would be (or whether the merged AI would even be doing expected utility maximization). I'm curious what your own reason for doubting it is.
One utility function might turn out much easier to optimize than the other, in which case the harder-to-optimize one will be ignored completely. Random events might influence which utility function is harder to optimize, so one can’t necessarily tune λ in advance to try to take this into account.
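To make that concrete, here's a toy sketch in Python (my own illustrative setup, not anything from the original discussion; it assumes the merged utility is λ·U1 + (1−λ)·U2 and that "optimizing" means allocating a fixed resource budget with linear returns):

```python
# Toy sketch: a merged agent maximizes lam*U1 + (1-lam)*U2 by allocating
# a fixed resource budget. If one utility turns out to have a much higher
# return per unit of resource, the optimizer puts everything into it, and
# the other is ignored for any fixed lam chosen before the returns were known.

def optimal_allocation(lam, return1, return2, budget=1.0):
    """Allocate budget between two linear utilities U_i = return_i * x_i."""
    # Marginal value of a unit of resource under the merged utility:
    marginal1 = lam * return1
    marginal2 = (1 - lam) * return2
    # With linear returns the optimum is a corner solution:
    if marginal1 >= marginal2:
        return budget, 0.0   # all resources go to U1
    return 0.0, budget       # all resources go to U2

# Suppose U1 turns out 100x easier to optimize than U2 (a random event
# not known when lam was negotiated):
x1, x2 = optimal_allocation(lam=0.5, return1=100.0, return2=1.0)
print(x1, x2)  # -> 1.0 0.0: U2 is ignored completely
```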
One of the reasons was the problem that positive affine rescaling of a utility function preserves behavior, but I see Stuart addresses that.
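For the record, here's a toy illustration of why that was a worry (again a made-up example of mine, assuming the same λ·U1 + (1−λ)·U2 form): rescaling U2 by a positive affine transform changes nothing about the second agent's own preferences, yet it can flip which option the merged agent picks at a fixed λ.

```python
# Toy sketch of the affine-scaling worry: maximizing U and a*U + b (a > 0)
# produces identical choices for the agent whose utility is U, so without
# some normalization the weight lam in lam*U1 + (1-lam)*U2 is underdetermined.

options = {"A": (3.0, 1.0), "B": (1.0, 4.0)}  # (U1, U2) for each option

def best(lam, scale=1.0, shift=0.0):
    # Merged utility with U2 put through a positive affine transform.
    return max(options, key=lambda o: lam * options[o][0]
               + (1 - lam) * (scale * options[o][1] + shift))

print(best(lam=0.5))                      # -> "B"
print(best(lam=0.5, scale=0.1, shift=7))  # -> "A": same lam, same underlying
                                          # preferences, different behavior
                                          # after rescaling U2
```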
And actually, some of the reasons for thinking there would be more complicated mixing are going away as I think about it more.
EDIT: Yeah, if they had the same priors and did unbounded reasoning, I wouldn't be surprised anymore if there exists a λ that they would agree to.