A second, maybe more pertinent point: unlosing agents are also unexploitable (maybe I should have called them that to begin with). This is a very useful thing for any agent to be, especially one whose values are not yet jelled (such as, e.g., an FAI still in the process of estimating the CEV).
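(To make "unexploitable" a bit more concrete, here is a rough toy sketch in Python — my own illustration, not a formal definition of an unlosing agent. The idea: the agent only fixes a preference when a choice actually forces it to, and from then on sticks to everything it has committed to, so no sequence of offered trades can walk it around a preference cycle back to an option it already judged worse.)

```python
# Toy sketch (my own illustration, not a formal definition): an agent that
# fills in preferences lazily, but always consistently with its past choices,
# so no sequence of offered trades can cycle it back to something it already
# judged worse -- i.e. it cannot be money-pumped.

class UnlosingChooser:
    def __init__(self):
        self.commitments = set()  # (better, worse) pairs fixed by past choices

    def _beats(self, a, b):
        """Is 'a is better than b' already implied (transitively) by commitments?"""
        frontier, seen = [a], {a}
        while frontier:
            x = frontier.pop()
            if x == b:
                return True
            for better, worse in self.commitments:
                if better == x and worse not in seen:
                    seen.add(worse)
                    frontier.append(worse)
        return False

    def choose(self, a, b):
        # Respect anything already implied by earlier choices.
        if self._beats(a, b):
            return a
        if self._beats(b, a):
            return b
        # Preference not yet jelled: fix it now (arbitrarily, here) and
        # commit to it from then on, so it can never be traded in a circle.
        self.commitments.add((a, b))
        return a
```

Note that nothing in the sketch pins down how the still-undetermined preferences get resolved; that freedom is what the rest of this exchange is about.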
I don’t see how unlosing agents are compatible with CEV. Running the unlosing algorithm gives you one utility function at the end, and running CEV gives you another. They would be the same only by coincidence. If you start by giving control to the unlosing algorithm, why would it then hand over control to CEV or change its utility function to CEV’s output (or not remove whatever hardwired switchover mechanism you might put in)? Your third comment seems to make essentially the same point as your second (this) comment, and the same response seems to apply.
Before running CEV, we are going to have to make a lot of decisions about what constitutes a human utility function, how to extract it, how to assess strength of opinions, how to resolve conflicts between different preferences in the same person, how to aggregate it, and so on. So the ultimate CEV is path-dependent: it depends on the outcome of those choices.
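(A toy example of that path dependence, using a made-up setup of mine rather than anything from the CEV writeup: if three people's preferences over three outcomes form a cycle, then resolving conflicts by pairwise majority vote gives a different aggregate answer depending on the order in which the conflicts are resolved.)

```python
# Toy illustration (mine, not part of CEV) of why aggregation choices make the
# result path-dependent: with cyclic group preferences, sequential pairwise
# resolution picks different "winners" depending on the agenda order.

from itertools import permutations

# Three hypothetical people with a classic Condorcet cycle over outcomes A, B, C.
voters = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]

def pairwise_winner(x, y):
    """Majority vote between two outcomes."""
    x_votes = sum(1 for ranking in voters if ranking.index(x) < ranking.index(y))
    return x if x_votes * 2 > len(voters) else y

def resolve(agenda):
    """Resolve conflicts one pair at a time, in the given order."""
    winner = agenda[0]
    for challenger in agenda[1:]:
        winner = pairwise_winner(winner, challenger)
    return winner

for agenda in permutations(("A", "B", "C")):
    print(agenda, "->", resolve(agenda))
# Each of A, B, and C wins under some agenda: the "aggregate preference"
# depends on the path taken, not just on the inputs.
```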
Using an unlosing algorithm could be seen as starting on the path to CEV earlier, before humans have made all those decisions, and letting the algorithm make some of those decisions rather than making them ourselves. This could be useful if some of the components on the path to CEV are things where our meta-decision skills (for instructing the unlosing agent how to resolve these issues) are better than our object-level decision skills (for resolving the issues directly).