So by ‘root utility function’, I meant something like the result of using a superintelligent world model or oracle to predict possible futures, and then allowing the human to explore those futures and ultimately preference rank them.
So we don’t get to edit our root utility function—which is not to say we could not in theory with some hypothetical future operation as you mention—but we don’t in practice, and most would not want to.
Morality/ethics is more like an attempt to negotiate some set of cooperative instrumental values and is only loosely related to our root utility function in the sense that it ultimately steers everything.
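To make the 'root utility function' definition above a bit more concrete, here is a minimal Python sketch of the procedure: a predictor proposes possible futures, and a human's pairwise judgments are used to rank them. Everything in it is a hypothetical stand-in for illustration (`rank_futures`, `toy_oracle`, `toy_human_prefers`, and the three labelled futures are my assumptions, not part of any real system), and it assumes the human's preferences are consistent enough to sort by.

```python
from functools import cmp_to_key
from typing import Callable, List, Sequence

# Assumption: a future is just a label here; a real prediction would be far richer.
Future = str


def rank_futures(
    predict_futures: Callable[[], Sequence[Future]],
    human_prefers: Callable[[Future, Future], bool],
) -> List[Future]:
    """Return the predicted futures ordered from most to least preferred."""

    def compare(a: Future, b: Future) -> int:
        if human_prefers(a, b):
            return -1  # a sorts before b
        if human_prefers(b, a):
            return 1   # b sorts before a
        return 0       # indifferent

    return sorted(predict_futures(), key=cmp_to_key(compare))


# Hypothetical stand-ins for the oracle and for the human's judgment.
def toy_oracle() -> List[Future]:
    return ["stagnation", "flourishing", "catastrophe"]


TOY_SCORES = {"flourishing": 2, "stagnation": 1, "catastrophe": 0}


def toy_human_prefers(a: Future, b: Future) -> bool:
    return TOY_SCORES[a] > TOY_SCORES[b]


ranking = rank_futures(toy_oracle, toy_human_prefers)
print(ranking)
```

The resulting ordering is only a readout of the preferences; nothing in the sketch lets the human rewrite the comparison function itself, which is the sense in which the root utility function is not edited in practice.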
In the case of AGI, because there’s a strong case that the values an AGI develops as a default are misaligned with what we, and potential future people, (would) care about. And because some values will likely get locked in via AGI, it’s just a question of which.
That is not an argument for locking in values; it is an argument against. But thankfully it is not at all a given that values will get locked in. Human values seem to evolve slowly over time. A successfully aligned AGI will either model that evolution correctly (as in brain-like AGI and/or successful value learning), or be largely immune to it (for example, through safe bounding via external empowerment, or through utility uncertainty). There are numerous potential paths to the goal that don’t involve any value lock-in (which could be disastrous).
For the same reason the Founding Fathers wrote the constitution. The level to which something is locked in is a spectrum. Will MacAskill essentially suggests “locking in” the value of epistemic & moral humility,
To the limited extent that makes sense to me, it does so as a non-technical vague analogy to utility uncertainty.
Therefore, the question to what extent we want to lock in our present values
There is only one thing we want to lock in: optimization aligned with our true unknown dynamic terminal utility function.