For purposes of alignment, it’s probably more important that there is not going to be a known aligned utility function over all feasible plans. It’s relatively easy to arrange the world in a way whose value is too hard to judge. Within an island of better-understood plans whose value can be judged in an aligned way, that value might take the form of the expectation of a utility function. But this is less urgently salient, because naive optimization will quickly move the state of the world away from that island. Preserving a utility function doesn’t help with this problem.
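One way to picture the structure of this claim (a minimal sketch in my own notation; the symbols $\mathcal{P}$, $D$, $U$, $V$, and $\hat{V}$ are illustrative, not from the source):

```latex
% P: the set of all feasible plans; D, a subset of P, is the "island" of
% well-understood plans. On D, an aligned value judgment might take the
% form of an expected utility:
\[
  V(p) = \mathbb{E}\!\left[\, U(\mathrm{outcome}(p)) \,\right]
  \qquad \text{for } p \in D \subset \mathcal{P},
\]
% while no known aligned V is available on the rest of P. Naive
% optimization with any proxy \hat{V} extending V beyond D,
\[
  p^{*} = \operatorname*{arg\,max}_{p \in \mathcal{P}} \hat{V}(p),
\]
% will typically select a p* outside D, where \hat{V}'s alignment is not
% known. Preserving U (and hence V on D) does nothing to keep p* inside D.
```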