Note that IRL is invariant to translating a possible utility function by a constant. So this kind of normalization doesn’t have to be baked into the algorithm.
This is true.
The most natural normalization procedure is to look at whether and how the human tries to affect the event X (as I said in the second part of my comment). If the human never tries to affect X either way, then the AI will normalize the candidate utility functions so that it has no incentive to affect X either.
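For concreteness, here is a minimal sketch of the shift-invariance point, assuming a Boltzmann-rational human model in which translating the utility by a constant shifts every action's value by the same amount; the numbers and the `boltzmann_policy` helper are illustrative assumptions, not anything from this thread:

```python
import numpy as np

def boltzmann_policy(q_values, beta=1.0):
    """Softmax (Boltzmann-rational) action distribution over action values."""
    z = beta * q_values
    z = z - z.max()  # numerical stability; subtracting a constant already hints at the invariance
    p = np.exp(z)
    return p / p.sum()

# Action values for three actions under some candidate utility function.
q = np.array([1.0, 2.0, 0.5])

# Translating the utility by a constant shifts every action's value by that constant...
c = 10.0
q_shifted = q + c

# ...and leaves the predicted action distribution (hence the IRL likelihood) unchanged.
assert np.allclose(boltzmann_policy(q), boltzmann_policy(q_shifted))
print(boltzmann_policy(q))  # roughly [0.231, 0.629, 0.140]
```

Since the observed behavior gives the same likelihood for a utility function and any constant translation of it, the choice among those translations has to come from somewhere else, which is exactly where a normalization convention like the one above enters.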