From a basic understanding of Hutter’s paper that you linked, agents will not self-modify if it affects their utility function (because that plan produces less original utility).
Re-reading your post:
This isn’t a penalty “in addition” to what the agent “really wants”; u′A (and in a moment, the slightly-improved u′′A) is what evaluates outcomes.
Clearly states that the penalty is part of the utility function that the agent will “really want”
From a basic understanding of Hutter’s paper that you linked, agents will not self-modify if it affects their utility function (because that plan produces less original utility).
Re-reading your post:
Clearly states that the penalty is part of the utility function that the agent will “really want”