Would an AUP agent ever want to self-modify to get rid of its penalty and just keep its utility function?
I’m a little unsure how to frame my question: I’m not sure whether this falls under the wireheading objection, the embedded agency flaw, or some third alternative.
No, for the same reason normal maximizers generally don’t choose to modify their goals into something totally different: doing so leads to different, less-desirable optimization occurring. See: basic AI drive 3 or Self-Modification of Policy and Utility Function in Rational Agents.
Food for thought: why do we not suspect it would just prefer to keep the penalty term, discarding the utility? Both contribute to the composite, after all.
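To make this concrete, here is a toy sketch of my own (not code from the post or from Hutter’s paper): the agent scores every plan, including plans that rewrite its own goal, using its current composite utility, so both “drop the penalty” and “drop the utility” come out worse under that composite. All names, outcomes, and numbers below are illustrative assumptions.

```python
# Toy illustration: an agent evaluates candidate self-modifications with its
# *current* composite utility u'_A = u_A - penalty (schematic, not the exact
# AUP definition). Names and numbers are made up for illustration.

def u_A(outcome):
    """Hypothetical 'raw' utility: counts paperclips produced."""
    return outcome["paperclips"]

def penalty(outcome):
    """Hypothetical impact penalty: scales with how much the world changed."""
    return outcome["impact"]

def composite(outcome):
    """The utility the agent actually maximizes: u'_A = u_A - penalty."""
    return u_A(outcome) - penalty(outcome)

# Predicted outcomes of each way the agent could rewrite itself.
# Dropping the penalty lets the successor take high-impact actions;
# dropping u_A leaves a successor that does nothing useful.
predicted_outcomes = {
    "keep composite":         {"paperclips": 5,  "impact": 1},
    "drop penalty, keep u_A": {"paperclips": 20, "impact": 30},
    "drop u_A, keep penalty": {"paperclips": 0,  "impact": 0},
}

# Crucially, every plan -- including the self-modification plans -- is scored
# by the agent's current composite utility, not by the successor's goal.
best_plan = max(predicted_outcomes, key=lambda p: composite(predicted_outcomes[p]))
for plan, outcome in predicted_outcomes.items():
    print(f"{plan}: composite = {composite(outcome)}")
print("chosen:", best_plan)  # -> "keep composite"
```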
From my basic understanding of Hutter’s paper that you linked, agents will not self-modify in ways that affect their utility function (because such a plan produces less utility as judged by the original function).
Re-reading your post:
This isn’t a penalty “in addition” to what the agent “really wants”; u'_A (and in a moment, the slightly-improved u''_A) is what evaluates outcomes.
This clearly states that the penalty is part of the utility function that the agent “really wants”.