Would an AUP agent ever want to self-modify to get rid of its penalty and just keep its utility function?
I’m a little unsure how to frame my question: I’m not sure whether this falls under the wireheading objection, the embedded agency flaw, or some third alternative.
No, for the same reason normal maximizers generally don’t choose to modify their goals into something totally different: doing so leads to different, less-desirable optimization occurring. See: basic AI drive 3 or Self-Modification of Policy and Utility Function in Rational Agents.
Food for thought: why do we not suspect it would just prefer to keep the penalty term, discarding the utility? Both contribute to the composite, after all.
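To make this concrete, here is a toy sketch of my own (not code from the post or from Hutter’s paper): the agent scores every plan, including plans that rewrite its own goal, using its current composite utility, so both “drop the penalty” and “drop the utility” come out worse under that composite. All names, outcomes, and numbers below are illustrative assumptions.

```python
# Toy illustration: an agent evaluates candidate self-modifications with its
# *current* composite utility u'_A = u_A - penalty (schematic, not the exact
# AUP definition). Names and numbers are made up for illustration.

def u_A(outcome):
    """Hypothetical 'raw' utility: counts paperclips produced."""
    return outcome["paperclips"]

def penalty(outcome):
    """Hypothetical impact penalty: scales with how much the world changed."""
    return outcome["impact"]

def composite(outcome):
    """The utility the agent actually maximizes: u'_A = u_A - penalty."""
    return u_A(outcome) - penalty(outcome)

# Predicted outcomes of each way the agent could rewrite itself.
# Dropping the penalty lets the successor take high-impact actions;
# dropping u_A leaves a successor that does nothing useful.
predicted_outcomes = {
    "keep composite":         {"paperclips": 5,  "impact": 1},
    "drop penalty, keep u_A": {"paperclips": 20, "impact": 30},
    "drop u_A, keep penalty": {"paperclips": 0,  "impact": 0},
}

# Crucially, every plan -- including the self-modification plans -- is scored
# by the agent's current composite utility, not by the successor's goal.
best_plan = max(predicted_outcomes, key=lambda p: composite(predicted_outcomes[p]))
for plan, outcome in predicted_outcomes.items():
    print(f"{plan}: composite = {composite(outcome)}")
print("chosen:", best_plan)  # -> "keep composite"
```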
From my basic understanding of Hutter’s paper that you linked, agents will not self-modify in ways that affect their utility function (because such a plan produces less utility as judged by the original function).
Re-reading your post:
This isn’t a penalty “in addition” to what the agent “really wants”; u'_A (and in a moment, the slightly-improved u''_A) is what evaluates outcomes.
This clearly states that the penalty is part of the utility function that the agent “really wants”.