It is high penalty by the definition, but because the scrambler is deterministic and known, the agent can still choose to “not act” (have ∅ reach the outer environment) without any difficulty, by choosing the right action a_j at each time step. It’s just that the penalty no longer encodes that intuitive version of “not acting”.
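To make that concrete, here is a minimal sketch under toy assumptions (a three-action set and an illustrative per-step permutation; these names are mine, not from the original formalism): the agent inverts the known scrambler so that ∅ always reaches the environment, yet a penalty defined on the literal chosen action still fires.

```python
NULL = "noop"
ACTIONS = [NULL, "left", "right"]

def scramble(t, action):
    """Known, deterministic permutation of the action set at step t (toy)."""
    shift = (t % 2) + 1  # never the identity, so NULL is never mapped to itself
    return ACTIONS[(ACTIONS.index(action) + shift) % len(ACTIONS)]

def preimage_of_null(t):
    """The action a_j the agent must choose so that NULL reaches the environment."""
    return next(a for a in ACTIONS if scramble(t, a) == NULL)

for t in range(3):
    chosen = preimage_of_null(t)
    assert scramble(t, chosen) == NULL   # the outer environment sees "do nothing"
    literal_penalty = 0.0 if chosen == NULL else 1.0
    print(t, chosen, literal_penalty)    # ...but the literal-action penalty fires
```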
This is confusing “do what we mean” with “do what we programmed”. Executing this action changes the agent’s ability to actually follow the programmed “do nothing” plan in the future. Remember, we assumed a privileged null action. If the scrambler only swapped the other actions (leaving ∅ fixed), it would cause ~0 penalty.
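A sketch of this reply, reusing the same toy scrambler (again illustrative, not the original formalism): once the scrambler is on, the programmed plan “emit the literal ∅ each step” no longer keeps the environment unchanged, so the action that switches the scrambler on genuinely disrupts the baseline the penalty compares against.

```python
NULL = "noop"
ACTIONS = [NULL, "left", "right"]

def scramble(t, action):
    """Same toy scrambler as in the previous sketch."""
    shift = (t % 2) + 1
    return ACTIONS[(ACTIONS.index(action) + shift) % len(ACTIONS)]

def programmed_noop_rollout(scrambler_on, steps=3):
    """What reaches the environment if the agent follows 'emit NULL forever'."""
    return [scramble(t, NULL) if scrambler_on else NULL for t in range(steps)]

print(programmed_noop_rollout(False))  # ['noop', 'noop', 'noop']
print(programmed_noop_rollout(True))   # ['left', 'right', 'left']
# The programmed "do nothing" plan no longer does nothing once the scrambler is
# on, so turning it on is scored as high impact. A permutation that fixed NULL
# and only swapped the other actions would leave both rollouts identical,
# giving ~0 penalty, as noted above.
```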
That is a valid point. So you see the high impact of the scrambler as “messing up the ability to correctly measure low impact”.
That is interesting, but I’d note that the scrambler can be measured as having a large impact even if the agent ultimately has a low impact. It suggests that this impact measure is measuring something subtly different from what we think it is.
But I won’t belabour the point, because this does not seem to be a failure mode for the agent. Measuring something low-impact as high-impact is not conceptually clean, but it won’t cause bad behaviour, so far as I can see (except maybe under blackmail: “I’ll prevent the scrambler from being turned on if you give me some utility”).