Shouldn’t this be high penalty, though? It impedes the agent’s ability to not act in the future.
It is high penalty, by the definition, but because the scrambler is deterministic and known, the agent can choose to “not act” (have ∅ reach the outer environment) without any difficulty, by choosing the right action a_j at each time step. It’s just that the penalty no longer encodes that intuitive version of “not acting”.
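The inversion argument can be made concrete with a minimal sketch. All names here are illustrative assumptions, not from the original discussion: a deterministic, known permutation sits between the agent and the outer environment, and the agent recovers “not acting” by selecting whichever action the scrambler maps to the null action at each step.

```python
# Hypothetical sketch: a deterministic, known "action scrambler" between
# the agent and the outer environment. Names are illustrative only.

NULL = "∅"  # the privileged null action
ACTIONS = [NULL, "a1", "a2", "a3"]

def scrambler(action: str, t: int) -> str:
    """Deterministically permutes the action set at time step t."""
    shift = t % len(ACTIONS)
    idx = ACTIONS.index(action)
    return ACTIONS[(idx + shift) % len(ACTIONS)]

def inverse_choice(desired: str, t: int) -> str:
    """Because the scrambler is deterministic and known, the agent can
    pick the action a_j whose scrambled image is the desired action."""
    for a in ACTIONS:
        if scrambler(a, t) == desired:
            return a
    raise ValueError("no preimage found")

# The agent can still "not act": at every step it chooses the a_j that
# the scrambler maps to ∅, so ∅ is what reaches the outer environment.
for t in range(5):
    a_j = inverse_choice(NULL, t)
    assert scrambler(a_j, t) == NULL
```

The point of the sketch is only that the inversion is trivial when the permutation is known; the penalty term is what no longer tracks the intuitive notion of inaction.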
This is confusing “do what we mean” with “do what we programmed”. Executing this action changes the agent’s ability to actually follow the programmed “do nothing” plan in the future. Remember, we assumed a privileged null action; if the scrambler only swapped the other actions, it would cause ~0 penalty.
That is a valid point. So you see the high impact in the scrambler as “messing up the ability to correctly measure low impact”.
That is interesting, but I’d note that the scrambler can be measured to have a large impact even if the agent ultimately has a low impact. This suggests that the impact measure is measuring something subtly different from what we think it is.
But I won’t belabour the point, because this does not seem to be a failure mode for the agent. Measuring something low-impact as high-impact is not conceptually clean, but won’t cause bad behaviour, so far as I can see (except maybe under blackmail: “I’ll prevent the scrambler from being turned on if you give me some utility”).