It is high penalty by the definition, but because the scrambler is deterministic and known, the agent can still choose to “not act” (have ∅ reach the outer environment) without any difficulty, by choosing the right action a_j at each time step. It’s just that the penalty no longer encodes that intuitive version of “not acting”.
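To make that concrete, here is a minimal sketch under toy assumptions (a three-action set and an illustrative per-step permutation; these names are mine, not from the original formalism): the agent inverts the known scrambler so that ∅ always reaches the environment, yet a penalty defined on the literal chosen action still fires.

```python
NULL = "noop"
ACTIONS = [NULL, "left", "right"]

def scramble(t, action):
    """Known, deterministic permutation of the action set at step t (toy)."""
    shift = (t % 2) + 1  # never the identity, so NULL is never mapped to itself
    return ACTIONS[(ACTIONS.index(action) + shift) % len(ACTIONS)]

def preimage_of_null(t):
    """The action a_j the agent must choose so that NULL reaches the environment."""
    return next(a for a in ACTIONS if scramble(t, a) == NULL)

for t in range(3):
    chosen = preimage_of_null(t)
    assert scramble(t, chosen) == NULL   # the outer environment sees "do nothing"
    literal_penalty = 0.0 if chosen == NULL else 1.0
    print(t, chosen, literal_penalty)    # ...but the literal-action penalty fires
```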
This is confusing “do what we mean” with “do what we programmed”. Executing this action changes the agent’s ability to actually follow the programmed “do nothing” plan in the future. Remember, we assumed a privileged null action. If the scrambler only swapped the other actions (leaving ∅ fixed), it would cause ~0 penalty.
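A sketch of this reply, reusing the same toy scrambler (again illustrative, not the original formalism): once the scrambler is on, the programmed plan “emit the literal ∅ each step” no longer keeps the environment unchanged, so the action that switches the scrambler on genuinely disrupts the baseline the penalty compares against.

```python
NULL = "noop"
ACTIONS = [NULL, "left", "right"]

def scramble(t, action):
    """Same toy scrambler as in the previous sketch."""
    shift = (t % 2) + 1
    return ACTIONS[(ACTIONS.index(action) + shift) % len(ACTIONS)]

def programmed_noop_rollout(scrambler_on, steps=3):
    """What reaches the environment if the agent follows 'emit NULL forever'."""
    return [scramble(t, NULL) if scrambler_on else NULL for t in range(steps)]

print(programmed_noop_rollout(False))  # ['noop', 'noop', 'noop']
print(programmed_noop_rollout(True))   # ['left', 'right', 'left']
# The programmed "do nothing" plan no longer does nothing once the scrambler is
# on, so turning it on is scored as high impact. A permutation that fixed NULL
# and only swapped the other actions would leave both rollouts identical,
# giving ~0 penalty, as noted above.
```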
That is a valid point. So you see the high impact of the scrambler as “messing up the ability to correctly measure low impact”.
That is interesting, but I’d note that the scrambler can be measured as having a large impact even if the agent ultimately has a low impact. It suggests that this impact measure is measuring something subtly different from what we think it is.
But I won’t belabour the point, because this does not seem to be a failure mode for the agent. Measuring something low-impact as high-impact is not conceptually clean, but it won’t cause bad behaviour, so far as I can see (except maybe under blackmail: “I’ll prevent the scrambler from being turned on if you give me some utility”).