That is a valid point. So you see the high impact in the scrambler as “messing up the ability to correctly measure low impact”.
That is interesting, but I’d note that the scrambler can be measured to have a large impact even if the agent ultimately has a low impact. It suggests that this impact measure is measuring something subtly different from what we think it is.
But I won’t belabour the point, because this does not seem to be a failure mode for the agent. Measuring something low impact as high impact is not conceptually clean, but it won’t cause bad behaviour, so far as I can see (except maybe under blackmail: “I’ll prevent the scrambler from being turned on if you give me some utility”).