That is a valid point. So you see the high impact in the scrambler as “messing up the ability to correctly measure low impact”.
That is interesting, but I’d note that the scrambler can be measured to have a large impact even if the agent ultimately has a low impact. It suggests that this impact measure is measuring something subtly different from what we think it is.
But I won’t belabour the point, because this does not seem to be a failure mode for the agent. Measuring something low impact as high impact is not conceptually clean, but it won’t cause bad behaviour, so far as I can see (except maybe under blackmail: “I’ll prevent the scrambler from being turned on if you give me some utility”).