We have argued that the reinforcement-learning, goal-seeking and predictionseeking
agents all take advantage of the realistic opportunity to modify their
inputs right before receiving them. This behavior is undesirable as the agents
no longer maximize their utility with respect to the true (inner) environment
but instead become mere survival agents, trying only to avoid those dangerous
states where their code could be modified by the environment.
Here’s one mathematical argument for it, based on the assumption that the AI can rewire its reward channel but not the whole reward/planning function: http://www.agroparistech.fr/mmip/maths/laurent_orseau/papers/ring-orseau-AGI-2011-delusion.pdf