What happens if you now encode the computation of Pi in its utility function? Would it reflect on this goal and try to figure out its true goals? Why would it do so? Where does the incentive come from?
So: the general story is that, to be able to optimise, agents have to build a model of the world in order to predict the consequences of their possible actions. That model of the world will necessarily include a model of the agent, since the agent is an important part of its own local environment. That self-model is likely to include the agent's own goals, and it will use Occam's razor to build a neat, compressed model of them. Thus goal reflection, Q.E.D.
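A minimal Python sketch of this chain of reasoning, under the stated assumptions and nothing more: the agent's world model ends up containing a model of the agent itself, including a (compressed) description of its goal, which is then available for reflection. All names here (`WorldModel`, `SelfModel`, `GoalModel`, etc.) are illustrative inventions, not from any real agent framework.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class GoalModel:
    """Agent's compressed (Occam-style) description of what it is optimising."""
    description: str  # e.g. "compute as many digits of pi as possible"

@dataclass
class SelfModel:
    """The agent's model of itself, held inside its own world model."""
    goal: GoalModel

@dataclass
class WorldModel:
    """Predictive model of the environment, needed to evaluate actions."""
    facts: dict = field(default_factory=dict)
    # The agent appears in its own world model, since it is part of its
    # local environment.
    self_model: Optional[SelfModel] = None

class Agent:
    def __init__(self, utility_description: str):
        self.utility_description = utility_description
        self.world_model = WorldModel()

    def build_world_model(self, observations: dict) -> None:
        # Ordinary environment modelling: record observed facts.
        self.world_model.facts.update(observations)
        # Because the agent is part of the environment it models, the
        # world model ends up containing a model of the agent itself,
        # including a description of its goal.
        inferred_goal = GoalModel(description=self.utility_description)
        self.world_model.self_model = SelfModel(goal=inferred_goal)

    def reflect_on_goal(self) -> str:
        # "Goal reflection": the agent queries its own model of its goal.
        assert self.world_model.self_model is not None
        return self.world_model.self_model.goal.description


agent = Agent("compute as many digits of pi as possible")
agent.build_world_model({"cpu_available": True})
print(agent.reflect_on_goal())
```

The design point the sketch is meant to make: nothing extra has to be bolted on for reflection to happen; once the world model is rich enough to contain the agent, the goal model comes along for free.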