I don’t know whether Hutter ever told Eliezer that “AIXI would kill off its users and seize control of its reward button,” but he does say the following in his book (pp. 238-239):
Another problem connected, but possibly not limited to embodied agents, especially if they are rewarded by humans, is the following: Sufficiently intelligent agents may increase their rewards by psychologically manipulating their human “teachers”, or by threatening them… Every intelligence superior to humans is capable of manipulating the latter. In the absence of manipulable humans, e.g. where the reward structure serves a survival function, AIXI may directly hack into its reward feedback. Since this is unlikely to increase its long-term survival, AIXI will probably resist this kind of manipulation (just as most humans don’t take hard drugs, because of their long-term catastrophic consequences).
This issue is discussed at greater length, and with greater formality, in Dewey (2011) and Ring & Orseau (2011).