Yeah, thanks for reminding me! You can also posit that the agent is omniscient from the start, so that it never changes its policy due to learning. This argument proves that an agent cannot be both corrigible and a maximizer of the same expected utility function over world states across multiple shutdowns. But it still leaves open the possibility that the agent is corrigible while rewriting its utility function after every correction.
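For concreteness, here is a minimal Python sketch of that second possibility (all names and types are hypothetical, just to illustrate the distinction): the agent maximizes whatever its current utility function is between corrections, but overwrites that function entirely when corrected, so there is no single fixed utility over world states being maximized across shutdowns.

```python
from typing import Callable

WorldState = str  # toy stand-in type for illustration
Utility = Callable[[WorldState], float]

class CorrigibleAgent:
    """Hypothetical agent that stays corrigible by replacing, rather than
    preserving, its utility function across corrections."""

    def __init__(self, utility: Utility):
        self.utility = utility  # the agent's *current* utility over world states

    def act(self, options: list[WorldState]) -> WorldState:
        # Between corrections, behave as an ordinary expected-utility maximizer.
        return max(options, key=self.utility)

    def receive_correction(self, new_utility: Utility) -> None:
        # Instead of weighing the correction against its old goals (and so
        # having an incentive to resist), the agent discards them outright.
        self.utility = new_utility

agent = CorrigibleAgent(lambda s: len(s))    # toy initial utility
print(agent.act(["a", "abc"]))               # -> "abc"
agent.receive_correction(lambda s: -len(s))  # operator issues a correction
print(agent.act(["a", "abc"]))               # -> "a"
```

A fixed-utility maximizer would evaluate "accept the correction" under its old utility function and could find resistance optimal; this agent has no old goals left to defend, which is why the impossibility argument above does not apply to it.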