Insofar as “shut down is bad” is a mistaken belief, we expect the problem of updated deference to dissolve as AI capabilities grow, since more capable AIs will make fewer mistakes.
The problem is that what counts as a mistake is partly encoded in the AI's meta-utility function, not only in its epistemic capabilities. So shutting us out of the loop wouldn't be a mistake from the AI's point of view. If we knew how to encode values that are real in the limit, we wouldn't need shutdownability in the first place.