Insofar as “shut down is bad” is a mistaken belief, we expect the problem of updated deference to dissolve as AI capabilities grow, since more capable AIs will make fewer mistakes.
The problem is that what counts as a mistake is partly encoded in the AI's meta-utility function, not only in its epistemic capabilities. So shutting us out of the loop wouldn't be a mistake from the AI's point of view. If we knew how to encode values that are real in the limit, we wouldn't need shutdownability in the first place.