Now that I understand "corrigible" isn't synonymous with "satisfying my short-term preferences-on-reflection", "corrigibility is relatively easy to learn" doesn't seem enough to imply these things.
I agree that you still need the AI to be trying to do the right thing (even though we don’t e.g. have any clear definition of “the right thing”), and that seems like the main way that you are going to fail.