I think DWIM is underspecified in that it doesn’t say how much the agent hates to get it wrong. With enough aversion to dramatic failure, you get a lot of the caution you mention for corrigibility. I think corrigibility might have the same issue.
As for what it would think about, that would depend on all of the previous instructions it’s trying to follow. It would probably think about how to get better at following some of those in particular, or likely future instructions in general.
DWIM requires some real thought from the principal, but given that, I think the instructions would probably add up to something very like corrigibility. So I think much less about the difference between them and much more about how to technically implement either of them, and get the people creating AGI to put it into practice.