To say “corrigibility is a broad basin of attraction”, you need ALL of the following to be true:
At some point, either in an in-person conversation or a post, Paul clarified that obviously it will be ‘narrow’ in some dimensions and ‘broad’ in others. I think it’s not obvious how the geometric intuition goes here, and this question mostly hinges on “if you have some parts of corrigibility, do you get the other parts?”, to which I think “no” and Paul seems to think “yes.” [He might think some limited version of that, like “it makes it easier to get the other parts,” which I still don’t buy yet.]