I don’t totally disagree, but two points:
1. Even if its effect is to limit power-seeking, I suspect this is a poor frame for actually coming up with a solution, because it is defined purely in the negative, and not even as the negative of something we want to avoid, but as the negative of something we often want to achieve. Rather, one should come to understand what kind of power-seeking we want to limit.
2. Corrigibility does not necessarily mean limiting power-seeking much. An AI could be corrigible not because it avoids accumulating resources and building up powerful infrastructure, but because it voluntarily refrains from using that infrastructure against the people it is trying to be corrigible to.