This approach relies on having a process that reaches the desired conclusion without specifying that conclusion in advance. It's a multi-armed bandit problem where not only the rewards are uncertain, but the reward function itself is uncertain. And it seems to rely on defining terms like "live, awake, sane, rational, well informed, adult, uncoerced", which ain't easy (though I have some developing ideas on how to do that for some of them).
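To make the "reward function uncertain" point concrete, here's a minimal Python sketch (all numbers, arm counts, and candidate utility functions are hypothetical illustrations, not anything proposed above): the agent can sample outcomes from arms, but it only has a prior over which utility function those outcomes should be scored by.

```python
import random

random.seed(0)

ARMS = 3
true_means = [0.2, 0.5, 0.8]  # hidden outcome distribution per arm (toy values)

# Candidate reward functions the agent is uncertain between.
candidate_utils = [
    lambda o: o,                 # the outcome is good in itself
    lambda o: -o,                # the outcome is actually bad
    lambda o: 1 - abs(o - 0.5),  # moderate outcomes are best
]
util_weights = [1 / 3] * 3  # uniform prior over reward functions

outcomes = [[] for _ in range(ARMS)]

def pull(arm):
    """Observe a noisy scalar outcome from the chosen arm."""
    return random.gauss(true_means[arm], 0.1)

for step in range(100):
    # Thompson-style sampling over BOTH layers of uncertainty:
    # first sample a reward function, then pick the arm that looks
    # best under it (unpulled arms are forced to be tried once).
    u = random.choices(candidate_utils, weights=util_weights)[0]

    def value(arm):
        xs = outcomes[arm]
        if not xs:
            return float("inf")  # force initial exploration
        return sum(u(x) for x in xs) / len(xs)

    arm = max(range(ARMS), key=value)
    outcomes[arm].append(pull(arm))
    # Note: util_weights never updates here. Without some trusted
    # feedback channel about the reward function itself, no amount of
    # arm-pulling resolves which utility function is the right one.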
Both your definition and corrigibility require human input. For your process, the AI has to assess what that input should be, at least insofar as it has the power to influence future human input (see some of the issues with ). Corrigibility allows actual human input in many cases, without the AI doing any assessment.
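The distinction can be put as a toy contrast in Python (function names and the `policy` dict are hypothetical, purely for illustration):

```python
def corrigible_update(policy, actual_human_input):
    # Corrigibility: take the human's actual instruction at face value,
    # with no assessment of whether it is what they "should" have said.
    policy["goal"] = actual_human_input
    return policy

def idealized_update(policy, model_of_ideal_human):
    # The process above: the AI must itself compute what a live, awake,
    # sane, rational, well-informed, adult, uncoerced human would say;
    # all the definitional burden lands inside model_of_ideal_human().
    policy["goal"] = model_of_ideal_human()
    return policy
```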
Corrigibility is not needed if everything else is right; corrigibility is very useful if there might still be flaws in the AI’s design.