It’s somewhat more subtle than that. The ideal (and maybe impossible) corrigible AI should protect us even if we accidentally give the AI the wrong process for figuring out what to value. It should protect us even if the AI becomes omniscient.
If the AI knows vastly more than we do, there’s no sense in which we are providing extra evidence or an information-carrying “outside view”. We are instead just registering a sort of complaint and hoping we’ve programmed the AI to listen.
I’m still not convinced that that sort of corrigibility is in any way distinct from simply adding some extra complications to the process we give the AI for figuring out what to value.
The outside view I had in mind wasn’t with respect to its knowledge, but to empirical data on how often its exact value-learning algorithm converges to the correct set of preferences for agents-like-us. That feels different.
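To make the distinction concrete, here is a toy sketch of my own (not anything from the exchange above, and with made-up numbers): if we have an empirical base rate on how often the value-learning procedure converges correctly for agents like us, then a human objection is informative evidence about whether this run converged, even if the AI otherwise knows far more than we do.

```python
# Toy illustration: a human objection as evidence about whether the
# value-learning algorithm converged correctly, combined with an
# empirical base rate ("outside view"). All numbers are hypothetical.

def posterior_correct(base_rate_correct: float,
                      p_objection_if_correct: float,
                      p_objection_if_wrong: float) -> float:
    """P(algorithm converged correctly | human objects), via Bayes' rule."""
    p_wrong = 1.0 - base_rate_correct
    joint_correct = base_rate_correct * p_objection_if_correct
    joint_wrong = p_wrong * p_objection_if_wrong
    return joint_correct / (joint_correct + joint_wrong)

if __name__ == "__main__":
    # Hypothetical calibration data: the algorithm converges to the right
    # preferences 90% of the time on agents like us; humans object 5% of
    # the time when it got things right and 70% of the time when it got
    # them wrong.
    print(posterior_correct(0.90, 0.05, 0.70))  # ≈ 0.39: the objection moves the estimate a lot
```

On this framing the objection isn’t “just registering a complaint”; it shifts the AI’s credence that its own value-learning run went wrong, which is the sense in which the outside view carries information.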