I’m less sure exactly how broadly the argument applies. For example, I haven’t thought as much about approaches that lean much more on corrigibility and less on having a good model of human values, though my best guess is that something similar still applies. My main goal here was to convey any intuition at all for why we might expect ontology identification to be a central obstacle.
My guess is it applies way less—a lot of the hope is that the basin of corrigibility is “large enough” that it should be obvious what to do, even if you can’t infer the whole human value function.