My take is that corrigibility is sufficient to get you an AI that understands what it means to “keep improving their understanding of Alice’s values and to serve those values”. I don’t think the AI needs to play the “genius philosopher” role, just the “loyal and trustworthy servant” role. A superintelligent AI that plays that role should be able to facilitate a “long reflection” in which flesh-and-blood humans solve the philosophical problems themselves.
(I also separately think unsupervised learning systems could in principle make philosophical breakthroughs. Maybe one already has.)