My take is that corrigibility is sufficient to get you an AI that understands what it means to “keep improving their understanding of Alice’s values and to serve those values”. I don’t think the AI needs to play the “genius philosopher” role, just the “loyal and trustworthy servant” role. A superintelligent AI that plays that role should be able to facilitate a “long reflection” in which flesh-and-blood humans solve the philosophical problems themselves.
(I also separately think unsupervised learning systems could in principle make philosophical breakthroughs. Maybe one already has.)