Hi, for some reason I didn’t see this reply until recently.
metaethical.ai is the most sophisticated sketch I’ve seen of how to make human-friendly AI. In my personal historiography of “friendliness theory”, the three milestones so far are Yudkowsky 2004 (Coherent Extrapolated Volition), Christiano 2016 (alignment via capability amplification), and June Ku 2019 (“AIXI for Friendliness”).
To me, it’s conceivable that the metaethical.ai schema is sufficient to solve the problem. It is an idealization (“we suppose that unlimited computation and a complete low-level causal model of the world and the adult human brains in it are available”), but surely a bounded version that uses heuristic models could be realized.
Thanks! FWIW your high opinion of the project counts for a lot with me; I will allocate more attention to it and seriously consider donating.