Re: embedded agency, while these are all potentially relevant points (especially self-modification), I don’t see any of them as the main reason to study embedded agents from an alignment standpoint. I see the main purpose of embedded agency research as talking about humans, not designing AIs—in particular, in order to point to human values, we need a coherent notion of what it means for an agenty system embedded in its environment (i.e. a human) to want things. As the linked post discusses, a lot of the issues with modelling humans as utility-maximizers or using proxies for our goals stem directly from more general embedded agency issues.