Even after thinking through these issues in SERI-MATS, and already agreeing with at least most of this post, I was surprised, upon reading it, by how many new-or-newish-to-me ideas and links it contained.
I’m not sure if that’s more a failure on my part, or a failure of the alignment field to notice “things that are common between a diverse array of problems faced”. Kind of related to my hunch that multiple alignment concepts (“goals”, “boundaries”, “optimization”) will turn out to be isomorphic to the same tiny handful of mathematical objects.