Nicholas / Heather Kross comments on A Case for the Least Forgiving Take On Alignment

Nicholas / Heather Kross 30 Jun 2023 2:48 UTC
LW: 11 AF: 2
Even after thinking through these issues in SERI-MATS, and already agreeing with at least most of this post, I was surprised, upon reading it, by how many new-or-newish-to-me ideas and links it contained.

I’m not sure if that’s more a failure on my part, or a failure of the alignment field to notice “things that are common between a diverse array of problems faced”. This is kind of related to my hunch that multiple alignment concepts (“goals”, “boundaries”, “optimization”) will turn out to be isomorphic to the same tiny handful of mathematical objects.