Promoted to curated! This post is denser in math than what I would usually consider for curation, but I feel like this kind of topic is quite important and also of broader relevance than more ML-focused alignment work often is. I particularly like the set of careful definitions at the top in the TLDR. I am not sure how much they will hold up as I try to use them more in my thinking, but I already feel like they helped me understand the relevant concepts more.