Late response, but I think that Adam Shimi's Unbounded Atomic Optimization does a good job of providing a unified frame for alignment failures.
I tend to think that decision theory failures are not a primary reason we might land in AGI ruin; an AI having a poor decision theory is more of a capabilities problem than an alignment one. As far as I know, the motivation behind studying decision theory is to gain a better understanding of agency (and related concepts like counterfactual reasoning, logical updatelessness, etc.) at a basic level.