Late response, but I think that Adam Shimi's Unbounded Atomic Optimization does a good job of providing a unified frame for alignment failures.
I tend to think that decision theory failures are not a primary reason we might land in AGI ruin; an AI having a poor decision theory is more of a capabilities problem than an alignment one. As far as I know, the motivation behind studying decision theory is to gain a better understanding of agency (and related concepts like counterfactual reasoning, logical updatelessness, etc.) at a basic level.