Why is research into decision theories relevant to alignment?

From Comment on decision theory:

We aren’t working on decision theory in order to make sure that AGI systems are decision-theoretic, whatever that would involve. We’re working on decision theory because there’s a cluster of confusing issues here (e.g., counterfactuals, updatelessness, coordination) that represent a lot of holes or anomalies in our current best understanding of what high-quality reasoning is and how it works.

From On the purposes of decision theory research:

1. Gaining information about the nature of rationality (e.g., is “realism about rationality” true?) and the nature of philosophy (e.g., is it possible to make real progress in decision theory, and if so, what cognitive processes are we using to do that?), and helping to solve the problems of normativity, meta-ethics, and metaphilosophy.
2. Better understanding potential AI safety failure modes that are due to flawed decision procedures implemented in or by AI.
3. Making progress on various seemingly important intellectual puzzles that seem directly related to decision theory, such as free will, anthropic reasoning, logical uncertainty, Rob’s examples of counterfactuals, updatelessness, and coordination, and more.
Thanks for the answer. It clarifies a little bit, but I still feel like I don’t fully grasp its relevance to alignment. I have the impression that there’s more to the story than just that?
Decision theory is a term used in mathematical statistics and philosophy. In applied AI terms, a decision theory is the algorithm an AI agent uses to compute which action to take next. The nature of this algorithm is obviously relevant to alignment. That said, philosophers like to argue among themselves about how different decision theories relate to certain paradoxes and limit cases, and they conduct these arguments in a terminology that is entirely disconnected from the one used in most theoretical and applied AI research. Not all AI alignment researchers believe that these philosophical arguments are very relevant to moving AI alignment research forward.
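To make the “decision theory as an algorithm” framing concrete, here is a minimal sketch of an agent that picks its next action by maximizing expected utility over a toy outcome model. The setup, names, and numbers are illustrative assumptions, not anything taken from the quoted sources:

```python
# Minimal toy sketch (hypothetical example, not any particular system's API).
# "Decision theory as an algorithm" here just means: given beliefs about how
# actions lead to outcomes and a utility function over outcomes, compute the
# next action to take.

def choose_action(actions, outcome_probs, utility):
    """Return the action with the highest expected utility.

    actions:       list of available actions
    outcome_probs: dict mapping action -> {outcome: probability}
    utility:       dict mapping outcome -> numeric utility
    """
    def expected_utility(action):
        return sum(p * utility[outcome]
                   for outcome, p in outcome_probs[action].items())

    return max(actions, key=expected_utility)

# Toy usage: an agent deciding whether to carry an umbrella.
actions = ["take_umbrella", "leave_umbrella"]
outcome_probs = {
    "take_umbrella":  {"dry": 1.0},
    "leave_umbrella": {"dry": 0.7, "wet": 0.3},
}
utility = {"dry": 1.0, "wet": -5.0}

print(choose_action(actions, outcome_probs, utility))  # -> take_umbrella
```

Different decision theories disagree about how the probabilities in a calculation like this should be computed, for example whether an agent should condition on its own action as evidence or model it as a causal intervention.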
AIs may make decisions in ways that we find counterintuitive. Those ways are likely to be shaped to be highly effective (because being effective is instrumentally useful for many goals), so understanding the space of highly effective decision theories is one way to think about how advanced AIs will reason, without needing to have one in front of us right now.
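One way to see that this space contains genuinely different algorithms is a limit case like Newcomb’s problem, where an evidential-style and a causal-style expected-utility calculation recommend different actions. The numbers and structure below are toy assumptions chosen only to illustrate the divergence:

```python
# Hedged toy illustration of Newcomb's problem (numbers are assumptions).
# A predictor fills an opaque box with $1,000,000 only if it predicted you
# would take just that box; a transparent box always holds $1,000.

ACCURACY = 0.99          # assumed predictor accuracy
SMALL, BIG = 1_000, 1_000_000

def edt_value(action):
    # Evidential style: treat your own action as evidence about the prediction.
    if action == "one-box":
        return ACCURACY * BIG + (1 - ACCURACY) * 0
    return ACCURACY * SMALL + (1 - ACCURACY) * (BIG + SMALL)

def cdt_value(action, p_box_full):
    # Causal style: treat the prediction (and box contents) as already fixed.
    payoff_if_full = BIG + (SMALL if action == "two-box" else 0)
    payoff_if_empty = SMALL if action == "two-box" else 0
    return p_box_full * payoff_if_full + (1 - p_box_full) * payoff_if_empty

actions = ["one-box", "two-box"]
print("EDT picks:", max(actions, key=edt_value))                    # one-box
print("CDT picks:", max(actions, key=lambda a: cdt_value(a, 0.5)))  # two-box, for any fixed p
```

The point is not which answer is correct, but that reasonable-looking formalizations of “take the best action” can come apart on exactly the kind of counterfactual reasoning the quoted comment highlights.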