I’m not sure what it means for this work to “not apply” to particular systems. It seems like the claim is that decision theory is a way to understand AI systems in general and reason about what they will do, just as we use other theoretical tools to understand current ML systems. Can you spell this out a bit more? (Note that I’m also not really sure what it means for decision theory to apply to all AI systems: I can imagine kludgy systems where it seems really hard in some sense to understand their behavior with decision theory, but I’m not confident at all)
I claim (with some confidence) that Updateless Decision Theory and Logical Induction don’t have much to do with understanding AlphaGo or OpenAI Five, and you are better off understanding those systems using standard AI/ML thinking.
I further claim (with less confidence) that in a similar way, at the time that we build our first powerful AI systems, the results of Agent Foundations research at that time won’t have much to do with understanding those powerful AI systems.
Does that explain what it means? And if so, do you disagree with either of the claims?
> I claim (with some confidence) that Updateless Decision Theory and Logical Induction don’t have much to do with understanding AlphaGo or OpenAI Five, and you are better off understanding those systems using standard AI/ML thinking.
Eh, this is true, but it’s also true that causal decision theory, game theory, and probability theory have a lot to do with understanding how to build AlphaZero or OpenAI Five (and by extension, those systems themselves). I think the relevant question here must be whether you think the embedded agency program can succeed as much as the classical decision theory/probability theory program did, and whether, conditional on that success, it can be as influential (probably with a shorter lag between the program succeeding and it influencing AI development).
Yeah, my second claim is intended to include that scenario as well. That is, if embedded agency succeeded and significantly influenced the development of the first powerful AI systems, I would consider my second claim to be false.
This scenario (of embedded agency influencing AI development) would surprise me conditional on short timelines. Conditional on long timelines, I’m not sure, and would want to think about it more.
Note also that in a world where you can’t build powerful AI without Agent Foundations, it’s not a big loss if you don’t work on Agent Foundations right now. The worry is in a world where you can build powerful AI without Agent Foundations, but it leads to catastrophe. I’m focusing on the worlds in which that is true and in which powerful AI is developed soon.
That is all sensible, I was just slightly annoyed by what I read as an implication that “AlphaGo doesn’t use UDT therefore advanced AI won’t” or something.
I agree with both your claims, but maybe with less confidence than you (I also agree with DanielFilan’s point below).
Here are two places I can imagine MIRI’s intuitions here coming from, and I’m interested in your thoughts on them:
(1) The “idealized reasoner is analogous to a Carnot engine” argument. It seems like you think advanced AI systems will be importantly disanalogous to this idea, and that’s not obvious to me.
(2) ‘We might care about expected utility maximization / theoretical rationality because there is an important sense in which you are less capable / dumber / irrational if e.g. you are susceptible to money pumps. So advanced agents, since they are advanced, will act closer to ideal agents.’
(I don’t have much time to comment so sorry if the above is confusing)
(1) I am unsure whether there exists an idealized reasoner analogous to a Carnot engine (see Realism about rationality). Even if such a reasoner exists, it seems unlikely that we will a) figure out what it is, b) understand it in sufficient depth, and c) successfully use it to understand and improve ML techniques, before we get powerful AI systems through other means. Under short timelines, this cuts particularly deeply, because a) there’s less time to do all of these things and b) it’s more likely that advanced AI is built out of “messy” deep learning systems that seem less amenable to this sort of theoretical understanding.
(2) I certainly agree that all else equal, advanced agents should act closer to ideal agents. (Assuming there is such a thing as an ideal agent.) I also agree that advanced AI should be less susceptible to money pumps, from which I infer that their “preferences” (i.e. world states that they work to achieve) are transitive. I’m also on board that more advanced AI systems are more likely to be describable as maximizing the expected utility of some utility function, per the VNM theorem. I don’t agree that the utility function must be simple, or that the AI must be internally reasoning by computing the expected utility over all actions and then choosing the one that’s highest. I would be extremely surprised if we built powerful AI such that when we say the English sentence “make paperclips” it acts in accordance with the utility function U(universe history) = number of paperclips in the last state of the universe history. I would be very surprised if we built powerful AI such that we hardcode in the above utility function and then design the AI to maximize its expected value.
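The money-pump argument referenced above can be made concrete with a minimal sketch (the item names and the one-cent fee are illustrative, not from the discussion): an agent with cyclic, hence intransitive, preferences will pay a small fee for each “upgrade” trade, so a trader can cycle it back to its starting item while draining its money.

```python
# Money-pump sketch: an agent with cyclic preferences A < B < C < A
# accepts any trade to an item it strictly prefers, paying a small fee.
# prefers[x] is the item the agent strictly prefers to x.
prefers = {"A": "B", "B": "C", "C": "A"}  # cyclic, hence intransitive

def run_money_pump(start_item, fee=0.01, rounds=9):
    """Offer the agent its preferred item for a fee, `rounds` times."""
    item, money_extracted = start_item, 0.0
    for _ in range(rounds):
        item = prefers[item]    # agent accepts the preferred item...
        money_extracted += fee  # ...and pays the fee each time
    return item, money_extracted

item, extracted = run_money_pump("A")
# After 9 rounds (three full cycles) the agent holds its original item
# but has paid 9 fees: strictly worse off, which is the sense in which
# intransitive preferences make an agent exploitable.
print(item, round(extracted, 2))  # → A 0.09
```

Transitive preferences block this: once the agent trades up to its most-preferred item, no further fee-charging trade is acceptable, so the extraction stops.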
The Value Learning sequence expands on position 2, especially in Chapter 2. The conclusion is a short version of it, but still longer than the parent comment.