I agree that the most obvious formalization of Alice’s preferences would depend on thisAgent. So I’m saying that there actually is a nontrivial restriction on her preferences: if she wants to keep something like her informal formulation, she will need to decide what her preferences are supposed to mean in terms that do not refer to thisAgent.
Got it. I think.
But how could you come up with a pair of situations such that in situation (i) the agent can choose between options A and B, while in situation (ii) the agent can choose between A, B, and C, and yet the agent has exactly the same information in both situations?
In situation (i), Alice can choose between chocolate and vanilla ice cream. In situation (ii), Alice can choose between chocolate, vanilla, and strawberry ice cream. Having access to these options doesn’t change Alice’s knowledge about her preferences for ice cream flavors (assuming that which flavors are available on a given day doesn’t reflect some kind of global shortage of a flavor). In general it might help to have the set of options available to Alice randomly determined, so that her knowledge of which options she has doesn’t give her information about anything else.
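To make that concrete, here is a minimal Python sketch; the preference ordering, the menu contents, and names like `PREFERENCE_ORDER` and `alices_choice` are just illustrative assumptions, not anything forced by the argument.

```python
import random

# A minimal sketch (assumed names throughout): Alice's preference ordering over
# flavors is fixed in advance, and the menu she faces is drawn at random, so
# seeing which menu she got tells her nothing about her own preferences or
# about anything else.
PREFERENCE_ORDER = ["chocolate", "vanilla", "strawberry"]  # most to least preferred

MENUS = {
    "situation_i": ["chocolate", "vanilla"],
    "situation_ii": ["chocolate", "vanilla", "strawberry"],
}

def alices_choice(menu):
    """Pick the most-preferred flavor available on this menu."""
    return min(menu, key=PREFERENCE_ORDER.index)

situation = random.choice(list(MENUS))  # the available options are randomly determined
print(situation, "->", alices_choice(MENUS[situation]))
```

The only work the randomization does here is to ensure that which menu Alice faces carries no information beyond the menu itself.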
Sorry, I do not understand—what do you mean by your composite choices? What does it mean to choose (A and B) when A and B are mutually exclusive options?
Sorry, I should probably have used “or” instead of “and.” If A and B are the primitive choices “chocolate ice cream” and “vanilla ice cream,” then the composite choice (A or B) is “the opportunity to choose between chocolate and vanilla ice cream.” The point is that once you allow a decision theory to assign preferences among composite choices, the fact that composition of choices is associative means that preferences among an arbitrary number of primitive choices are determined by preferences among pairs of primitive choices.
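To illustrate the associativity point, here is a small Python sketch under the (assumed) simplification that a composite choice can be modeled as just the set of primitive options it ultimately makes available; the helpers `primitive` and `compose_or` are mine, not part of any standard formalism.

```python
from functools import reduce

# A composite choice is modeled as the set of primitive options it makes
# available. Composing two choices with "or" merges those sets, which is
# associative, so an n-way composite is the same no matter how the pairwise
# compositions are grouped.

def primitive(option):
    return frozenset([option])

def compose_or(choice_a, choice_b):
    """(A or B): the opportunity to choose between everything in A and everything in B."""
    return choice_a | choice_b

A, B, C = primitive("chocolate"), primitive("vanilla"), primitive("strawberry")

# (A or B) or C gives the same composite as A or (B or C).
assert compose_or(compose_or(A, B), C) == compose_or(A, compose_or(B, C))

# So a preference relation defined on pairwise composites extends unambiguously
# to composites built from any number of primitive choices.
print(sorted(reduce(compose_or, [A, B, C])))
```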
Maybe it’s not right to say that it seems like a bad idea; it’s more that it would seem at first that people just don’t have terminal preferences about the algorithm being run (or at least not strong ones: you might derive enjoyment from an elegant algorithm, but that wouldn’t outweigh your desire to save lives, so your instrumental preference for a well-working algorithm would always dominate your terminal preference for enjoying an elegant one if the two came into conflict). So at first it might seem reasonable to design a decision theory in which you are not allowed to care about the algorithm your AI is running. I can at least imagine that making such an assumption might simplify things when trying to prove theorems about self-modifying AI, so this does seem like a conceivable failure mode to me.
Okay, but it still seems reasonable to have instrumental preferences about algorithms that AIs run, and I don’t see why decision theory is not allowed to talk about instrumental preferences. (Admittedly I don’t know very much about decision theory.)