I agree that if the outcome depends on our strategy or programming, then P(outcome | do(action)) has no proper place in our agent’s decision-making. Savage’s theorem requires us to use probabilities for the things that determine the outcome; if our action does not determine the outcome, its probability isn’t given by Savage’s theorem.
I think you’d agree that there are more-abstract objects like P(outcome | strategy) that do obey Savage’s theorem in these problems (I’d claim always, you’d maybe just say usually?). In the absent-minded driver problem, we can work the problem and get the right answer by using P(outcome | strategy). The strategy used determines the outcome, and so Savage’s theorem works on it.
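For concreteness, here’s a minimal sketch of working the problem with P(outcome | strategy), assuming the standard Piccione and Rubinstein payoffs (0 for exiting at the first intersection, 4 for exiting at the second, 1 for driving past both):

```python
# Absent-minded driver, worked directly in terms of P(outcome | strategy).
# Assumed payoffs (standard Piccione & Rubinstein version): exit at first
# intersection = 0, exit at second = 4, drive past both = 1.

def expected_utility(p):
    """Expected utility of the strategy 'continue with probability p'."""
    p_exit_first = 1 - p          # P(exit at first intersection | strategy)
    p_exit_second = p * (1 - p)   # P(exit at second intersection | strategy)
    p_drive_past = p * p          # P(drive past both | strategy)
    return 0 * p_exit_first + 4 * p_exit_second + 1 * p_drive_past

# Crude grid search over strategies; the maximum is at p = 2/3 with EU = 4/3,
# and no probability of "I'm at the first intersection" is needed anywhere.
best_p = max((i / 1000 for i in range(1001)), key=expected_utility)
print(best_p, expected_utility(best_p))
```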
And I do think that simultaneously, we can use Cox’s theorem to show that the absent-minded driver has some probability P(state | information). It’s just not integrated with decision-making in the usual way—we want to obey Savage’s theorem for that.
Also, Wei Dai made a comment similar to yours here.
Hmm. Following Sniffnoy’s post, it seems to me that a randomized strategy determines a lottery over outcomes, even when the state of the world is fixed. So Savage’s theorem will give you an arbitrary utility function over such lotteries, which cannot be easily converted to a utility function over outcomes… Have you worked through the application of Savage’s theorem to things like P(outcome|strategy) in detail? I don’t understand yet how it all works out.
a randomized strategy determines a lottery over outcomes, even when the state of the world is fixed
So, like, my options are either to eat a cookie or not (utility 1 or 0). And if I want to randomize, I can roll a die and only eat the cookie if I get an odd number. But then the expected utility of the strategy is pretty straightforward: 1/2 of a cookie.
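As a tiny sketch of that arithmetic, with the die-induced lottery written out explicitly:

```python
from fractions import Fraction

# "Roll a die, eat the cookie only on an odd number" induces a lottery over
# outcomes: utility 1 (eat) with probability 3/6, utility 0 (don't) with 3/6.
lottery = {1: Fraction(3, 6), 0: Fraction(3, 6)}

expected_utility = sum(utility * prob for utility, prob in lottery.items())
print(expected_utility)  # 1/2 of a cookie
```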
Have you worked through the application of Savage’s theorem to things like P(outcome|strategy) in detail? I don’t understand yet how it all works out.
Not exhaustively, but the basic idea is that you’re changing the objects in “event-space.” If you want to look into it, I found this paper useful.
So, like, my options are to either to eat a cookie or not (utility 1 or 0)
If you already have probabilities and utilities, why do you need Savage’s theorem? I thought it was used to prove that a reasonable decision-making agent must have utilities over outcomes in the first place. My point was that applying it to things like P(outcome|strategy) might lead to problems on that path.
Ah, I see—is the idea “if we haven’t derived probabilities yet, how can we use probabilistic strategies?”
If we use some non-black-box random process, like rolling a die, then I think the problem resolves itself, since we don’t have to use probabilities to specify a die; we can just have a symmetry in our information about its sides, or some knowledge of past rolls, etc. Under this picture, the “mixed” in a mixed strategy would be externalized to the random process, and the strategy would have the same format as a pure strategy.
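A minimal sketch of that externalization (the 2/3 is borrowed from the driver example above; the names are mine): the strategy itself becomes a deterministic function, and all of the “mixing” lives in our symmetric information about the die.

```python
# A pure strategy once the die is treated as part of the world: it is a
# deterministic function of (observation, die outcome).  "Continue with
# probability 2/3" becomes:
def pure_strategy(observation, die_face):
    if observation == "at an intersection":
        return "continue" if die_face <= 4 else "exit"
    return "exit"

# The probability comes from our information about the die (symmetry over its
# six faces), not from anything inside the strategy itself.
p_continue = sum(
    pure_strategy("at an intersection", face) == "continue" for face in range(1, 7)
) / 6
print(p_continue)  # 0.666...
```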
Hmm, no, I was trying to make a different point. Okay, let’s back up a little. Can you spell out what you think are the assumptions and conclusions of Savage’s theorem with your proposed changes? I have some vague idea of what you might say, and I suspect that the conclusions don’t follow from the assumptions because the proof stops working, but by now we seem to misunderstand each other so much that I have to be sure.
I am proposing no changes. My claim is that even though we use English words like “event-space” or “actions” when describing Savage’s theorem, the things that actually have the relevant properties in the AMD problem are the strategies.
Cribbing from the paper I linked, the key property of “actions” is that they are functions from the set of “states of the world” (also somewhat mutable) to the set of consequences (the things I have a utility function over). If the state is “I’m at the first intersection” and I take the action (no quotes, actual action) of “go straight,” that does return a consequence.
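In code terms, the property being leaned on here is essentially a type signature. A rough sketch with made-up labels (mine, not the paper’s):

```python
# Savage-style setup: utilities are defined over consequences, and an "act"
# is a function from states of the world to consequences.  Here an act is
# just a lookup table assigning exactly one consequence to every state.
some_act = {
    "state_1": "consequence_A",
    "state_2": "consequence_B",
}

utility = {"consequence_A": 0.0, "consequence_B": 1.0}

# Given probabilities over states (the thing Savage's theorem hands back),
# the expected utility of an act is the usual probability-weighted sum.
def expected_utility(act, p_state):
    return sum(p * utility[act[state]] for state, p in p_state.items())

print(expected_utility(some_act, {"state_1": 0.5, "state_2": 0.5}))
```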
How do you represent the strategy “always turn right” as a function from states to consequences? What does it return if the state is “I’m at the second intersection”, which is impossible if the agent uses that strategy?
Well, if we’re changing what objects are the “actions” in the proof, we’re probably also changing which objects are the “states.” You only need a strategy once, you don’t need a new strategy for each intersection.
If we have a strategy like “go straight with probability p,” a sufficient “state” is just the starting position and a description of the game.
Hmm, I’m not sure on what grounds we can actually rule out using the individual intersections as states, even though that leads to the wrong answer. Maybe they violate axiom 3, which requires the existence of “constant actions.”
Sorry for deleting my comment. I’m still trying to figure out where this approach leads. So now you’re saying that “I’m at the first intersection” isn’t actually a “state” and shouldn’t get a probability?
Right. To quote myself:
P(outcome | do(action)) has no proper place in our agent’s decision-making. Savage’s theorem requires us to use probabilities for the things that determine the outcome; if our action does not determine the outcome, its probability isn’t given by Savage’s theorem.
And I do think that simultaneously, we can use Cox’s theorem to show that the absent-minded driver has some probability P(state | information). It’s just not integrated with decision-making in the usual way—we want to obey Savage’s theorem for that.
So we’ll have a probability due to Cox’s theorem. But for decision-making, we won’t ever actually need that probability, because it’s not a probability of one of the objects Savage’s theorem cares about.
Yes, the key property of actions is that they are functions from the set of states to the set of consequences. Strategies do not have that property, because they can be randomized. If you convert randomized strategies to deterministic ones by “externalizing” random processes into black boxes in the world, Savage’s theorem will only give you some probability distribution over the black boxes, not necessarily the probability distribution that you intended. If you “hardcode” the probabilities of the black boxes into the inputs of Savage’s theorem, you might as well hardcode other things like utilities, and I don’t see the point.
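To make the type mismatch concrete, a sketch with illustrative numbers (the 2/3 strategy from earlier): a Savage act returns one consequence per state, while a randomized strategy returns a lottery.

```python
# A Savage act: each state maps to a single consequence.
deterministic_act = {"the game": "exit at the second intersection"}

# A randomized strategy ("continue with probability 2/3"): the same state maps
# to a probability distribution over consequences, i.e. a lottery, which is not
# the kind of object the theorem quantifies over without further work.
randomized_strategy = {
    "the game": {
        "exit at the first intersection": 1 / 3,
        "exit at the second intersection": (2 / 3) * (1 / 3),
        "drive past both": (2 / 3) * (2 / 3),
    }
}

print(type(deterministic_act["the game"]))    # <class 'str'>: a consequence
print(type(randomized_strategy["the game"]))  # <class 'dict'>: a lottery
```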