Relevance of subgame perfection. Selten proposed subgame perfection as a refinement of Nash equilibrium; it requires that decisions which seemed rational at the planning stage still seem rational at the action stage. This at least suggests that we might want to consider requiring “subgame perfection” even if we only have a single player making two successive decisions.
Relevance of Footnote #4. This points out that one way to think of problems where a single player makes a series of decisions is to pretend that the problem has a series of players making the decisions, one decision per player, with the fictitious players linked in that they all share the same payoffs (but not necessarily the same information). This is a standard “trick” in game theory, but the footnote points out that in this case, since both fictitious players have the same information (because of the absent-mindedness), the game between driver-version-1 and driver-version-2 is symmetric, and that symmetry is equivalent to the constraint p1 = p2.
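To make the footnote’s point concrete, here is a minimal numerical sketch. The payoffs are the standard ones from the absent-minded driver example (exit at the first intersection = 0, exit at the second = 4, continue past both = 1); the function and variable names are mine, chosen for illustration.

```python
import numpy as np

# Assumed standard payoffs for the absent-minded driver:
# exit at first intersection -> 0, exit at second -> 4, never exit -> 1.
EXIT_FIRST, EXIT_SECOND, NO_EXIT = 0.0, 4.0, 1.0

def expected_payoff(p1, p2):
    """Shared payoff of the two fictitious players, where p1 (p2) is the
    probability that driver-version-1 (driver-version-2) continues."""
    return ((1 - p1) * EXIT_FIRST
            + p1 * (1 - p2) * EXIT_SECOND
            + p1 * p2 * NO_EXIT)

grid = np.linspace(0, 1, 101)

# Unconstrained optimum for the two linked players: continue at the first
# intersection, exit at the second (p1 = 1, p2 = 0). Achieving this requires
# telling the two "players" apart, which only perfect recall would provide.
print(max((expected_payoff(a, b), a, b) for a in grid for b in grid))
# -> (4.0, 1.0, 0.0)

# Absent-mindedness forces the symmetric constraint p1 = p2 = p, i.e.
# maximize 4p - 3p^2.
print(max((expected_payoff(p, p), p) for p in grid))
# -> approximately (1.3333, 0.67); the exact optimum is p = 2/3, value 4/3
```

The unconstrained optimum (always continue at the first intersection, always exit at the second) is exactly what absent-mindedness rules out; imposing p1 = p2 recovers the planning-optimal p = 2/3.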
Does Footnote #4 really amount to “they had already argued for [just recalculating the planning-optimal solution]”? Well, no it doesn’t really. I blew it in offering that as evidence. (Still think it is cool, though!)
Do they “argue for it” anywhere else? Yes, they do. Section 5, where they apply their methods to a slightly more complicated example, is an extended argument for the superiority of the planning-optimal solution to the action-optimal solutions. As they explain, there can be multiple action-optimal solutions even if there is only one (correct) planning-optimal solution, and some of those action-optimal solutions are wrong *even though they appear to promise a higher expected payoff than does the planning-optimal solution*.
I can’t see where they made this point. At the top of Section 4, they say “How, then, should the driver reason at the action stage?” and go on directly to describe action-optimality. If they said something like “One possibility is to just recompute and apply the planning-optimal solution. But if you insist …”, please point out where. See also page 108:
In our case, there is only one player, who acts at different times. Because of his absent-mindedness, he had better coordinate his actions; this coordination can take place only before he starts out at the planning stage. At that point, he should choose p1. If indeed he chose p1, there is no problem. If by mistake he chose p2 or p3, then that is what he should do at the action stage. (If he chose something else, or nothing at all, then at the action stage he will have some hard thinking to do.)
If Aumann et al. endorse using planning-optimality at the action stage, why would they say the driver has some hard thinking to do? Again, why not just recompute and apply the planning-optimal solution?
I really don’t see why you are having so much trouble parsing this. “If indeed he chose p1, there is no problem” is an endorsement of the correctness of the planning-optimal solution. The sentence dealing with p2 and p3 asserts that, if you mistakenly used p2 for your first decision, then your best follow-up is to remain consistent and use p2 for your remaining two choices. The paragraph you quote to make your case is one I might well choose myself to make my case.
Edit: There are some asterisks in variable names in the original paper which I was unable to make work with the italics rules on this site. So “p2” above should be read as “p*2”.
It is a statement that the planning-optimal action is the correct one, but it’s not an endorsement that it is correct to use the planning-optimality algorithm to compute what to do when you are already at an intersection. Do you see the difference?
ETA (edited to add): According to my reading of that paragraph, what they actually endorse is to compute the planning-optimal action at START, remember that, then at each intersection, compute the set of action-optimal actions, and pick the element of the set that coincides with the planning-optimal action.
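For concreteness, here is a rough sketch of that procedure, under the same assumed payoffs as in the sketch above. The consistent belief alpha = 1/(1+p) of being at the first intersection and the one-shot-deviation test are my reconstruction of the action-optimality condition, so treat this as an illustration rather than as the paper’s own algorithm.

```python
import numpy as np

# Same assumed payoffs as before: exit first = 0, exit second = 4, no exit = 1.

def planning_value(p):
    # Planning-stage expected payoff of committing to continue with
    # probability p at every intersection: 4p - 3p^2, peaking at p = 2/3.
    return 4 * p - 3 * p ** 2

def deviation_value(q, p):
    # Action-stage value of continuing with probability q *now*, while the
    # other, indistinguishable intersection stays at the committed p.
    # Assumed consistent belief of being at the first intersection: 1/(1+p).
    alpha = 1 / (1 + p)
    at_first = q * ((1 - p) * 4 + p * 1)   # continue now; p decides later
    at_second = (1 - q) * 4 + q * 1        # already continued once
    return alpha * at_first + (1 - alpha) * at_second

grid = np.linspace(0, 1, 1001)

def action_optimal_set(tol=1e-3):
    # p is action-optimal if no one-shot deviation q beats staying at q = p.
    return [p for p in grid
            if deviation_value(p, p) >= max(deviation_value(q, p) for q in grid) - tol]

p_star = max(grid, key=planning_value)   # planning-optimal, computed at START
candidates = action_optimal_set()        # recomputed at each intersection
action = min(candidates, key=lambda p: abs(p - p_star))  # pick the coinciding one
print(p_star, action)                    # both come out near 2/3
```

In this simple example the action-optimal set is, numerically, just a small neighborhood of 2/3, so the selection step is trivial; the procedure only becomes interesting in examples like the paper’s Section 5, where the action-optimal set has several members.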
BTW, you can use “\” to escape special characters like “*” and “_”. If you type \*, that will show up as * in the posted text.
Thx for the escape character info. That really ought to be added to the editing help popup.
Yes, I see the difference. I claim that what they are saying here is that you need to do the planning-optimal calculation in order to find p*1 as the unique best solution (among the three solutions that the action-optimal method provides). Once you have this, you can use it at the first intersection. But at the other intersections, you have some choices: either recalculate the planning-optimal solution each time, or write down enough information so that you can recognize that p*1 is the solution you are already committed to among the three (in section 5) solutions returned by the action-optimality calculation.
ETA in response to your ETA. Yes they do. Good point. I’m pretty sure there are cases more complicated than this perfectly amnesiac driver where that would be the only correct policy. (ETA: To be more specific, cases where the planning-optimal solution is not a sequential equilibrium.) But then I have no reason to think that UDT would yield the correct answer in those more complicated cases either.
I deleted my previous reply since it seems unnecessary given your ETA.
I’m pretty sure there are cases more complicated than this perfectly amnesiac driver where that would be the only correct policy. (ETA: To be more specific, cases where the planning-optimal solution is not a sequential equilibrium.)
What would be the only correct policy? What I wrote after “According to my reading of that paragraph”? If so, I don’t understand your “cases where the planning-optimal solution is not a sequential equilibrium”. Please explain.
What would be the only correct policy? What I wrote after “According to my reading of that paragraph”?
Yes.
If so, I don’t understand your “cases where the planning-optimal solution is not a sequential equilibrium”. Please explain.
I would have thought it would be self-explanatory.
It looks like I will need to construct and analyze examples slightly more complicated than the Absent-Minded Driver. That may take a while. Questions before I start: Does UDT encompass game theory, or is it limited to analyzing single-player situations? Is UDT completely explained in your postings, or is it, like TDT, still in the process of being written up?
Questions before I start: Does UDT encompass game theory, or is it limited to analyzing single-player situations? Is UDT completely explained in your postings, or is it, like TDT, still in the process of being written up?
Wei has described a couple versions of UDT. His descriptions seemed to me to be mathematically rigorous. Based on Wei’s posts, I wrote this pdf, which gives just the definition of a UDT agent (as I understand it), without motivation or justification.
The difficulty with multiple agents looks like it will be very hard to get around within the UDT framework. UDT works essentially by passing the buck to an agent who is at the planning stage*. That planning-stage agent then performs a conventional expected-utility calculation.
But some scenarios seem best described by saying that there are multiple planning-stage agents. That means that UDT is subject to all of the usual difficulties that arise when you try to use expected utility alone in multiplayer games (e.g., the prisoner’s dilemma). It’s just that these difficulties arise at the planning stage instead of directly at the action stage.
*Somewhat more accurately, the buck is passed to the UDT agent’s simulation of an agent who is at the planning stage.
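As a toy illustration of those difficulties reappearing at the planning stage, here are two planning-stage agents, each doing a conventional expected-utility calculation while holding the other’s policy fixed, in a one-shot prisoner’s dilemma (the payoff numbers are mine, for illustration):

```python
# Standard prisoner's-dilemma payoffs (my choice of numbers, for illustration):
# (row action, column action) -> (row payoff, column payoff).
PAYOFFS = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}

def best_reply(other_action, player):
    # One planning-stage agent's conventional expected-utility calculation,
    # taking the other planning-stage agent's policy as fixed.
    def payoff(a):
        pair = (a, other_action) if player == 0 else (other_action, a)
        return PAYOFFS[pair][player]
    return max("CD", key=payoff)

# Iterated best replies settle on (D, D), even though (C, C) would leave
# both planning-stage agents strictly better off.
profile = ("C", "C")
for _ in range(5):
    profile = (best_reply(profile[1], 0), best_reply(profile[0], 1))
print(profile)  # -> ('D', 'D')
```

Expected utility alone makes defection each planning-stage agent’s dominant choice, which is the usual prisoner’s-dilemma difficulty, just relocated to the planning stage.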
What I meant was, what point were you trying to make with that statement? According to Aumann’s paper, every planning-optimal solution is also an action-optimal solution, so the decision procedure they endorse will end up picking the planning-optimal solution. (My complaint is just that it goes about it in an unnecessarily roundabout way.) If theirs is a correct policy, then the policy of just recomputing the planning-optimal solution must also be correct. That seems to disprove your “only correct policy” claim. I thought your “sequential equilibrium” line was trying to preempt this argument, but I can’t see how.
Does UDT encompass game theory, or is it limited to analyzing single-player situations?
Pretty much single-player for now. A number of people are trying to extend the ideas to multi-player situations, but it looks really hard.
Is UDT completely explained in your postings, or is it, like TDT, still in the process of being written up?
No, it’s not being written up further. (Nesov is writing up some of his ideas, which are meant to be an advance over UDT.)
What I meant was, what point were you trying to make with that statement? According to Aumann’s paper, every planning-optimal solution is also an action-optimal solution, so the decision procedure they endorse will end up picking the planning-optimal solution.
My understanding of their paper has changed somewhat since we began this discussion. I now believe that repeating the planning-optimal analysis at every decision node is only guaranteed to give ideal results in simple cases like this one, in which every decision point is in the same information set. In more complicated cases, I can imagine that the policy of planning-optimal-for-the-first-move, then action-optimal-thereafter, might do better. I would need to construct an example to assert this with confidence.
(My complaint is just that it goes about it in an unnecessarily round-about way.) If theirs is a correct policy, then the policy of just recomputing the planning-optimal solution must also be correct.
In this simple example, yes. Perhaps not in more complicated cases.
That seems to disprove your “only correct policy” claim. I thought your “sequential equilibrium” line was trying to preempt this argument, but I can’t see how.
And I can’t see how to explain it without an example.
While I wait, did you see anything in Aumann’s paper that hints at “the policy of planning-optimal-for-the first-move, then action-optimal-thereafter might do better”? Or is that your original research (to use Wikipedia-speak)? It occurs to me that if you’re correct about that, the authors of the paper should have realized it themselves and mentioned it somewhere, since it greatly strengthens their position.
Answering that is a bit tricky. If I am wrong, it is certainly “original research”. But my belief is based upon readings in game theory (including stuff by Aumann) which are not explicitly contained in that paper.
Please bear with me. I have a multi-player example in mind, but I hope to be able to find a single-player one which makes the reasoning clearer.
Regarding your last sentence, I must point out that the whole reason we are having this discussion is my claim to the effect that you don’t really understand their position, and hence cannot judge what does or does not strengthen it.
Ok, I now have at least a sketch of an example. I haven’t worked it out in detail, so I may be wrong, but here is what I think. In any scenario in which you gain and act on information after the planning stage, you should not use a recalculated planning-stage solution for any decisions after you have acted upon that information. Instead, you need to do the action-optimal analysis.
For example, let us complicate the absent-minded driver scenario that you diagrammed by adding an information-receipt and decision node prior to those two identical intersections. The driver comes in from the west and arrives at a T intersection where he can turn left (north) or right (south). At the intersection is a billboard advertising today’s lunch menu at Casa de Maria, his favorite restaurant. If the billboard promotes chile, he will want to turn right so as to have a good chance of reaching Maria’s for lunch. But if the billboard promotes enchiladas, which he dislikes, he probably wants to turn the other way and try for Marcello’s Pizza. Whether he turns right or left at the billboard, he will face two consecutive identical intersections (four identical intersections total). The day is cloudy, so he cannot tell whether he is traveling north or south.
Working this example in detail will take some work. Let me know if you think the work is necessary.
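Ok, I see. I’ll await your example.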
Once you have this, you can use it at the first intersection. But at the other intersections, you have some choices
It is a part of the problem statement that you can’t distinguish between being at any of the intersections. So you have to use the same algorithm at all of them.
either recalculate the planning-optimal solution each time
How are you getting this from their words? What about “this coordination can take place only before he starts out at the planning stage”? And “If he chose something else, or nothing at all, then at the action stage he will have some hard thinking to do”? Why would they say “hard thinking” if they meant “recalculate the planning-optimal solution”? (Especially when the planning-optimality calculation is simpler than the action-optimality calculation.)