If I’m reading it correctly, the basis for the 50⁄7 “action” expected value is that the driver might have previously switched strategies from the optimal one (p=0) to a poorer local maximum (p=1/2). Subsequently they may have driven through some unknown number of turns.
If this has already happened, then continuing with p=1/2 is correct and the expected value truly is 50⁄7. This is greater than the expected value at the start because two lower-value states are eliminated by the “currently at an intersection” conditioning.
[Edit: Oops, forgot the main point I was going to make]
The problem is that if the driver carries out this reasoning at any intersection, by the amnesic hypothesis they should apply it at every intersection. In particular they will apply it at the first intersection, where it is a mistake and they know that in that situation it is a mistake.
Assuming with no evidence that you have already made a mistake seems like a poor starting assumption for any rational decision theory.
> the basis for the 50⁄7 “action” expected value is that the driver might have previously switched strategies from the optimal one (p=0) to a poorer local maximum (p=1/2).
I don’t think that is the basis. p=1/2, as one of the action optimals, is derived by finding a stable point of the action payoff function; the expected payoff is obtained by substituting p=1/2 into the action payoff. The planning optimal of p=0 was not part of that derivation, so it is not a “switch” of strategy per se. The fact that I may have already driven through some intersections is an inherent part of the problem (absentmindedness); any mixed strategy (CONTINUE with some probability) would have to face that. It is not special to action optimals like p=1/2.
Furthermore, if we are considering the action payoff function (i.e. the one using the probabilities of “here is X/Y/Z”), then p=1/2 is not an inferior local maximum; at the very least it is a better point than the planning optimal p=0. Also, as long as he uses the action payoff function, the driver should indeed apply the same analysis at every intersection and arrive at p=1/2 independently, i.e. it is consistent with observation 2: “The driver is aware he will make (has made) an identical decision at the other intersection too.”
I agree using p=1/2 is a mistake. As you have pointed out, it is especially obvious at the first intersection. My position is that this mistake is due to the action payoff function being fallacious, because it uses self-locating probability. This is as opposed to Aumann’s explanation: that the driver could not coordinate on p=1/2, since due to the absentmindedness they can only coordinate at the planning stage.
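For concreteness, here is a minimal sympy sketch of the two derivations being contrasted. The payoff schedule of the problem under discussion is not restated in this thread, so the numbers below are an assumed reconstruction (exit at X: 7, exit at Y: 0, exit at Z: 22, continue past Z: 2), chosen because it reproduces the figures both sides quote: planning optimal p=0, stable points p=0, p=7/30 and p=1/2, and an action payoff of 50⁄7 at p=1/2.

```python
# A sketch of both derivations, for a three-intersection absent-minded driver
# who CONTINUEs with probability p. The payoffs are an ASSUMED reconstruction
# (the thread does not restate them): exit at X -> 7, exit at Y -> 0,
# exit at Z -> 22, continue past Z -> 2.
import sympy as sp

p, q = sp.symbols('p q', nonnegative=True)
a, b, c, d = 7, 0, 22, 2

# Planning payoff: evaluated before the drive, no self-locating probability.
U = (1 - p)*a + p*(1 - p)*b + p**2*(1 - p)*c + p**3*d
print(sp.solve(sp.diff(U, p), p))                   # interior critical points: [7/30, 1/2]
print(U.subs(p, 0), U.subs(p, sp.Rational(1, 2)))   # 7 vs 13/2, so the planning optimal is p=0

# Action payoff: conditioned on "I am at an intersection", with self-locating
# weights 1 : p : p**2 for "here is X/Y/Z". The current decision is q; the
# causally disconnected decisions at the other intersections use p.
pay_X = (1 - q)*a + q*((1 - p)*b + p*(1 - p)*c + p**2*d)
pay_Y = (1 - q)*b + q*((1 - p)*c + p*d)
pay_Z = (1 - q)*c + q*d
K = (pay_X + p*pay_Y + p**2*pay_Z) / (1 + p + p**2)

# Stable points: the current self has no incentive to deviate from q = p.
print(sp.solve(sp.diff(K, q).subs(q, p), p))        # [7/30, 1/2]; p=0 is stable at the boundary
print(K.subs(q, p).subs(p, sp.Rational(1, 2)))      # 50/7
```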
> p=1/2, as one of the action optimals, is derived by finding a stable point of the action payoff function; the expected payoff is obtained by substituting p=1/2 into the action payoff.
Yes, assuming that the “action payoff” is useful at all. The scenario is that the driver determined at the start, using their knowledge of where they are, that p=0 was optimal. This is a correct decision.
The “action optimal” reasoning assumes that the driver applies it at every intersection, which means that it’s only worth considering under the assumption that the driver changed their mind from p=0 to p=1/2 some time before the first intersection. Even if the “action payoff” were a useful thing to maximize (it isn’t), this would still be a very dubious assumption.
> I agree using p=1/2 is a mistake. As you have pointed out, it is especially obvious at the first intersection. My position is that this mistake is due to the action payoff function being fallacious, because it uses self-locating probability.
Maximizing that quantity is a mistake whether or not self-locating probabilities are used. You can define the equivalent quantity for non-self-locating models and it isn’t useful there either, for the same reasons.
> The “action optimal” reasoning assumes that the driver applies it at every intersection
This is a pretty obvious assumption, reflected in Aumann’s observation 2: “The driver is aware he will make (has made) an identical decision at the other intersection too.” I do not see any reason to challenge that. But if I understand correctly, you do.
> which means that it’s only worth considering under the assumption that the driver changed their mind from p=0 to p=1/2 sometime before the first intersection
I disagree that this can be referred to as a change of mind. The derivation of the action optimal is a process independent of the derivation of the planning optimal. Maybe you mean the driver could only coordinate on the planning optimal due to the absentmindedness, similar to Aumann’s reasoning. But then again, you don’t seem to agree with his observation 2, so I am not entirely sure about your position.
If you are saying there is no reason for the driver to change his decision from the planning stage since there is no new information, then we are making the same point. However, for the driver, the “no new information” argument applies not only to the first intersection but to any and all intersections, so again I am not sure why you stress the first intersection. And then there is the tension between “no new information” (i.e. do not change the decision) and the existence of multiple action optimal points with higher payoffs (i.e. why not choose them?), which I think lacks a compelling explanation.
> Maximizing that quantity is a mistake whether or not self-locating probabilities are used.
p=1/2 is not found by maximizing the action utility function. It is derived by finding stable points/Nash equilibria; p=1/2 is one of them, as is p=0. Among these stable points, p=1/2 has the highest expected utility. In comparison, the planning utility function does not use self-locating probability, and maximizing it gives the planning optimal, which is uncontroversially useful.
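That ranking can be checked directly, again under the assumed payoff reconstruction from the sketch above:

```python
# Action payoff with every intersection already set to the same p, evaluated at
# the three stable points. Same ASSUMED payoff reconstruction as the sketch above.
import sympy as sp

p = sp.symbols('p', nonnegative=True)
a, b, c, d = 7, 0, 22, 2
U_X = (1 - p)*a + p*(1 - p)*b + p**2*(1 - p)*c + p**3*d   # continuation payoff from X
U_Y = (1 - p)*b + p*(1 - p)*c + p**2*d                    # continuation payoff from Y
U_Z = (1 - p)*c + p*d                                     # continuation payoff from Z
h = (U_X + p*U_Y + p**2*U_Z) / (1 + p + p**2)             # self-locating weights 1 : p : p**2

for v in (0, sp.Rational(7, 30), sp.Rational(1, 2)):
    print(v, h.subs(p, v))   # 7, 7378/1159 (~6.37), 50/7 (~7.14): p=1/2 highest, p=7/30 lowest
```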
> I disagree that this can be referred to as a change of mind.
Before starting the drive, the driver determines that always turning at the first intersection will be optimal. I didn’t think we disagreed on that.
> p=1/2 is not found by maximizing the action utility function.
Yes, it is. You can verify this by finding the explicit expression for action utility as a function of p (a rational function consisting of a fourth-order polynomial divided by a quadratic), and verifying that it has a maximum at p=1/2. The payoffs were clearly carefully chosen to ensure this.
> Before starting the drive, the driver determines that always turning at the first intersection will be optimal. I didn’t think we disagreed on that.
But the driver does not have to do any calculation before starting the drive. He can do that, yes. But he can also simply choose to think about the decision only when he arrives at an intersection. It is possible for him to derive the “action optimals” chronologically before deriving the “planning optimal”. As I said earlier, they are two independent processes.
> Yes, it is. You can verify this by finding the explicit expression for action utility as a function of p...
No, it was not found by maximizing the action utility function. In Aumann’s process, the action utility function is not represented by a single variable p, but by multiple variables representing causally disconnected decisions (observation 1). Because the decisions ought to be the same (observation 2), the action optimals ought to be symmetric Nash equilibria, or “stable points”. You can see an example in Eliezer Yudkowsky’s post. For this particular problem, there are three stable points of the action utility function: p=0, p=7/30 and p=1/2. Among these three, p=1/2 gives the highest action payoff and p=7⁄30 the lowest.
I will take your word for it that p=1/2 also maximizes the action utility. But that is just a coincidence of this particular problem, not how action optimals are found per Aumann.
For the sake of clarity let’s take a step back and examine our positions. Everyone agrees p=1/2 is not the right choice. Aumann thinks this conclusion is reached in two steps:
1. Derive all action optimals by finding the stable points of the action utility function (p=1/2 is one of them, as well as p=0).
2. p=1/2 is rejected because it is not possible for the driver at different intersections to coordinate on it due to absentmindedness.
I disagree with both points 1 and 2, the reason being that the action utility function is fallacious. Are you rejecting both, or point 2 only, or do you agree with him?
> For the sake of clarity let’s take a step back and examine our positions. Everyone agrees p=1/2 is not the right choice. Aumann thinks this conclusion is reached in two steps:
> 1. Derive all action optimals by finding the stable points of the action utility function (p=1/2 is one of them, as well as p=0).
> 2. p=1/2 is rejected because it is not possible for the driver at different intersections to coordinate on it due to absentmindedness.
> I disagree with both points 1 and 2, the reason being that the action utility function is fallacious. Are you rejecting both, or point 2 only, or do you agree with him?
Point 1 is wrong. Action utility measures the wrong thing in this scenario, but does measure the correct thing for some superficially similar but actually different scenarios.
Point 2 is also wrong, because it’s perfectly possible to coordinate in this scenario. It’s just that due to point 1 being wrong, they would be coordinating on the wrong strategy.
So we agree that both these points are incorrect, but we disagree on the reasons for them being incorrect.
OK, I think that is clearer now. I assume you think the strategy to coordinate on should be determined by maximizing the planning utility function, not by maximizing the action utility function, nor by finding the stable points of the action utility function. I agree with all of this.
The difference is that you think self-locating probabilities are valid: the action utility function that uses them is valid, but it only applies to superficially similar problems, such as multiple drivers being randomly assigned to intersections.
I, on the other hand, think self-locating probabilities are not valid, and therefore the action utility function is fallacious. In problems where multiple drivers are randomly assigned to intersections, the probability of someone being assigned to a given intersection is not a self-locating probability.
Pretty close. I do think that self-locating probabilities can be valid, but determining the most relevant one to a given situation can be difficult. There are a lot more subtle opportunities for error than with more familiar externally supplied probabilities.
In particular, the way in which this choice of self-locating probability is used in this scenario does not suit the payoff schedule and incentives. Transforming it into related scenarios with non-self-locating probabilities is just one way to show that the problem exists.
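One concrete way to exhibit the mismatch, under the same assumed payoff reconstruction as the sketches above: the action payoff is a per-intersection-visit average of the final payoff, while the planning payoff is a per-drive average, and a simulation shows the two come apart at p=1/2.

```python
# Monte Carlo check of the per-drive vs per-visit distinction, using the same
# ASSUMED payoff reconstruction as above (exit at X/Y/Z -> 7/0/22, drive past
# all three -> 2) with p = 1/2. Averaging the final payoff per drive recovers
# the planning value (13/2 = 6.5); averaging it per intersection visit (the
# "here is X/Y/Z" weighting) recovers the action value (50/7 ~ 7.14), because
# per-visit averaging over-weights the drives that continue further.
import random

EXIT_PAYOFF = [7, 0, 22]   # payoff for exiting at X, Y, Z
CONTINUE_PAYOFF = 2        # payoff for driving past all three
P_CONTINUE = 0.5

def drive():
    """One drive: return (final payoff, number of intersections visited)."""
    for i in range(3):
        if random.random() >= P_CONTINUE:   # exit at intersection i
            return EXIT_PAYOFF[i], i + 1
    return CONTINUE_PAYOFF, 3

random.seed(0)
trials = [drive() for _ in range(200_000)]
per_drive = sum(u for u, _ in trials) / len(trials)
per_visit = sum(u * n for u, n in trials) / sum(n for _, n in trials)
print(per_drive)   # ~6.5  (planning payoff)
print(per_visit)   # ~7.14 (action payoff, 50/7)
```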