Does this “action payoff” actually measure anything meaningful?
Well yes, it does measure payoff averaged over decision-points instead of over trips. This would be relevant if instead of a single driver who wins the given amount upon reaching their destination, there were multiple identical selfish drivers drawn by lottery who make a decision at one visited intersection each (without knowing which one) and all receive winnings. Although the outcomes seem the same—you make a decision and get paid depending upon where the car ends up—the incentives are different.
Consider a road with 1000 intersections. Turning at the first pays $100, the second pays nothing, and the subsequent turns all pay $101. Going straight through all intersections pays nothing. Consider the two strategies of turning with 100% probability or 2% probability in each game.
In both game variants, the 100% turn always yields $100 for any driver.
In the original absent-minded driver game, turning 2% of the time is clearly a poor idea: there is a 2% chance of $100, a 1.96% chance of nothing, and a 96.04% chance of $101, for an average of $99.00. (There is a negligible chance of going straight through and getting nothing.)
In the multiple driver game, any given driver can determine (based on the objective lottery odds) that, if selected, there is a 2% chance they will be placed at intersection 1, a 1.96% chance at intersection 2, and otherwise they are somewhere between 3 and 700 (with negligible probability of anything greater). If at 1, their payout will average $99.00; if at 2, $98.98; otherwise, $101.00. Their expected net winnings are then $100.92.
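For anyone who wants to check the arithmetic, here is a rough Python sketch of both averages. The function names are my own, and the lottery model is an assumption on my part: a selected driver lands at intersection k with probability proportional to the chance that the car ever reaches k, which is what reproduces the 2% and 1.96% figures above.

```python
def turn_payoff(k: int) -> float:
    """Payoff for turning at intersection k (1-indexed), per the example road."""
    if k == 1:
        return 100.0
    if k == 2:
        return 0.0
    return 101.0


def trip_value(p: float, start: int = 1, n: int = 1000) -> float:
    """Expected payoff of a car now at intersection `start`, when every
    decision from here on is to turn with probability p."""
    value = 0.0
    reach = 1.0  # probability the car gets to intersection k without turning
    for k in range(start, n + 1):
        value += reach * p * turn_payoff(k)
        reach *= 1.0 - p
    return value  # going straight through all n intersections pays nothing


def decision_point_value(p: float, n: int = 1000) -> float:
    """Expected payoff to one lottery-selected driver (assumed model:
    placement at k is weighted by the probability that k is visited)."""
    weights = [(1.0 - p) ** (k - 1) for k in range(1, n + 1)]
    total = sum(weights)
    weighted = sum(w * trip_value(p, k, n)
                   for k, w in enumerate(weights, start=1))
    return weighted / total


for p in (1.0, 0.02):
    print(f"p={p}: per trip {trip_value(p):.2f}, "
          f"per decision-point {decision_point_value(p):.2f}")
```

Under these assumptions, the per-trip average favours always turning (about $100 versus $99.00), while the per-decision-point average favours the 2% strategy (about $100.92 versus $100).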
For the multiple driver game, the 2% turn strategy is clearly better than the 100% turn strategy, and this can be calculated at the start. It does not depend upon self-locating probabilities; it simply considers a different type of scenario.
So in short, there is no paradox and nothing wrong with self-locating probabilities. An absent-minded driver who uses the “action optimal” strategy is simply misapplying it to the wrong type of scenario.