You cannot assume any α, and choose p based on it, for α depends on p. You just introduced a time loop into your example.
Indeed, though it doesn’t have to be a time loop, just a logical dependency. Your expected payoff is α[p^2+4(1-p)p] + (1-α)[p+4(1-p)]. Since you will make the same decision both times, the only coherent state is α=1/(p+1). Thus expected payoff is (8p-6p^2)/(p+1), whose maximum is at about p=0.53. What went wrong this time? Well, while this is what you should use to answer bets about your payoff (assuming such bets are offered independently at every intersection), it is not the quantity you should maximize: it double counts the path where you visit both X and Y, which involves two instances of the decision but pays off only once.
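If you want to check the arithmetic, here's a quick grid-search sketch (plain Python, nothing more than a sanity check) comparing the betting expectation above with the actual planning payoff; the two peak at different values of p.

```python
# Sanity check: per-intersection betting expectation vs. planning payoff.

def betting_expectation(p):
    # alpha-weighted expectation with alpha = 1/(p+1), i.e. (8p - 6p^2)/(p + 1)
    return (8 * p - 6 * p ** 2) / (p + 1)

def planning_payoff(p):
    # expected payoff of the whole trip: p^2 + 4p(1 - p)
    return p ** 2 + 4 * p * (1 - p)

grid = [i / 10000 for i in range(10001)]
print(max(grid, key=betting_expectation))  # ~0.528
print(max(grid, key=planning_payoff))      # ~0.667, i.e. p = 2/3
```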
Mod parents WAY up! I should’ve tried to solve this problem on my own, but I wasn’t expecting it to be solved in the comments like that!
Awesome. I’m steadily upgrading my expected utilities of handing decision-theory problems to Less Wrong.
EDIT 2016: Wei Dai below is correct; this was my first time encountering this problem, and I misunderstood the point Wei Dai was trying to make.
You make it sound as if you expect to expect a higher utility in the future than you currently expect...
The parents that you referred to are now at 17 and 22 points, which seems a bit mad to me. Spotting the errors in P&R’s reasoning isn’t really the problem. The problem is to come up with a general decision algorithm that both works (in the sense of making the right decisions) and (if possible) makes epistemic sense.
So far, we know that UDT works but it doesn’t compute or make use of “probability of being at X” so epistemically it doesn’t seem very satisfying. Does TDT give the right answer when applied to this problem? If so, how? (It’s not specified formally enough that I can just apply it mechanically.) Does this problem suggest any improvements or alternative algorithms?
Again, that seems to imply that the problem is solved, and I don’t quite see how the parent comments have done that.
The problem is to come up with a general decision algorithm that both works (in the sense of making the right decisions) and (if possible) makes epistemic sense.

I presented a solution in a comment here which I think satisfies these: it gives the right answer, consistently handles the case of “partial knowledge” about one’s intersection, and correctly characterizes your epistemic condition in the absent-minded case.
I don’t see why the problem is not solved. The probability of being at X depends directly on how I am deciding whether to turn. So I cannot possibly use that probability to decide whether to turn; I need to decide on how I will turn first, and then I can calculate the probability of being at X. This results in the original solution.
This also shows that Eliezer was mistaken in claiming that any algorithm involving randomness can be improved by making it deterministic.
And then you can correct for the double-counting. When would you like to count your chickens? It’s safe to count them at X or Y.
If you count them at X, then how much payoff do you expect at the end? Relative to when you’ll be counting your payoff, the relative likelihood that you are at X is 1. And the expected payoff if you are at X is p^2 + 4p(1-p). This gives a total expected payoff of P(X) E(X) = 1 × (p^2 + 4p(1-p)) = p^2 + 4p(1-p).

If you count them at Y, then how much payoff do you expect at the end? Relative to when you’ll be counting your payoff, the relative likelihood that you are at Y is p. And the expected payoff if you are at Y is p + 4(1-p). This gives a total expected payoff of P(Y) E(Y) = p × (p + 4(1-p)) = p^2 + 4p(1-p), the same total as counting at X.
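A small sketch (plain Python, just checking the arithmetic) confirming that the two bookkeeping choices agree for every p and are both maximised at the planning-optimal p = 2/3:

```python
def count_at_X(p):
    # P(X) = 1, expected payoff at X = p^2 + 4p(1 - p)
    return 1 * (p ** 2 + 4 * p * (1 - p))

def count_at_Y(p):
    # P(reach Y) = p, expected payoff at Y = p + 4(1 - p)
    return p * (p + 4 * (1 - p))

grid = [i / 10000 for i in range(10001)]
assert all(abs(count_at_X(p) - count_at_Y(p)) < 1e-12 for p in grid)
print(max(grid, key=count_at_X))  # ~0.6667, i.e. p = 2/3
```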
I’m annoyed that English requires a tense on all verbs. “You are” above should be tenseless.
EDIT: formatting
One way to describe this is to note that choosing the action that maximises the expectation of value as computed at an intersection is not the same as choosing the action that can be expected to produce the most value. So choosing p=0.53 maximises our in-the-moment expectations, not the value we actually expect to produce.
The site doesn’t seem to want to let me edit the comment above, but I could have explained this more clearly. The figure (8p-6p^2)/(p+1) is actually a weighted mean of Ex and Ey, where these are the expected values at X and Y respectively. Specifically, this value is:
(1*Ex+p*Ey)/(1+p)
Now, the expected value calculated from the planning-optimal decision is just Ex. We shouldn’t be surprised that the weighted mean is quite a different value.
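To put a number on the gap, here's a tiny illustration (plain Python, evaluated at the planning-optimal p = 2/3 purely for concreteness) of how far the weighted mean sits from Ex:

```python
p = 2 / 3                      # planning-optimal continuation probability
Ex = p ** 2 + 4 * p * (1 - p)  # expected value at X: 4/3
Ey = p + 4 * (1 - p)           # expected value at Y: 2
weighted_mean = (1 * Ex + p * Ey) / (1 + p)
print(Ex, Ey, weighted_mean)   # 1.333..., 2.0, 1.6
```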
Since you will make the same decision both times, the only coherent state is α=1/(p+1).

I’m curious how you arrived at this. Shouldn’t it be α = (1/2)p + (1 - p) = 1 - p/2? (The other implies that you are a thirder in the Sleeping Beauty Problem; didn’t Nick Bostrom have the last word on that one?) The payoff becomes α[p^2+4p(1-p)] + (1-α)[p+4(1-p)] = (1 - p/2)(4p − 3p^2) + (p/2)(4 − 3p) = 6p - (13/2)p^2 + (3/2)p^3, which has a (local) maximum around p = 0.577.
The conclusion remains the same, of course.
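For the curious, a quick grid search (plain Python, just arithmetic) confirming that this cubic does peak near p = 0.577 on [0, 1]:

```python
def halfer_payoff(p):
    # (1 - p/2)(4p - 3p^2) + (p/2)(4 - 3p), expanded: 6p - (13/2)p^2 + (3/2)p^3
    return 6 * p - 6.5 * p ** 2 + 1.5 * p ** 3

grid = [i / 10000 for i in range(10001)]
print(max(grid, key=halfer_payoff))  # ~0.577
```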
alpha = 1/(p+1) because the driver is at Y p times for every 1 time the driver is at X; so (times the driver is at X) / (times the driver is at X or Y) = 1/(p+1).
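If a frequency check helps, here's a small simulation (plain Python, illustrative only): among all intersection-visits, the fraction that happen at X does come out near 1/(p+1).

```python
import random

def fraction_at_X(p, trials=200_000):
    at_X = at_Y = 0
    for _ in range(trials):
        at_X += 1                # every trip passes through X
        if random.random() < p:  # with probability p the driver continues on to Y
            at_Y += 1
    return at_X / (at_X + at_Y)

p = 2 / 3
print(fraction_at_X(p), 1 / (p + 1))  # both ~0.6
```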
The problem with pengvado’s calculation isn’t double counting. It purports to give an expected payoff when made at X, which doesn’t count the expected payoff at Y. The problem is that it doesn’t really give an expected payoff. alpha purports to be the probability that you are at X; yet the calculation must be made at X, not at Y (where alpha will clearly be wrong). This means we can’t speak of a “probability of being at X”; alpha simply is 1 if we use this equation and believe it gives us an expected value.
Or look at it this way: Before you introduce alpha into the equation, you can solve it and get the actual optimal value for p. Once you introduce alpha into the equation, you guarantee that the driver will have false beliefs some of the time, because alpha = 1/(p+1), and so the driver can’t have the correct alpha both at X and at Y. You have added a source of error, and will not find the optimal solution.
If you want to find the value of p that leads to the optimal decision, you need to look at the impact of choosing one p or another on expected value, not just consider the expectation measured at the end. As it stands, the calculation maximises expectations, not value created, with trips that pass through both X and Y being double-counted.
I’m a “who’s offering the bet”er on Sleeping Beauty (which Bostrom has said is consistent with, though not identical to, his own model). And in this case I specified bets offered and paid separately at each intersection, which corresponds to the thirder conclusion.
The paper covered that, but good point.