(I work at MIRI, and edited the Cheating Death in Damascus paper, but this comment wasn’t reviewed by anyone else at MIRI.)
This should be a constraint on any plausible decision theory.
But this principle prevents you from cooperating with yourself across empirical branches in the world!
Suppose a good predictor offers you a fair coin flip at favorable odds (say, two of their dollars to one of yours). If you called the flip correctly, you can either forgive (no money changes hands) or demand payment; if you called it incorrectly, you can either pay up or back out. The predictor honors your demand for payment only if they predict that you would yourself pay up when you lose; beyond that, this interaction doesn’t affect the rest of your life.
You call heads, the coin comes up tails. The Guaranteed Payoffs principle says:
You’re certain that you’re in a world where you will just lose a dollar if you pay up, and will lose nothing if you don’t pay up. Conditional on this starting spot, not paying up maximizes utility.
The FDT perspective is to say:
The price of winning $2 in half of the worlds is losing $1 in the other half of the worlds. You want to be the sort of agent who can profit from these sorts of bets and/or you want to take this opportunity to transfer utility across worlds, because it’s net profitable.
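To make that concrete, here’s a minimal sketch (Python, with the $2/$1 stakes from the example above and the fair coin’s 50% win probability) comparing the two policies from the point of view of the prior, before the flip:

```python
# Expected value per flip for each policy, evaluated before the coin lands.
# Stakes from the example: the predictor pays $2 when you win your call;
# you pay $1 when you lose, and only a predicted payer has their demand honored.
p_win = 0.5

def ev_pay_up_when_you_lose():
    # The predictor predicts you'd pay up, so your demand when you win is honored.
    return p_win * 2 + (1 - p_win) * (-1)

def ev_back_out_when_you_lose():
    # The predictor predicts you'd back out, so nothing ever changes hands.
    return p_win * 0 + (1 - p_win) * 0

print(ev_pay_up_when_you_lose())    # 0.5
print(ev_back_out_when_you_lose())  # 0.0
```

Guaranteed Payoffs evaluates only the tails branch after the fact; the comparison above is the one FDT cares about.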
Note that the Bomb case is one in which we condition on the 1 in a trillion trillion failure case, and ignore the 999,999,999,999,999,999,999,999 cases in which FDT saves $100. This is like pointing at people who got into a plane that crashed and saying “what morons, choosing to get on a plane that would crash!” instead of judging their actions from the state of uncertainty that they were in when they decided to get on the plane.
This is what Abram means when he says “with respect to the prior of the decision problem”; not that the FDT agent is expected to do well from any starting spot, but from the ‘natural’ one. (If the problem statement is as described and the FDT agent sees “you’ll take the right box” and the FDT agent takes the left box, then it must be the case that this was the unlucky bad prediction, and the prior treats it as correspondingly unlikely.) It’s not that the FDT agent wanders through the world unable to determine where it is even after obtaining evidence; it’s that as the FDT agent navigates the world it considers its impact across all (connected) logical space instead of just the part immediately downstream of itself. Note that in my coin flip case, FDT is still trying to win the reward when the coin comes up heads even though in this case it came up tails, as opposed to saying “well, every time I see this problem the coin will come up tails, therefore I shouldn’t participate in the bet.”
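Spelling out the Bomb comparison from the prior of the decision problem, as a rough sketch; the dollar figure standing in for burning to death is an illustrative assumption, not part of the problem statement:

```python
# Bomb, evaluated from the prior rather than from the failure case.
# Setup: taking Right always costs $100; a Left-taker faces the bomb only
# when the predictor errs, with probability 1e-24 (1 in a trillion trillion).
p_error = 1e-24
cost_of_burning = 1e12   # illustrative stand-in; pick any huge-but-finite number

ev_take_left = (1 - p_error) * 0 + p_error * (-cost_of_burning)   # about -1e-12
ev_take_right = -100

print(ev_take_left, ev_take_right)  # the Left policy saves ~$100 in expectation
```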
[I do think this jump, from ‘only consider things downstream of you’ to ‘consider everything’, does need justification and I think the case hasn’t been as compelling as I’d like it to be. In particular, the old name for this, ‘updatelessness’, threw me for a loop for a while because it sounded like the dumb “don’t take input from your environment” instead of the conscious “consider what impact you’re having on hypothetical versions of yourself”.]
But then, it seems to me, that FDT has lost much of its initial motivation: the case for one-boxing in Newcomb’s problem didn’t seem to stem from whether the Predictor was running a simulation of me, or just using some other way to predict what I’d do.
It seems to me like either you are convinced that the predictor is using features you can control (based on whether or not you decide to one-box) or features you can’t control (like whether you’re English or Scottish). If you think the latter, you two-box (because regardless of whether the predictor is rewarding you for being Scottish or not, you benefit from the $1000), and if you think the former you one-box (because you want to move the probability that the predictor fills the large box).
According to me, the simulation is just a realistic way to instantiate an actual dependence between the decision I’m making now and the prediction. (Like, when we have AIs we’ll actually be able to put them in Newcomb-like scenarios!) If you want to posit a different, realistic version of that, then FDT is able to handle it (and the difficulty is all in moving from the English description of the problem to the subjunctive dependency graph).
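To sketch the dichotomy above (features you can control vs. features you can’t), using the standard Newcomb payoffs as an assumption ($1,000,000 in the large box if filled, $1,000 in the small one) and an illustrative predictor accuracy:

```python
# One-box vs. two-box under the two hypotheses about what the predictor uses.
# Standard Newcomb payoffs assumed: $1,000,000 in the large box if filled,
# $1,000 in the small box; the 0.99 accuracy is illustrative.
LARGE, SMALL = 1_000_000, 1_000
accuracy = 0.99

def ev_prediction_tracks_choice(action):
    # "Features you can control": the prediction covaries with what you decide,
    # so one-boxing moves the probability that the large box is filled.
    p_filled = accuracy if action == "one-box" else 1 - accuracy
    return p_filled * LARGE + (SMALL if action == "two-box" else 0)

def ev_prediction_fixed(action, p_filled):
    # "Features you can't control" (English vs. Scottish): the box contents
    # are the same whatever you do now, so two-boxing always adds $1,000.
    return p_filled * LARGE + (SMALL if action == "two-box" else 0)

print(ev_prediction_tracks_choice("one-box"))   # ~990,000
print(ev_prediction_tracks_choice("two-box"))   # ~11,000
print(ev_prediction_fixed("two-box", 0.5) - ev_prediction_fixed("one-box", 0.5))  # 1,000 regardless of p_filled
```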
Now, because there’s an agent making predictions, the FDT adherent will presumably want to say that the right action is one-boxing.
I don’t think this is right; I think this is true only if the FDT agent thinks that S (a physically verifiable fact about the world, like the lesion) is logically downstream of its decision. In the simplest such graph I can construct, S is still logically upstream of the decision; are we making different graphs?
But it’s very implausible that there’s some S such that a tiny change in its physical makeup should affect whether one ought to one-box or two-box.
I don’t buy this as an objection; decisions are often discontinuous. Suppose I’m considering staying at two different hotels, one with price A and the other with price B, where B<A; now make a series of imperceptibly small reductions to A, and at some point my decision switches abruptly from staying at hotel B to staying at hotel A. Whenever you pass multiple continuous quantities through an argmin or argmax, you can get sudden changes.
(Or, to put it in a more analogous way: imagine insurance against an event with probability p; as we smoothly vary p, at some point our action discontinuously jumps from not buying the insurance to buying it.)
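A quick sketch of the insurance version, with a made-up premium and loss: the two expected costs vary continuously in p, but the argmin over them flips at a threshold.

```python
# The chosen action as a function of p: expected costs are continuous in p,
# but the decision jumps discontinuously where the two costs cross.
# Premium and loss are made-up numbers for illustration.
premium, loss = 50, 1_000

def best_action(p):
    cost_insured = premium          # pay the premium, covered either way
    cost_uninsured = p * loss       # expected loss if the event happens
    return "buy insurance" if cost_insured < cost_uninsured else "don't buy"

for p in (0.01, 0.03, 0.049, 0.051, 0.07):
    print(p, best_action(p))        # flips from "don't buy" to "buy" near p = 0.05
```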
I am deeply confused how someone who is taking decision theory seriously can accept Guaranteed Payoffs as correct. I’m even more confused how it can seem so obvious that anyone violating it has a fatal problem.
Under certainty, this amounts to assuming CDT is correct, when CDT seems to have many problems that aren’t about uncertainty. We can use Vaniver’s examples above, use a reliable insurance agent to remove any uncertainty, or take any number of classic problems that involve no uncertainty (or remove it from them), and see that such an agent loses; e.g., Parfit’s Hitchhiker in the case where the driver has 100% accuracy.
In particular, the old name for this, ‘updatelessness’, threw me for a loop for a while because it sounded like the dumb “don’t take input from your environment” instead of the conscious “consider what impact you’re having on hypothetical versions of yourself”.
As a further example, consider glomarization. If I haven’t committed a crime, pleading the fifth is worse than pleading innocence; however, a policy of pleading innocence whenever I’m innocent means that when I have committed a crime, I have to either pay the costs of pleading guilty, pay the costs of lying, or plead the fifth (which will code to “I’m guilty”, because I never say it when I’m innocent). If I care about honesty and about being difficult to distinguish from the versions of myself who commit crimes, then I want to glomarize even before I commit any crimes.
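As a rough sketch of that policy comparison (every number here is made up; the only structural point is that pleading the fifth stops coding to guilt once I do it in the innocent worlds too):

```python
# Comparing two policies across the innocent and guilty versions of me.
# All costs are illustrative placeholders.
p_guilty = 0.1
cost_glomar_innocent = 5     # pleading the fifth is mildly worse than pleading innocence
cost_revealed_guilty = 100   # under policy A, the fifth codes to "I'm guilty"
cost_glomar_guilty = 20      # under policy B, the fifth no longer gives me away

# Policy A: plead innocence when innocent, plead the fifth when guilty.
ev_a = (1 - p_guilty) * 0 + p_guilty * cost_revealed_guilty

# Policy B: always glomarize, so the two worlds are hard to tell apart.
ev_b = (1 - p_guilty) * cost_glomar_innocent + p_guilty * cost_glomar_guilty

print(ev_a, ev_b)   # ~10 vs ~6.5: the small cost in the innocent worlds pays for itself here
```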
See also Nate Soares in Decisions are for making bad outcomes inconsistent. This is sort of a generalization, where ‘decisions are for making bad outcomes unlikely.’