I’ve devised some additional scenarios that I have found to be helpful in contemplating this problem.
Scenario 1:
Omega proposes Newcomb’s problem to you. However, there is a twist: before he scans you, you may choose one of two robots to perform the box opening for you. Robot A will only open the $1M box; robot B will open both.
Scenario 2:
You wake up and suddenly find yourself in a locked room with two boxes, and a note from Omega: “I’ve scanned a hapless citizen (not you), predicted their course of action, and placed the appropriate amount of money in the two boxes present. Choose one or two and then you may go.”
In scenario 1, both evidential and causal decision theories agree that you should one-box. In scenario 2, they both agree that you should two-box. Now, if we replace the robots with your future self and the hapless citizen with your past self, S1 becomes “what should you do prior to being scanned by Omega?” and S2 reverts to the original problem. Setting aside the negligible possibility of fooling Omega, it can be seen that maximizing the payout from Newcomb’s problem is really about finding a way to cause your future self to one-box.
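To make the payout comparison concrete, here is a minimal sketch of the expected winnings under each choice. The box amounts are the standard $1,000 / $1,000,000, and the prediction accuracy `p` is a hypothetical parameter I've introduced to model Omega as a near-perfect (rather than perfect) predictor:

```python
SMALL, BIG = 1_000, 1_000_000  # standard Newcomb box amounts

def expected_payout(one_box: bool, p: float = 0.99) -> float:
    """Expected winnings, assuming Omega predicts your choice with accuracy p."""
    if one_box:
        # With probability p, Omega correctly predicted one-boxing
        # and filled the big box.
        return p * BIG
    # Two-boxing always yields the small box; the big box is full only
    # when Omega (wrongly, with probability 1 - p) predicted one-boxing.
    return SMALL + (1 - p) * BIG

print(expected_payout(one_box=True))   # 990000.0
print(expected_payout(one_box=False))  # 11000.0
```

Any `p` above roughly 0.5005 is enough to make one-boxing the higher-expected-value choice, which is why "fooling Omega" has to be nearly certain to change the analysis.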
What options are available, to either rational agents or humans, for exerting causal power on their future selves? A human might make a promise to themselves (interesting question: is a promise a precommitment or a self-modification?), ask another person (or other agent) to provide disincentives for two-boxing (e.g. “Hey, Bob, I bet you I’ll one-box. If I win, I get $1; if you win, you get $1M.”), or find some way of modifying the environment to prevent their future self from two-boxing (e.g. drop the second box down a well). A general rational agent has similar options: modify itself into something that will one-box, and/or modify the environment so that one-boxing is the best course of action for its future self.
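The side-bet disincentive can be checked with a little arithmetic. This sketch uses the dollar amounts from the Bob example above and assumes Omega has correctly predicted your final choice when filling the boxes:

```python
SMALL, BIG = 1_000, 1_000_000      # standard Newcomb box amounts
BET_WIN, BET_LOSS = 1, 1_000_000   # the bet with Bob from the example

def net_payout(one_box: bool, bet_made: bool) -> int:
    """Total winnings, assuming Omega's prediction matches your choice."""
    box_money = BIG if one_box else SMALL
    if not bet_made:
        return box_money
    # The bet adds $1 for one-boxing and costs $1M for two-boxing.
    return box_money + (BET_WIN if one_box else -BET_LOSS)

print(net_payout(one_box=True, bet_made=True))   # 1000001
print(net_payout(one_box=False, bet_made=True))  # -999000
```

Once the bet is in place, two-boxing is ruinous no matter what Omega predicted, so your future self's incentives now point squarely at one-boxing, which is the whole point of the precommitment.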
So now we have two solutions, but can we do better? If rational agent ‘Alpha’ doesn’t want to rely on external mechanisms to coerce its future self’s behavior, and also does not want to introduce a hack into its source code, what general solution can it adopt that solves this class of problem? I have not yet read the Timeless Decision Theory paper; I think I’ll ponder this question before doing so, and see if I encounter any interesting thoughts.