I agree that figuring out what you “should have” precommitted to can be fraught.
One possible response to that problem is to set aside some time to think about hypotheticals and figure out now what precommitments you would like to make, instead of waiting for those scenarios to actually happen. So the perspective is “actual you, at this exact moment”.
I sometimes suspect you could view MIRI’s decision theories as an example of this strategy.
Alice: Hey, Bob, have you seen this “Newcomb’s problem” thing?
Bob: Fascinating. As we both have unshakable faith in CDT, we can easily agree that two-boxing is correct if you are surprised by this problem, but that you should precommit to one-boxing if you have the opportunity.
Alice: I was thinking—now that we’ve realized this, why not precommit to one-boxing right now? You know, just in case. The premise of the problem is that Omega has some sort of access to our actual decision-making algorithm, so in principle we can precommit just by deciding to precommit.
Bob: That seems unobjectionable, but not very useful in expectation; we’re very unlikely to encounter this exact scenario. It seems like what we really ought to do is make a precommitment for the whole class of problems of which Newcomb’s problem is just one example.
Alice: Hm, that seems tricky to formally define. I’m not sure I can stick to the precommitment unless I understand it rigorously. Maybe if...
--Alice & Bob do a bunch of math, and eventually come up with a decision strategy that looks a lot like MIRI’s decision theory, all without ever questioning that CDT is absolutely philosophically correct?--
Possibly it’s not that simple; I’m not confident that I appreciate all the nuances of MIRI’s reasoning.
The output of this process is something people have taken to calling Son-of-CDT; the problem (insofar as we understand Son-of-CDT well enough to talk about its behavior) is that the resulting decision theory continues to neglect correlations that existed prior to self-modification.
(In your terms: Alice and Bob would only one-box in Newcomb variants where Omega based his prediction on observations of them made after they came up with their new decision theory; variants where Omega’s prediction was made before they had their talk would still be met with two-boxing, even if Omega is stipulated to be able to predict the outcome of the talk.)
This still does not seem like particularly sane behavior, which means, unfortunately, that there’s no real way for a CDT agent to fix itself: it was born with too dumb of a prior for even self-modification to save it.
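To make that concrete, here’s a toy model (purely my own illustration; the timestamp test and all the names are assumptions, not MIRI’s formalism) of an agent that one-boxes only when Omega’s prediction postdates its self-modification and otherwise treats the box contents as fixed:

```python
# Toy model of the behavior described above (illustrative sketch only,
# not MIRI's formalism). The self-modified agent treats Omega's prediction
# as something it "controls" only if the prediction was formed after the
# self-modification event.
from dataclasses import dataclass

@dataclass
class NewcombInstance:
    prediction_time: float  # when Omega formed its (assumed perfect) prediction

def son_of_cdt_choice(self_modification_time: float, problem: NewcombInstance) -> str:
    """One-box only if the correlation with Omega was created after the
    self-modification; otherwise treat the box contents as fixed and two-box."""
    return "one-box" if problem.prediction_time > self_modification_time else "two-box"

def payoff(choice: str) -> int:
    # Perfect predictor: the opaque box holds $1M exactly when the agent one-boxes.
    opaque = 1_000_000 if choice == "one-box" else 0
    return opaque if choice == "one-box" else opaque + 1_000

t_talk = 100.0  # the moment of Alice and Bob's talk / self-modification
cases = [("prediction made after the talk", NewcombInstance(prediction_time=150.0)),
         ("prediction made before the talk", NewcombInstance(prediction_time=50.0))]
for label, problem in cases:
    choice = son_of_cdt_choice(t_talk, problem)
    print(f"{label}: {choice}, payoff ${payoff(choice):,}")
# prediction made after the talk: one-box, payoff $1,000,000
# prediction made before the talk: two-box, payoff $1,000
```

Against a stipulated-perfect predictor the second case still walks away with only $1K, which is the behavior I’m calling not particularly sane.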
Thanks. After thinking about your explanation for a while, I have made a small update in the direction of FDT. This example makes FDT seem parsimonious to me, because it makes a simpler precommitment.
I almost made a large update in the direction of FDT, but when I imagined explaining the reason for that update I ran into a snag. I imagined someone saying “OK, you’ve decided to precommit to one-boxing. Do you want to precommit to one-boxing when (a) Omega knows about this precommitment, or (b) Omega knows about this precommitment, AND the entangled evidence that Omega relied upon is ‘downstream’ of the precommitment itself? For example, in case (b), you would one-box if Omega read a transcript of this conversation, but not if Omega only read a meeting agenda that described how I planned to persuade you of option (a).”
But when phrased that way, it suddenly seems reasonable to reply: “I’m not sure what Omega would predict that I do if he could only see the meeting agenda. But I am sure that the meeting agenda isn’t going to change based on whether I pick (a) or (b) right now, so my choice can’t possibly alter what Omega puts into the box in that case. Thus, I see no advantage to precommitting to one-boxing in that situation.”
If Omega really did base its prediction just on the agenda (and not on, say, a scan of the source code of every living human), this reply seems correct to me. The story’s only interesting because Omega has god-like predictive abilities.
Which I guess shouldn’t be surprising, because if there were a version of Newcomb’s problem that cleanly split FDT from CDT without invoking extreme abilities on Omega’s part, I would expect that to be the standard version.
I’m left with a vague impression that FDT and CDT mostly disagree about “what rigorous mathematical model should we take this informal story-problem to be describing?” rather than “what strategy wins, given a certain rigorous mathematical model of the game?” CDT thinks you are choosing between $1K and $0, while FDT thinks you are choosing between $1K and $1M. If we could actually run the experiment, even in simulation, then that disagreement seems like it should have a simple empirical resolution; but I don’t think anyone knows how to do that. (Please correct me if I’m wrong!)
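The closest thing I can picture is something like this toy simulation (entirely my own sketch, not anyone’s proposed experiment), where Omega “predicts” by running the agent’s policy:

```python
# Naive Newcomb simulation (my own sketch, not a standard benchmark).
# Omega "predicts" by executing the agent's policy, so the outcome is
# baked in by that modeling choice -- the very thing CDT and FDT dispute.

def omega_fills_opaque_box(policy) -> int:
    predicted = policy()  # Omega simulates the agent's decision procedure
    return 1_000_000 if predicted == "one-box" else 0

def play(policy) -> int:
    opaque = omega_fills_opaque_box(policy)  # contents fixed before the real choice
    choice = policy()
    return opaque if choice == "one-box" else opaque + 1_000

def one_boxer() -> str:
    return "one-box"

def two_boxer() -> str:
    return "two-box"

print(play(one_boxer))  # 1000000
print(play(two_boxer))  # 1000
```

The one-boxer ends up richer, but only because I handed Omega a perfect copy of the policy; a committed CDT proponent would say that choice of model assumes the conclusion, which is why I don’t think this counts as the empirical resolution I was asking for.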
One way of noticing the Son-of-CDT issue dxu mentioned is to think of CDT not just as being unable to control events outside its future lightcone, but as not caring about events outside its future lightcone. So even if it self-modifies, it’s not going to accept tradeoffs between the future and the not-future of the self-modification event, as that would involve changing its preferences (and somehow reinventing preferences for the events it didn’t care about just before the self-modification event).
With time, CDT continually becomes numb to events outside its future and loses parts of its values. Self-modifying to Son-of-CDT stops further loss, but it doesn’t reverse past loss.