An additional point (discussed in intelligence.org/files/TDT.pdf) is that CDT seems to recommend modifying oneself to use a non-CDT decision theory. (For instance, imagine that the CDTer contemplates for a moment the mere possibility of encountering NPs and can cheaply self-modify.) After modification, the interest in whether decisions are causally responsible for utility will have been eliminated. So this interest seems extremely brittle: agents that are able to self-modify and are informed of the NP scenario will immediately lose it. (If the NP seems implausible, consider how ubiquitous some kind of logical correlation between agents is in almost any multi-agent decision problem, such as the PD or stag hunt.)
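To make the self-modification argument concrete, here is a rough Python sketch. It is only an illustration under assumptions of mine: the payoffs ($1,000,000 in the opaque box, $1,000 in the transparent one) are the usual textbook values rather than anything stated above, and the function names, the `accuracy` parameter, and the `modification_cost` parameter are all hypothetical. The idea it tries to capture is that once the boxes are filled, two-boxing always causally gains the transparent prize, yet before the prediction is made the choice of disposition causally affects the prediction, so even purely causal reasoning favors becoming a one-boxer.

```python
# Hypothetical payoffs: the standard textbook values, not given in the comment.
OPAQUE_PRIZE = 1_000_000
TRANSPARENT_PRIZE = 1_000


def payoff(action, prediction):
    """Money received for an action, given what the predictor predicted."""
    opaque = OPAQUE_PRIZE if prediction == "one-box" else 0
    transparent = TRANSPARENT_PRIZE if action == "two-box" else 0
    return opaque + transparent


def causal_gain_at_decision_time(prediction):
    """Extra money two-boxing causes once the boxes are already filled."""
    return payoff("two-box", prediction) - payoff("one-box", prediction)


def expected_payoff_of_disposition(disposition, accuracy=1.0):
    """Expected payoff if the predictor reads the disposition with the given
    accuracy and the agent later acts on that disposition."""
    other = "two-box" if disposition == "one-box" else "one-box"
    return (accuracy * payoff(disposition, disposition)
            + (1 - accuracy) * payoff(disposition, other))


def cdt_choice_of_disposition(accuracy=1.0, modification_cost=0):
    """Disposition a causal reasoner would pick BEFORE the prediction is made.

    At this point the disposition causally influences the prediction, so the
    comparison is between expected payoffs of the two dispositions.
    """
    keep = expected_payoff_of_disposition("two-box", accuracy)
    modify = expected_payoff_of_disposition("one-box", accuracy) - modification_cost
    return "one-box" if modify > keep else "two-box"


if __name__ == "__main__":
    for prediction in ("one-box", "two-box"):
        print(prediction, causal_gain_at_decision_time(prediction))
    print(cdt_choice_of_disposition())              # infallible predictor
    print(cdt_choice_of_disposition(accuracy=0.5))  # predictor no better than chance
```

With an infallible predictor the sketch picks the one-boxing disposition. With a predictor that is no better than chance the incentive to self-modify disappears, which fits the point that the argument depends on some real correlation between the agent and the prediction.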
Now you may have in mind a notion of a two-boxer that is distinct from that of a CDTer. It might be fundamental to this agent not to forgo local causal gains, so that any proposed self-modification that would preclude acting for local causal gains would always be rejected. This seems like a shift out of decision theory and into value theory. (I think it’s very plausible that, absent the typical mechanisms for maintaining commitments, many humans would find it extremely hard to resist taking a large ‘free’ cash prize from the transparent box. Even prior schooling in one-boxing philosophy might be hard to stick to when face to face with the prize. Another factor that clashes with human intuitions is the predictor’s infallibility. Generally, I think grasping verbal arguments doesn’t “modify” humans in the relevant sense, and that we have strong intuitions that may (at least in the right presentation of the NP) push us in the direction of local causal efficacy.)
EDIT: fixed some typos.