The two-boxer is trying to maximise money (utility). They are also interested in the further question of which bits of that money (utility) can be attributed to which things (decisions vs. agent types). “Caused gain” is a view about how we should attribute the gaining of money (utility) to those different things.
So they agree that the problem is about maximising money (utility) and not “caused gain”. But they are interested not just in which agents end up with the most money (utility) but also in which aspects of those agents are responsible for their receiving it. Specifically, they are interested in whether the decisions the agent makes are responsible for the money they receive. This does not mean they are trying to maximise something other than money (utility). It means they are interested in maximising money, and then also in how money can be maximised via different mechanisms.
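To make the attribution question concrete, here is a minimal Python sketch. The payoff numbers are the standard Newcomb ones ($1,000,000 in the opaque box iff one-boxing was predicted, $1,000 in the transparent box, perfect predictor); all function and variable names are my own illustrative choices. It shows the two quantities coming apart: one-boxer types walk away with far more money overall, yet holding the prediction fixed, two-boxing always causes an extra $1,000.

```python
# Standard Newcomb payoffs (assumed, not in dispute between the parties).
PRIZE_OPAQUE = 1_000_000
PRIZE_TRANSPARENT = 1_000

def payoff(predicted_one_box: bool, one_box: bool) -> int:
    """Money received given the prediction and the actual choice."""
    total = PRIZE_OPAQUE if predicted_one_box else 0
    if not one_box:
        total += PRIZE_TRANSPARENT
    return total

# Total money by agent type (a perfect predictor's prediction matches the choice):
one_boxer_total = payoff(predicted_one_box=True, one_box=True)    # 1,000,000
two_boxer_total = payoff(predicted_one_box=False, one_box=False)  # 1,000

# "Caused gain" of the decision itself: hold the prediction fixed and vary
# only the choice. Whichever way the prediction went, two-boxing causes +$1,000.
for predicted in (True, False):
    gain = payoff(predicted, one_box=False) - payoff(predicted, one_box=True)
    print(f"predicted_one_box={predicted}: causal gain of two-boxing = {gain}")
```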
An additional point (discussed in intelligence.org/files/TDT.pdf) is that CDT seems to recommend modifying oneself into a non-CDT decision theory. (For instance, imagine that the CDTer contemplates for a moment the mere possibility of encountering Newcomb problems (NPs) and can cheaply self-modify.) After modification, the interest in whether decisions are causally responsible for utility will have been eliminated. So this interest seems extremely brittle: agents able to self-modify and informed of the NP scenario will immediately lose it. (If the NP seems implausible, consider the ubiquity of some kind of logical correlation between agents in almost any multi-agent decision problem, such as the Prisoner’s Dilemma or the stag hunt.)
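To spell out the step that argument turns on, here is a minimal self-contained sketch under the stated assumptions (a free self-modification available before the prediction is made, and a predictor that reads the post-modification disposition; names again are illustrative). The point is that the act of modifying is itself a cause of the better prediction, so even CDT’s own causal accounting endorses it.

```python
# Hedged sketch of the self-modification argument. Assumptions: modification
# is free, it happens before the prediction, and the (perfect) predictor's
# prediction tracks the agent's post-modification disposition.

def money(disposition_one_box: bool) -> int:
    """Money an agent with the given disposition ends up with, given a
    perfect predictor whose prediction tracks that disposition."""
    opaque = 1_000_000 if disposition_one_box else 0
    transparent = 0 if disposition_one_box else 1_000
    return opaque + transparent

# Switching disposition causes the better prediction, so by CDT's own causal
# bookkeeping, modifying into a one-boxer is the money-maximising act:
# $1,000,000 > $1,000.
assert money(disposition_one_box=True) > money(disposition_one_box=False)
```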
Now you may have in mind a two-boxer notion distinct from that of a CDTer. It might be fundamental to this agent never to forgo local causal gains, so that any proposed self-modification that would preclude acting for local causal gains would be rejected. This seems like a shift out of decision theory into value theory. (I think it’s very plausible that, absent the usual mechanisms for maintaining commitments, many humans would find it extremely hard to resist taking a large ‘free’ cash prize from the transparent box. Even prior schooling in one-boxing philosophy might be hard to stick to when face to face with the prize. Another factor that clashes with human intuitions is the predictor’s infallibility. Generally, I think grasping verbal arguments doesn’t “modify” humans in the relevant sense, and that we have strong intuitions that may (at least in the right presentation of the NP) push us in the direction of local causal efficacy.)
EDIT: fixed some typos.