“why is a causal connection privileged?” I agree with everything here. What follows is merely history.
Historically, I think that CDT was meant to address the obvious shortcomings of choosing to bring about states that were merely correlated with good outcomes (as in the case of whitening one’s teeth to reduce lung cancer risk). When Pearl advocates CDT, he is mainly advocating acting on robust connections that will survive the perturbation of the system caused by the action itself (e.g., don’t think you’ll reduce lung cancer by making your population brush their teeth, because that is a non-robust correlation that will be eliminated once you change the system). The centrality of causality in decision-making was intuitively obvious but wasn’t reflected in formal Bayesian decision theory. This was because of the lack of a good formalism linking probability and causality (and some erroneous positivistic scruples against the very idea of causality). Pearl and SGS’s work on causality has done much to address this, but I think there is much still to be done.
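To make the robust/non-robust distinction concrete, here is a toy Monte Carlo sketch (all the rates are invented for illustration; this is just the standard confounding structure, not anything from Pearl’s text). Conditioning on white teeth lowers the observed cancer rate, but intervening to set teeth color leaves it at the baseline:

```python
import random

random.seed(0)

# Toy model of the teeth-whitening confound (all rates invented):
# smoking causes both yellow teeth and lung cancer, so white teeth
# are observationally correlated with low cancer risk, but setting
# teeth color by intervention leaves cancer risk untouched.

def sample(do_white_teeth=None):
    smoker = random.random() < 0.3
    if do_white_teeth is None:
        # Observational regime: teeth color is downstream of smoking.
        white_teeth = random.random() < (0.2 if smoker else 0.9)
    else:
        # Interventional regime: do(white_teeth) severs the link to smoking.
        white_teeth = do_white_teeth
    cancer = random.random() < (0.20 if smoker else 0.02)
    return white_teeth, cancer

N = 100_000
obs = [sample() for _ in range(N)]
among_white = [cancer for white, cancer in obs if white]
p_obs = sum(among_white) / len(among_white)   # P(cancer | white teeth)

do = [sample(do_white_teeth=True) for _ in range(N)]
p_do = sum(cancer for _, cancer in do) / N    # P(cancer | do(white teeth))

print(f"P(cancer | white teeth)     = {p_obs:.3f}")  # ~0.036: spurious drop
print(f"P(cancer | do(white teeth)) = {p_do:.3f}")   # ~0.074: baseline rate
```

The observational correlation vanishes under intervention, which is exactly the sense in which it is non-robust.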
There is a very annoying historical accident whereby EDT was taken to be the ‘one-boxing’ decision theory. First, any use of probability theory in the NP with an infallible predictor is suspicious, because the problem can be specified in a logically complete way with no room for empirical uncertainty. (This is why dominance reasoning is brought in for CDT: what should the probabilities be?) Second, EDT is not easy to make coherent for an agent who knows they follow EDT. (The action that EDT disfavors will have probability zero, and so the agent cannot condition on it in standard probability theory.) Third, EDT only barely one-boxes. It doesn’t one-box on Double Transparent Newcomb, nor on Counterfactual Mugging. It’s also obscure what it does on the PD. (Again, I can play the PD against a selfish clone of myself, with both agents having each other’s source code. There is no empirical uncertainty here, and so applying probability theory immediately raises deep foundational problems.)
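To spell out the second point in symbols (notation mine, just the textbook rule): EDT scores an action $a$ by

$$V_{\mathrm{EDT}}(a) \;=\; \sum_o U(o)\,P(o \mid a) \;=\; \sum_o U(o)\,\frac{P(o \wedge a)}{P(a)},$$

and if the agent is certain it follows EDT, the disfavored action has $P(a) = 0$, so the ratio is undefined and the rule cannot even evaluate the action it is supposed to be rejecting.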
If TDT/UDT had come first (including the logical models and deep connections to Gödel’s theorem), the philosophical discussion of the NP would have been very different. EDT (which brings into the NP very dubious empirical probability distributions) would not have been considered at all. I don’t see that CDT would have held much interest if its alternative had not been as feeble as EDT.
It is important to understand why economists have done so much work with Nash equilibria (e.g., on the PD) rather than inventing UDT. The explanation is that the PD’s assumptions of logical correlation and perfect empirical knowledge between agents are not the practical reality. This doesn’t mean that UDT is not relevant to practical situations, but only that these situations involve many additional elements that may be complex to deal with in UDT. Causally based theories would have been interesting independently, for the reasons noted above concerning robust correlations.
EDIT: I realize the comment by Paul Christiano sometimes describes UDT as a variant of EDT. When I use the term “EDT” I mean the theory discussed in the philosophy literature, which involves choosing the action that maximizes P(outcome | action). This is a theory that essentially makes use of vanilla conditional probability. In what I say, I assume that UDT/TDT, despite some similarity to EDT in spirit, are not limited to regular conditioning and do not fail on the smoking lesion.
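To make the contrast explicit (again, the formalization is mine; the discussion itself stays informal): EDT maximizes $\sum_o U(o)\,P(o \mid a)$, whereas a causal theory maximizes $\sum_o U(o)\,P(o \mid \mathrm{do}(a))$. In the smoking lesion, the lesion causes both the urge to smoke and cancer, while smoking itself is causally inert, so $P(\text{cancer} \mid \text{smoke}) > P(\text{cancer} \mid \mathrm{do}(\text{smoke})) = P(\text{cancer})$: conditioning wrongly penalizes smoking, intervention does not, and theories restricted to regular conditioning inherit the first verdict.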
An additional point (discussed in intelligence.org/files/TDT.pdf) is that CDT seems to recommend modifying oneself into a non-CDT decision theory. (For instance, imagine that the CDTer contemplates for a moment the mere possibility of encountering NPs and can cheaply self-modify.) After modification, the interest in whether decisions are causally responsible for utility will have been eliminated. So this interest seems extremely brittle: agents able to self-modify and informed of the NP scenario will immediately lose it. (If the NP seems implausible, consider the ubiquity of some kind of logical correlation between agents in almost any multi-agent decision problem, such as the PD or stag hunt.)
Now you may have in mind a two-boxer notion distinct from that of a CDTer. It might be fundamental to this agent not to forgo local causal gains, so a proposed self-modification that would preclude acting for local causal gains would always be rejected. This seems like a shift out of decision theory into value theory. (I think it’s very plausible that, absent typical mechanisms for maintaining commitments, many humans would find it extremely hard to resist taking a large ‘free’ cash prize from the transparent box. Even prior schooling in one-boxing philosophy might be hard to stick to when face to face with the prize. Another factor that clashes with human intuitions is the predictor’s infallibility. Generally, I think grasping verbal arguments doesn’t “modify” humans in the relevant sense, and that we have strong intuitions that may, at least in the right presentation of the NP, push us in the direction of local causal efficacy.)
EDIT: fixed some typos.