In my understanding, you can apply EDT once you know what P(O_j|A) is. How you determine that quantity is outside the scope of decision theory.
It seems like there are two views of decision theories. The first view is that a decision theory eats a problem description and outputs an action. The second view is that a decision theory eats a model and outputs an action.
I strongly suspect that IlyaShpitser holds the first view (see here), and I’ve held both in different situations. Even when holding the second view, though, the different decision theories ask for different models, and those models must be generated somehow. I take the view that one should use off-the-shelf components for them, unless otherwise specified, and this assumption turns the second view into the first view.
I should note here that the second view is not very useful practically; most of a decision analysis class will center around how to turn a problem description into a model, since the mathematics of turning models into decisions is very simple by comparison.
When EDT is presented with a problem where observational data is supplied, EDT complains that it needs conditional probabilities, not observational data. The “off-the-shelf” way of transforming that data into conditional probabilities is to conditionalize on the possible actions within the observational data, and then EDT will happily pick the action with the highest utility weighted by conditional probability.
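That off-the-shelf EDT recipe can be sketched in a few lines. The scenario, action names, and utilities below are invented purely for illustration:

```python
# A minimal sketch of off-the-shelf EDT: condition on each action
# within the observational data, then pick the action with the
# highest utility weighted by conditional probability.
# Each record is (action, outcome_utility); all numbers are made up.
observations = [
    ("treat", 10), ("treat", 10), ("treat", 0),
    ("skip", 10), ("skip", 0), ("skip", 0), ("skip", 0),
]

def edt_action(data):
    """Return the action maximizing E[U | A = a] under the data."""
    actions = {a for a, _ in data}
    def conditional_eu(a):
        utils = [u for act, u in data if act == a]
        return sum(utils) / len(utils)  # E[U | A = a]
    return max(actions, key=conditional_eu)

print(edt_action(observations))  # "treat": E[U|treat] ~= 6.67 > E[U|skip] = 2.5
```

Note that this recipe never asks where the data came from; it treats the conditional distribution as the whole story, which is exactly the property at issue below.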
When CDT is presented with the same problem, it complains that it needs a causal model. The “off-the-shelf” way of transforming observational data into a causal model is described in Causality, and so I won’t go into it here, but once that’s done CDT will happily pick the action with the highest utility weighted by counterfactual probability.
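The CDT side can be sketched on a hypothetical confounded model: a binary confounder C influences both the action A and the outcome O. Given the graph C → A, C → O, A → O, the interventional distribution is the standard backdoor adjustment, P(O=1 | do(A=a)) = Σ_c P(O=1 | A=a, C=c) P(C=c). All numbers below are invented:

```python
# A minimal sketch of CDT on a hypothetical confounded model.
# The confounder C influences both action A and outcome O, so the
# interventional probability is computed by backdoor adjustment
# rather than by conditioning on A directly. All numbers invented.

p_c = {0: 0.5, 1: 0.5}  # P(C=c)
p_o_given_ac = {         # P(O=1 | A=a, C=c)
    ("treat", 0): 0.9, ("treat", 1): 0.5,
    ("skip", 0): 0.8, ("skip", 1): 0.4,
}

def p_o_do(a):
    """P(O=1 | do(A=a)) via backdoor adjustment over the confounder C."""
    return sum(p_o_given_ac[(a, c)] * p_c[c] for c in p_c)

def cdt_action():
    """Pick the action with the highest counterfactual success probability
    (with utility 1 for O=1 and 0 for O=0, this is the expected utility)."""
    return max(["treat", "skip"], key=p_o_do)

print(cdt_action())  # "treat": P(O=1|do(treat)) = 0.7 > P(O=1|do(skip)) = 0.6
```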
Can we improve on the “off-the-shelf” method for EDT? If we apply some intuition to the observational data, we can narrow the reference class and get probabilities that are more meaningful. But this sort of patching is unsatisfying. At best, we recreate the causal model discovered by the off-the-shelf methods used by CDT, and now we’re using CDT by another name. This is what IlyaShpitser meant by:
If you are willing to call such a thing “EDT”, then EDT can mean whatever you want it to mean.
At worst, our patches only did part of the job. Maybe we thought to check for reversal effects, and found the obvious ones but not the complicated ones. Maybe we thought some variable was significant and so excluded sample data that differed on it, when in fact it was not causally significant (which wouldn’t matter in the infinite-data case, but would matter for realistic cases).
The reason to prefer CDT over EDT is that causal models contain more information than joint probability distributions, and you want your decision theory to make use of as much information as possible in as formal a way as possible. Yes, you can patch EDT to make it CDTish, but then it’s not really EDT; it’s you running CDT and putting the results into EDT’s formatting.
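To make the "more information" point concrete, here is a hypothetical Simpson's-paradox joint distribution on which the two off-the-shelf recipes choose differently, even though EDT sees the full (infinite-sample) joint distribution. The causal graph C → A, C → O, A → O is the extra information CDT exploits; all numbers are invented:

```python
# Hypothetical Simpson's-paradox numbers: the same joint distribution
# leads off-the-shelf EDT and CDT to different actions once the graph
# (C confounds A and O) is known. All numbers are made up.
joint = {  # (c, a, o) -> P(C=c, A=a, O=o)
    (0, "skip", 1): 0.32, (0, "skip", 0): 0.08,
    (0, "treat", 1): 0.09, (0, "treat", 0): 0.01,
    (1, "treat", 1): 0.20, (1, "treat", 0): 0.20,
    (1, "skip", 1): 0.03, (1, "skip", 0): 0.07,
}

def p(pred):
    """Total probability of all (c, a, o) cells satisfying pred."""
    return sum(v for k, v in joint.items() if pred(*k))

def edt_score(a):  # P(O=1 | A=a): condition on the action
    return p(lambda c, a2, o: a2 == a and o == 1) / p(lambda c, a2, o: a2 == a)

def cdt_score(a):  # P(O=1 | do(A=a)): backdoor-adjust over C
    return sum(
        p(lambda c2, a2, o: c2 == c and a2 == a and o == 1)
        / p(lambda c2, a2, o: c2 == c and a2 == a)
        * p(lambda c2, a2, o: c2 == c)
        for c in (0, 1)
    )

edt_choice = max(["treat", "skip"], key=edt_score)  # "skip"  (0.58 vs 0.70)
cdt_choice = max(["treat", "skip"], key=cdt_score)  # "treat" (0.70 vs 0.55)
print(edt_choice, cdt_choice)
```

Here treatment helps within each stratum of C, but sicker patients (C=1) are both more likely to be treated and less likely to recover, so naive conditioning reverses the sign of the effect. Any EDT fix that recovers the CDT answer has to smuggle in the graph.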
In fact, in the example I gave, I fully specified everything needed for each decision theory to output an answer—I gave a causal model to CDT (because I gave the graph under standard interventionist semantics), and a joint distribution over all observable variables to EDT (infinite sample size!). I just wanted someone to give me the right answer using EDT (and explain how they got it).
EDT is not allowed to refer to causal concepts like “confounder” or “causal effect” when making a decision (otherwise it is not EDT).