There is no single ‘right answer’ in this case. The answer will depend on your prior for the confounder.
As others have noted, the question “is HAART killing people?” has nothing to do with EDT …
I think I disagree with both of these assertions. First, there is the “right answer,” and it has nothing to do with priors or Bayesian reasoning. In fact there is no model uncertainty in the problem: I gave you “the truth” (the precise structure of the model and enough data to parameterize it precisely, so you don’t have to pick or average among a set of alternatives). All you have to do is answer a question about a single parameter of the model I gave you. The only question is which parameter of the model I am asking you about. Second, it’s easy enough to rephrase my question as a decision theory question (I do so here: http://lesswrong.com/lw/hwq/evidential_decision_theory_selection_bias_and/9cdk).
To quote your other comment:
You put the patient on HAART if and only if V(HAART) > V(!HAART). In these formulas HAART means “(decide to) put this patient on HAART” and death means “this patient dies”. For concreteness, we can assume that the utility of death is low, say 0, while the utility of !death is positive. Then the decision reduces to comparing P(!death|HAART) with P(!death|!HAART). So if you give me P(!death|HAART) and P(!death|!HAART) then I can give you a decision.
Ok. This is wrong. The problem is that P(death|HAART) isn’t telling you whether HAART is bad or not (due to unobserved confounding). I have already specified that there is confounding by health status (that is, HAART helps, but was only given to people who were very sick). What you need to compare is
∑_{L0} E[death | A1, L0, A0] p(L0 | A0)
for various values of A1 and A0.
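To make the difference concrete, here is a minimal simulation (my own sketch; the numbers and functional forms are invented, and I am treating L0 as the recorded health status that the A1 decision responds to). Naive conditioning on A1 makes HAART look harmful, while the sum above recovers its benefit.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Invented parameterization: A0 randomized; L0 = 1 means "sick at follow-up";
# the sick preferentially get A1; HAART itself lowers the risk of death.
A0 = rng.integers(0, 2, n)                        # first treatment, randomized
L0 = rng.binomial(1, 0.6 - 0.2 * A0)              # health status, affected by A0
A1 = rng.binomial(1, 0.1 + 0.8 * L0)              # second treatment, given to the sick
death = rng.binomial(1, np.clip(0.05 + 0.40 * L0 - 0.15 * A1 - 0.05 * A0, 0, 1))

# Naive conditioning: the second treatment looks harmful, because the sick got it.
print("E[death | A1=1] =", death[A1 == 1].mean())   # roughly 0.25
print("E[death | A1=0] =", death[A1 == 0].mean())   # roughly 0.06

# The comparison asked for: sum_{L0} E[death | A1, L0, A0] p(L0 | A0),
# evaluated here at A0 = 1. It shows that A1 = 1 *reduces* death.
for a1 in (0, 1):
    est = sum(
        death[(A1 == a1) & (L0 == l) & (A0 == 1)].mean()
        * (L0[A0 == 1] == l).mean()
        for l in (0, 1)
    )
    print(f"sum_L0 E[death | A1={a1}, L0, A0=1] p(L0 | A0=1) = {est:.3f}")  # ~0.16, ~0.10
```

On this invented model the naive comparison says HAART roughly quadruples the death rate, while the adjusted comparison shows it cuts the death rate from about 0.16 to about 0.10, matching the story: HAART helps, but was given to people who were very sick.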
Note that I defined HAART as “put this patient on HAART”, not the probability of death when giving HAART in general (maybe I should have used a different notation).
If I understand your model correctly, then:
A0 = is HAART given at time t=0 (boolean)
L0 = time to wait (seconds, positive)
A1 = is HAART given (again) at time t=L0 (boolean)
with the confounding variable H1, the health at time t=L0, which influences the choice of A1. You didn’t specify how L0 was determined: is it fixed, or does it also depend on the patient’s health? Your formula above suggests that it depends only on the choice A0.
Now a new patient comes in, and you want to know whether you should pick A0=true/false and A1=true/false. For the new patient x, you want to estimate P(death[x] | A0[x], A1[x]). If it were just about A0[x], it would be easy: the assignment was randomized, so we know that A0 is independent of any confounders. But this is not true for A1; in fact, we have no good data with which to estimate A1[x], since we only have samples where A1 was chosen according to the health-status-based policy.
Note that I defined HAART as “put this patient on HAART”, not the probability of death when giving HAART in general (maybe I should have used a different notation).
Yes, you should have. The notation “P(!death|HAART)” means “find every record with HAART, and calculate the percentage of them with !death.” This is how EDT as an epistemic approach generates numbers to use for decisions. Why am I specifying it as an epistemic approach? Because EDT and CDT ask for different sorts of information with which to make decisions, and thus have different natural epistemologies. CDT asks for “P(!death|do(HAART))”, which is not the sort of information EDT asks for, and thus not the sort of information an EDT model has access to.
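As a sketch of that recipe on invented toy records:

```python
import numpy as np

# Toy records: whether each patient got HAART, and whether they died.
haart = np.array([True, True, True, False, False, False])
death = np.array([True, True, False, False, False, True])

# "Find every record with HAART, and calculate the percentage of them
# with !death": conditioning on the action, nothing more.
p_no_death_given_haart = (~death[haart]).mean()
print(p_no_death_given_haart)   # 1/3 on this toy data
```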
To go back to an earlier statement:
So if you give me P(!death|HAART) and P(!death|!HAART) then I can give you a decision.
IlyaShpitser is asking you how you would calculate those from empirical data. The EDT answer uses the technical notation you used before, and it’s the suboptimal way to do things.
in fact, we have no good data with which to estimate A1[x]
Really? My impression is that the observational records are good enough to get some knowledge. (Indeed, they must be good enough; lives are on the line, and saying “I don’t have enough info” will result in more deaths than “this is the best I can do with the existing info.”)
IlyaShpitser is asking you how you would calculate those from empirical data. The EDT answer uses the technical notation you used before
EDT does not answer this question; at least, the definition of EDT I found on Wikipedia makes no mention of it. Can you point me to a description of EDT that includes the estimation of probabilities?
in fact, we have no good data with which to estimate A1[x]
I should have said “to estimate the effect of A1[x]”.
Really? My impression is that the observational records are good enough to get some knowledge. (Indeed, they must be good enough; lives are on the line, and saying “I don’t have enough info” will result in more deaths than “this is the best I can do with the existing info.”)
Sure, you can do something to make an estimate. But as I understand it, estimating causal models (which is what you need to estimate A1[x]) from observational data is a hard problem. That is why clinical trials use randomization, and why studies that don’t randomize try very hard to control for all possible confounders.
EDT does not answer this question; at least, the definition of EDT I found on Wikipedia makes no mention of it. Can you point me to a description of EDT that includes the estimation of probabilities?
I don’t think you’re interpreting the Wikipedia article correctly. It states that the value of an action is the sum over the conditional probability times the desirability. This means that the decision-relevant probability of an outcome O_j given we do action A is P(O_j|A), which is calculated (by conditioning on A) from the observational data. (You can slice this observational data to get a more appropriate reference class, but no guidance is given on how to do this. Causal network discovery formalizes this process, commonly referred to as ‘factorization,’ as well as making a few other technical improvements.)
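In code, that Wikipedia-style calculation looks something like the following sketch (the record format and names are mine, not anything from the article):

```python
from collections import Counter

def edt_value(records, action, utility):
    # V(A) = sum_j P(O_j | A) * U(O_j), with P(O_j | A) read off the
    # records by conditioning on the action.
    outcomes = [o for a, o in records if a == action]
    counts = Counter(outcomes)
    return sum(utility(o) * c / len(outcomes) for o, c in counts.items())

# EDT then picks the action with the highest value.
records = [("haart", "death"), ("haart", "ok"), ("no_haart", "ok")]
utility = {"death": 0.0, "ok": 1.0}.get
actions = {a for a, _ in records}
print(max(actions, key=lambda a: edt_value(records, a, utility)))  # "no_haart"
```

On confounded records like the HAART example, this recipe is exactly what produces the wrong answer.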
But as I understand it, estimating causal models (which is what you need to estimate A1[x]) from observational data is a hard problem.
Yes, but it’s a problem we have good approaches for.
This means that the decision-relevant probability of an outcome O_j given we do action A is P(O_j|A) …
Agreed.
… from the observational data.
Where does it say how P(O_j|A) is estimated? Or that observational data comes into it at all? In my understanding you can apply EDT after you know what P(O_j|A) is. How you determine that quantity is outside the scope of decision theories.
In my understanding you can apply EDT after you know what P(O_j|A) is. How you determine that quantity is outside the scope of decision theories.
It seems like there are two views of decision theories. The first view is that a decision theory eats a problem description and outputs an action. The second view is that a decision theory eats a model and outputs an action.
I strongly suspect that IlyaShpitser holds the first view (see here), and I’ve held both in different situations. Even when holding the second view, though, the different decision theories ask for different models, and those models must be generated somehow. I take the view that one should use off-the-shelf components for them, unless otherwise specified, and this assumption turns the second view into the first view.
I should note here that the second view is not very useful practically; most of a decision analysis class will center around how to turn a problem description into a model, since the mathematics of turning models into decisions is very simple by comparison.
When EDT is presented with a problem where observational data is supplied, EDT complains that it needs conditional probabilities, not observational data. The “off-the-shelf” way of transforming that data into conditional probabilities is to conditionalize on the possible actions within the observational data, and then EDT will happily pick the action with the highest utility weighted by conditional probability.
When CDT is presented with the same problem, it complains that it needs a causal model. The “off-the-shelf” way of transforming observational data into a causal model is described in Causality, and so I won’t go into it here, but once that’s done CDT will happily pick the action with the highest utility weighted by counterfactual probability.
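Here is a sketch of the two off-the-shelf recipes side by side on an invented single-confounder problem; the adjustment used for the CDT column is the standard back-door formula P(O | do(A)) = ∑_c P(O | A, c) P(c), and all the numbers are made up:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Invented data: one confounder C drives both the action A and the outcome.
C = rng.integers(0, 2, n)
A = rng.binomial(1, 0.1 + 0.8 * C)                 # action tracks the confounder
good = rng.binomial(1, 0.2 + 0.5 * C + 0.2 * A)    # outcome helped by C and by A

# EDT's off-the-shelf numbers: condition on the action.
p_edt = [good[A == a].mean() for a in (0, 1)]       # roughly 0.25, 0.85

# CDT's off-the-shelf numbers: back-door adjustment over C.
p_cdt = [
    sum(good[(A == a) & (C == c)].mean() * (C == c).mean() for c in (0, 1))
    for a in (0, 1)
]                                                   # roughly 0.45, 0.65
print("P(good | A)     =", p_edt)   # inflated gap: A = 1 inherits C's benefit
print("P(good | do(A)) =", p_cdt)   # isolates A's own +0.2 effect
```

Both recipes then weight utilities by their respective probabilities and pick the maximum; they simply disagree about which numbers to weight by.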
Can we improve on the “off-the-shelf” method for EDT? If we apply some intuition to the observational data, we can narrow the reference class and get probabilities that are more meaningful. But this sort of patching is unsatisfying. At best, we recreate the causal model discovered by the off-the-shelf methods used by CDT, and now we’re using CDT by another name. This is what IlyaShpitser meant by:
If you are willing to call such a thing “EDT”, then EDT can mean whatever you want it to mean.
At worst, our patches only did part of the job. Maybe we thought to check for reversal effects, and found the obvious ones, but not complicated ones. Maybe we thought some variable would be significant and excluded sample data with differences on that variable, but in fact it was not causally significant (which wouldn’t matter in the infinite-data case, but would matter for realistic cases).
The reason to prefer CDT over EDT is that causal models contain more information than joint probability distributions, and you want your decision theory to make use of as much information as possible in as formal a way as possible. Yes, you can patch EDT to make it CDTish, but then it’s not really EDT; it’s you running CDT and putting the results into EDT’s formatting.
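One way to make the “more information” claim vivid: the two invented structural models below generate exactly the same joint distribution over (A, B), yet disagree about P(B | do(A)), so no amount of joint-distribution data can stand in for the graph.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Model "A -> B": A is a fair coin, and A causes B.
a_fwd = rng.binomial(1, 0.5, n)
b_fwd = rng.binomial(1, np.where(a_fwd == 1, 0.8, 0.2))

# Model "B -> A": B is a fair coin, and B causes A, with parameters
# chosen so that the joint over (A, B) is identical to the model above.
b_rev = rng.binomial(1, 0.5, n)
a_rev = rng.binomial(1, np.where(b_rev == 1, 0.8, 0.2))

# Identical joints (up to sampling noise): P(A=1, B=1) ~ 0.40 in both.
print((a_fwd & b_fwd).mean(), (a_rev & b_rev).mean())

# But the interventions differ: under do(A=1), the first model gives
# P(B=1) = 0.8, while the second gives P(B=1) = 0.5. The joint alone
# cannot tell you which world you are in.
```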
In fact, in the example I gave, I fully specified everything needed for each decision theory to output an answer—I gave a causal model to CDT (because I gave the graph under standard interventionist semantics), and a joint distribution over all observable variables to EDT (infinite sample size!). I just wanted someone to give me the right answer using EDT (and explain how they got it).
EDT is not allowed to refer to causal concepts like “confounder” or “causal effect” when making a decision (otherwise it is not EDT).