Ok, these problems I am posing are not abstract, they are concrete problems in medical decision making. In light of http://lesswrong.com/lw/jco/examples_in_mathematics/, I am going to pose 4 of them, right here, then tell you what the right answer is, and what assumptions I used to get this answer. Whatever decision theory you are using needs to be able to correctly represent and solve these problems, using at most the information that my solutions use, or it is not a very good decision theory (in the sense that there exist known alternatives that do solve these problems correctly). In all problems our utility penalizes patient deaths, or patients getting a disease.
In particular, if you are a user of EDT, you need to give me a concrete algorithm that correctly solves all these problems, without using any causal vocabulary. You can’t just beg off on how it’s a “non trivial problem.” These are decision problems, and there exist solutions for these problems now, using CDT! Can EDT solve them or not? I have yet to see anyone try to seriously engage these (with the notable exception of Paul, who to his credit did try to give a Bayesian/non-causal account of problem 3, but ran out of time).
Note : I am assuming the correct graph, and lots of samples, so the effect of a prior is not significant (and thus talk about empirical frequencies, e.g. p(c)). If we wanted to, we could do a Bayesian problem over possible causal graphs, with a prior, and/or a Bayesian problem for estimation, where we could talk about, for example, the posterior distribution of case histories C. I skipped all that to simplify the examples.
Problem 1:
We perform a randomized control trial for a drug (half the patients get the drug, half the patients do not). Some of the patients die (in both groups). Let A be a random variable representing whether the patient in our RCT dataset got the drug, and Y be a random variable representing whether the patient in our RCT dataset died. A new patient comes in which is from the same cohort as those in our RCT. Should we give them the drug?
Solution: Give the drug if and only if E[Y = yes | A = yes] < E[Y = yes | A = no].
Intuition for why this is correct: since we randomized the drug, there are no possible confounders between drug use and death. So any dependence between the drug and death is causal. So we can just look at conditional correlations.
Assumptions used: we need the empirical p(A,Y) from our RCT, and the assumption that the correct causal graph is A → Y. No other assumptions needed.
Ideas here: you should be able to transfer information you learn from observed members in a group to others members of the same group. Otherwise, what is stats even doing?
Problem 2:
We perform an observational study, where doctors assign (or not) a drug based on observed patient vitals recorded in their case history file. Some of the patients die. Let A be a random variable representing whether the patient in our dataset from our study got the drug, Y be a random variable representing whether the patient in our study died, and C be the random variable representing the patient vitals used by the doctors to decide whether to give the drug or not. A new patient comes in which is from the same cohort as those in our study. If we do not get any additional information on this patient, should we give them the drug?
Solution: Give the drug if and only if \sum{c} E[Y = yes | A = yes, c] p(c) < \sum{c} E[Y = yes | A = no, c] p(c)
Intuition for why this is correct: we have not randomized the drug, but we recorded all the info doctors used to decide on whether to give the drug. Since case history C represents all possible confounding between A and Y, conditional on knowing C, any dependence between A and Y is causal. In other words, E[Y | A, C] gives a causal dependence of A and Y, conditional on C. But since we are not allowed to measure anything about the incoming patient, we have to average over the possible case histories the patient might have. Since the patient is postulated to have come from the same dataset as those in our study, it is reasonable to average over the observed case histories in our study. This recovers the above formula.
Assumptions used: we need the empirical p(A,C,Y) from our study, and the assumption that the correct causal graph is C → A → Y, C → Y. No other assumptions needed.
Ideas here: this is isomorphic to the smoking lesion problem. The idea here is you can’t use observed correlations if there are confounders, you have to adjust for confounders properly using the g-formula (the formula in the answer).
Problem 3:
We perform a partially randomized and partially observational longitudinal study, where patients are randomly assigned (or not) a drug at time 0, then their vitals at time 1 are recorded in a file, and based on those vitals, and the treatment assignment history at time 0, doctors may (or not), decide to give them more of the drug. Afterwards, at time 2, some patients die (or not). Let A0 be a random variable representing whether the patient in our dataset from our study got the drug at time 0, A1 be a random variable representing whether the patient in our dataset from our study got the drug at time 1, Y be a random variable representing whether the patient in our study died, and C be the random variable representing the case history used by the doctors to decide whether to give the drug or not at time 1. A new patient comes in which is from the same cohort as those in our study. If we do not get any additional information on this patient, should we give them the drug, and if so at what time points?
Solution: Use the drug assignment policy (a0,a1) that minimizes \sum{c} E[Y = yes | A1 = a1, c, A0 = a0] p(c | A0 = a0).
Intuition for why this is correct: we have randomized A0, but have not randomized A1, and we are interested in the joint effect of both A0 and A1 on Y. We know C is a confounder for A1, so we have to adjust for it somehow as in Problem 2, otherwise an observed dependence of A1 and Y will contain a non-causal component through C. However, C is not a confounder for the relationship of A0 and Y. Conditional on A0 and C, the relationship between A1 and Y is entirely causal, so E[Y | A1, C, A0] is a causal quantity. However, for the incoming patient, we are not allowed to measure C, so we have to average over C as before in problem 2. However, in our case C is an effect of A0, which means we can’t just average the base rates for case histories, we have to take into account what happened at time 0, in other words the causal effect of A0 on C. Because in our graph, there are no confounders between A0 and C, the causal relationship can be represented by p(C | A0) (no confounders means correlation equals causation). Since A0 also has no confounders for Y, E[Y | A1, C, A0], weighted by p(C | A0) gives us the right causal relationship between {A0,A1} and Y.
Assumptions used: we need the empirical p(A0,C,A1,Y) from our study, and the assumption that the correct causal graph is A0 → C → A1 → Y, A0 → A1, A0 → Y, and we possibly allow that there is an unrestricted hidden variable U that is a parent of both C and Y. No other assumptions needed.
Ideas here: simply knowing that you have confounders is not enough, you have to pay attention to the precise causal relationships to figure out what the right thing to do is. In this case, C is a ‘time-varying confounder,’ and requires a more complicated adjustment that takes into account that the confounder is also an effect of an earlier treatment.
Problem 4:
We consider a (hypothetical) observational study of coprophagic treatment of stomach cancer. It is known (for the purposes of this hypothetical example) that coprophagia’s protective effect vs cancer is due to the presence of certain types of intestinal flora in feces. At the same time, people who engage in coprophagic behavior naturally are not a random sampling of the population, and therefore may be more likely than average to end up with stomach cancer. Let A be a random variable representing whether those in our study engaged in coprophagic behavior, let W be the random variable representing the presence of beneficial intestinal flora, let Y be the random variable representing the presence of stomach cancer, and let U be some unrestricted hidden variable which may influence both coprophagia and stomach cancer. A new patient at risk for stomach cancer comes in which is from the same cohort as those in our study. If we do not get any additional information on this patient, should we give them the coprophagic treatment as a preventative measure?
Solution: Yes, if and only if \sum{w} p(W = w | A = yes) \sum{a} E[Y = yes | W = w, A = a) p(A = a) < \sum{w} p(W = w | A = no) \sum{a} E[Y = yes | W = w, A = a) p(A = a)
Intuition for why this is correct: since W is independent of confounders for A and Y, and A only affects Y through W, the effect of A on Y decomposes/factorizes into an effect of A on W, and an effect of W on Y, averaged over possible values W could take. The effect of A on W is not confounded by anything, and so is equal to p(W | A). The effect of W on Y is confounder by A, but given our assumptions, conditioning on A is sufficient to remove all confounding for the effect, which gives us \sum{A} p(Y | W,A) p(A). This gives above formula.
Assumptions used: we need the empirical p(A,C,Y) from our study, and the assumption that the correct causal graph is A → W → Y, and there is an unrestricted hidden variable U that is a parent of both A and Y. No other assumptions needed.
Ideas here: sometimes your independences let you factorize effects into other effects, similarly to how Bayesian networks factorize. This lets you solve problems that might seem unsolvable due to the presence of unobserved confounding.
Ok, these problems I am posing are not abstract, they are concrete problems in medical decision making. In light of http://lesswrong.com/lw/jco/examples_in_mathematics/, I am going to pose 4 of them, right here, then tell you what the right answer is, and what assumptions I used to get this answer. Whatever decision theory you are using needs to be able to correctly represent and solve these problems, using at most the information that my solutions use, or it is not a very good decision theory (in the sense that there exist known alternatives that do solve these problems correctly). In all problems our utility penalizes patient deaths, or patients getting a disease.
In particular, if you are a user of EDT, you need to give me a concrete algorithm that correctly solves all these problems, without using any causal vocabulary. You can’t just beg off on how it’s a “non trivial problem.” These are decision problems, and there exist solutions for these problems now, using CDT! Can EDT solve them or not? I have yet to see anyone try to seriously engage these (with the notable exception of Paul, who to his credit did try to give a Bayesian/non-causal account of problem 3, but ran out of time).
Note : I am assuming the correct graph, and lots of samples, so the effect of a prior is not significant (and thus talk about empirical frequencies, e.g. p(c)). If we wanted to, we could do a Bayesian problem over possible causal graphs, with a prior, and/or a Bayesian problem for estimation, where we could talk about, for example, the posterior distribution of case histories C. I skipped all that to simplify the examples.
Problem 1:
We perform a randomized control trial for a drug (half the patients get the drug, half the patients do not). Some of the patients die (in both groups). Let A be a random variable representing whether the patient in our RCT dataset got the drug, and Y be a random variable representing whether the patient in our RCT dataset died. A new patient comes in which is from the same cohort as those in our RCT. Should we give them the drug?
Solution: Give the drug if and only if E[Y = yes | A = yes] < E[Y = yes | A = no].
Intuition for why this is correct: since we randomized the drug, there are no possible confounders between drug use and death. So any dependence between the drug and death is causal. So we can just look at conditional correlations.
Assumptions used: we need the empirical p(A,Y) from our RCT, and the assumption that the correct causal graph is A → Y. No other assumptions needed.
Ideas here: you should be able to transfer information you learn from observed members in a group to others members of the same group. Otherwise, what is stats even doing?
Problem 2:
We perform an observational study, where doctors assign (or not) a drug based on observed patient vitals recorded in their case history file. Some of the patients die. Let A be a random variable representing whether the patient in our dataset from our study got the drug, Y be a random variable representing whether the patient in our study died, and C be the random variable representing the patient vitals used by the doctors to decide whether to give the drug or not. A new patient comes in which is from the same cohort as those in our study. If we do not get any additional information on this patient, should we give them the drug?
Solution: Give the drug if and only if \sum{c} E[Y = yes | A = yes, c] p(c) < \sum{c} E[Y = yes | A = no, c] p(c)
Intuition for why this is correct: we have not randomized the drug, but we recorded all the info doctors used to decide on whether to give the drug. Since case history C represents all possible confounding between A and Y, conditional on knowing C, any dependence between A and Y is causal. In other words, E[Y | A, C] gives a causal dependence of A and Y, conditional on C. But since we are not allowed to measure anything about the incoming patient, we have to average over the possible case histories the patient might have. Since the patient is postulated to have come from the same dataset as those in our study, it is reasonable to average over the observed case histories in our study. This recovers the above formula.
Assumptions used: we need the empirical p(A,C,Y) from our study, and the assumption that the correct causal graph is C → A → Y, C → Y. No other assumptions needed.
Ideas here: this is isomorphic to the smoking lesion problem. The idea here is you can’t use observed correlations if there are confounders, you have to adjust for confounders properly using the g-formula (the formula in the answer).
Problem 3:
We perform a partially randomized and partially observational longitudinal study, where patients are randomly assigned (or not) a drug at time 0, then their vitals at time 1 are recorded in a file, and based on those vitals, and the treatment assignment history at time 0, doctors may (or not), decide to give them more of the drug. Afterwards, at time 2, some patients die (or not). Let A0 be a random variable representing whether the patient in our dataset from our study got the drug at time 0, A1 be a random variable representing whether the patient in our dataset from our study got the drug at time 1, Y be a random variable representing whether the patient in our study died, and C be the random variable representing the case history used by the doctors to decide whether to give the drug or not at time 1. A new patient comes in which is from the same cohort as those in our study. If we do not get any additional information on this patient, should we give them the drug, and if so at what time points?
Solution: Use the drug assignment policy (a0,a1) that minimizes \sum{c} E[Y = yes | A1 = a1, c, A0 = a0] p(c | A0 = a0).
Intuition for why this is correct: we have randomized A0, but have not randomized A1, and we are interested in the joint effect of both A0 and A1 on Y. We know C is a confounder for A1, so we have to adjust for it somehow as in Problem 2, otherwise an observed dependence of A1 and Y will contain a non-causal component through C. However, C is not a confounder for the relationship of A0 and Y. Conditional on A0 and C, the relationship between A1 and Y is entirely causal, so E[Y | A1, C, A0] is a causal quantity. However, for the incoming patient, we are not allowed to measure C, so we have to average over C as before in problem 2. However, in our case C is an effect of A0, which means we can’t just average the base rates for case histories, we have to take into account what happened at time 0, in other words the causal effect of A0 on C. Because in our graph, there are no confounders between A0 and C, the causal relationship can be represented by p(C | A0) (no confounders means correlation equals causation). Since A0 also has no confounders for Y, E[Y | A1, C, A0], weighted by p(C | A0) gives us the right causal relationship between {A0,A1} and Y.
Assumptions used: we need the empirical p(A0,C,A1,Y) from our study, and the assumption that the correct causal graph is A0 → C → A1 → Y, A0 → A1, A0 → Y, and we possibly allow that there is an unrestricted hidden variable U that is a parent of both C and Y. No other assumptions needed.
Ideas here: simply knowing that you have confounders is not enough, you have to pay attention to the precise causal relationships to figure out what the right thing to do is. In this case, C is a ‘time-varying confounder,’ and requires a more complicated adjustment that takes into account that the confounder is also an effect of an earlier treatment.
Problem 4:
We consider a (hypothetical) observational study of coprophagic treatment of stomach cancer. It is known (for the purposes of this hypothetical example) that coprophagia’s protective effect vs cancer is due to the presence of certain types of intestinal flora in feces. At the same time, people who engage in coprophagic behavior naturally are not a random sampling of the population, and therefore may be more likely than average to end up with stomach cancer. Let A be a random variable representing whether those in our study engaged in coprophagic behavior, let W be the random variable representing the presence of beneficial intestinal flora, let Y be the random variable representing the presence of stomach cancer, and let U be some unrestricted hidden variable which may influence both coprophagia and stomach cancer. A new patient at risk for stomach cancer comes in which is from the same cohort as those in our study. If we do not get any additional information on this patient, should we give them the coprophagic treatment as a preventative measure?
Solution: Yes, if and only if \sum{w} p(W = w | A = yes) \sum{a} E[Y = yes | W = w, A = a) p(A = a) < \sum{w} p(W = w | A = no) \sum{a} E[Y = yes | W = w, A = a) p(A = a)
Intuition for why this is correct: since W is independent of confounders for A and Y, and A only affects Y through W, the effect of A on Y decomposes/factorizes into an effect of A on W, and an effect of W on Y, averaged over possible values W could take. The effect of A on W is not confounded by anything, and so is equal to p(W | A). The effect of W on Y is confounder by A, but given our assumptions, conditioning on A is sufficient to remove all confounding for the effect, which gives us \sum{A} p(Y | W,A) p(A). This gives above formula.
Assumptions used: we need the empirical p(A,C,Y) from our study, and the assumption that the correct causal graph is A → W → Y, and there is an unrestricted hidden variable U that is a parent of both A and Y. No other assumptions needed.
Ideas here: sometimes your independences let you factorize effects into other effects, similarly to how Bayesian networks factorize. This lets you solve problems that might seem unsolvable due to the presence of unobserved confounding.