That’s right, but as I said, you cannot just condition on L0 because that blocks the causal path from A0 to Y, and opens a non-causal path A0 -> L0 <-> Y.
Okay, this isn’t actually a problem. At A1 (deciding whether to give HAART at time t=1) you condition on L0 because you’ve observed it. This means using P(outcome=Y | action=give-haart-at-A1, observations=[L0, the dataset]), which happens to be identical to P(outcome=Y | do(action=give-haart-at-A1), observations=[L0, the dataset]), since A1 has no parents apart from L0. So the decision is the same as CDT’s at A1.
At A0 (deciding whether to give HAART at time t=0), you haven’t measured L0, so you don’t condition on it. You use P(outcome=Y | action=give-haart-at-A0, observations=[the dataset]), which happens to be the same as P(outcome=Y | do(action=give-haart-at-A0), observations=[the dataset]), since A0 has no parents at all. The decision is the same as CDT’s at A0, as well.
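If it helps to see the two identities concretely, here is a minimal numerical sketch in Python. The graph shape matches the example (A0 -> L0 -> A1 -> Y, with a hidden U behind the L0 <-> Y edge), but the binary variables and every parameter are just things I made up for illustration, not the actual HAART setup; the point is only that the conditional and interventional quantities above come out the same.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Toy structural model (parameters made up for illustration only):
#   U  - unobserved health status, confounds L0 and Y  (the L0 <-> Y edge)
#   A0 - treatment at t=0; no parents
#   L0 - covariate measured between the two decisions; parents A0, U
#   A1 - treatment at t=1, assigned based on L0 alone; sole parent L0
#   Y  - survival; parents A0, A1, U
U  = rng.binomial(1, 0.5, n)
A0 = rng.binomial(1, 0.5, n)
L0 = rng.binomial(1, 0.2 + 0.3 * A0 + 0.4 * U)
A1 = rng.binomial(1, 0.3 + 0.5 * L0)
Y  = rng.binomial(1, 0.1 + 0.2 * A0 + 0.2 * A1 + 0.4 * U)

# At A1: conditioning on L0 (A1's only parent) makes the observational and
# the interventional probability coincide.
Y_doA1 = rng.binomial(1, 0.1 + 0.2 * A0 + 0.2 * 1 + 0.4 * U)   # do(A1=1)
for l in (0, 1):
    obs  = Y[(A1 == 1) & (L0 == l)].mean()    # P(Y=1 | A1=1, L0=l)
    intv = Y_doA1[L0 == l].mean()             # P(Y=1 | do(A1=1), L0=l)
    print(f"L0={l}:  {obs:.3f}  vs  {intv:.3f}")

# At A0: A0 has no parents (it is exogenous here), so conditioning on it is
# already the same as intervening on it.
L0_do  = rng.binomial(1, 0.2 + 0.3 * 1 + 0.4 * U)               # do(A0=1)
A1_do  = rng.binomial(1, 0.3 + 0.5 * L0_do)
Y_doA0 = rng.binomial(1, 0.1 + 0.2 * 1 + 0.2 * A1_do + 0.4 * U)
print(f"A0:  {Y[A0 == 1].mean():.3f}  vs  {Y_doA0.mean():.3f}")
```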
To make this perfectly clear, what I am doing here is replacing the agents at A0 and A1 (the ones that decide whether to administer HAART) with EDT agents with access to the aforementioned dataset, and calculating what they would do. That is, “You are at A0. Decide whether to administer HAART using EDT.” and “You are at A1. You have observed L0=[...]. Decide whether to administer HAART using EDT.”. The decisions about what to do at A0 and A1 are calculated separately (though the agent at A0 will generally need to know, and therefore to first calculate, what A1 will do, so that they can calculate things like P(outcome=Y | action=give-haart-at-A0, observations=[the dataset])).
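Here is that order of computation as a backward-induction sketch, treating the arrays above as “the dataset”. The helper names (p_y, p_y_a1, a1_rule, and so on) are mine, not anything standard; the thing to notice is just that the A1 rule has to be worked out first, and only then can the A0 agent score its own two options.

```python
# Reusing the arrays (A0, L0, A1, Y) from the previous sketch as "the dataset".
def p_y(a0, a1, l0):
    """Empirical P(Y=1 | A0=a0, A1=a1, L0=l0)."""
    m = (A0 == a0) & (A1 == a1) & (L0 == l0)
    return Y[m].mean()

def p_y_a1(a1, l0):
    """Empirical P(Y=1 | A1=a1, L0=l0) -- what the agent at A1 conditions on."""
    m = (A1 == a1) & (L0 == l0)
    return Y[m].mean()

# Step 1: the EDT agent at A1 has observed L0, so for each value of L0 it
# simply takes the action with the higher conditional survival probability.
a1_rule = {l: max((0, 1), key=lambda a: p_y_a1(a, l)) for l in (0, 1)}

# Step 2: the EDT agent at A0 wants P(Y=1 | A0=a0, dataset) *given that A1
# will be chosen by the rule above*, so it needs that rule before it can
# average over how L0 would turn out:
#   sum_l P(L0=l | A0=a0) * P(Y=1 | A0=a0, L0=l, A1=a1_rule[l])
def p_y_a0(a0):
    return sum(
        ((A0 == a0) & (L0 == l)).sum() / (A0 == a0).sum()
        * p_y(a0, a1_rule[l], l)
        for l in (0, 1)
    )

a0_choice = max((0, 1), key=p_y_a0)
print("A1 rule:", a1_rule, "  A0 choice:", a0_choice)
```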
You may actually be thinking of “solve this problem using EDT” as “using EDT, derive the best (conditional) policy for agents at A0 and A1”, i.e. an EDT agent standing “outside the problem” and deciding ahead of time what A0 and A1 should do. That works somewhat differently, but happily it’s practically trivial to show that this EDT agent’s decision would also be the same as CDT’s: an agent deciding on a policy for A0 and A1 ahead of time is affected by nothing except the original dataset, which is of course its input (an observation), so we have P(outcome | do(policy), observations=dataset) = P(outcome | policy, observations=dataset). In case it’s not obvious, the graph for this case is dataset -> (agent chooses policy) -> (some number of people die after assigning A0,A1 based on policy) -> outcome.
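For that “outside the problem” reading, a last sketch (again reusing the toy arrays and the p_y helper above): enumerate every deterministic policy, i.e. every pair of an A0 action and a map from the observed L0 to an A1 action, and score each against the dataset. Because the policy chooser’s only input is the dataset, the score is simultaneously the conditional and the interventional probability, and in this toy model the winning policy is exactly the pair of decisions the per-node agents arrived at.

```python
from itertools import product

def value(a0, f):
    """Estimated P(Y=1 | policy=(a0, f), dataset); since the policy chooser's
    only input is the dataset, this also equals P(Y=1 | do(policy), dataset)."""
    return sum(
        ((A0 == a0) & (L0 == l)).sum() / (A0 == a0).sum()
        * p_y(a0, f[l], l)
        for l in (0, 1)
    )

# All 2 * 4 = 8 deterministic policies: an A0 action plus a map L0 -> A1.
policies = [(a0, {0: f0, 1: f1}) for a0, f0, f1 in product((0, 1), repeat=3)]
best = max(policies, key=lambda pol: value(*pol))
print("best policy:", best)   # coincides with a0_choice and a1_rule above
```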