p(a | do(b)) = p(a) if b is not an ancestor of a in a causal graph.
p(a | do(b)) = sum{pa(b)} p(a | b, pa(b)) p(pa(b)) if b is an ancestor of a in a causal DAG (pa(b) are the parents/direct causes of b in the same graph). The idea is that p(b | pa(b)) represents how b varies based on its direct causes pa(b). An intervention do(b) tells b to ignore its causes and become just a value we set. So we drop p(b | pa(b)) from the factorization, and marginalize out everything except a (with b fixed to the value we set). This is called “truncated factorization” or the “g-formula.”
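(To make the truncated factorization concrete, here is a minimal numeric sketch in Python, assuming the simplest confounded DAG z → b → a with an extra edge z → a, so that pa(b) = {z}. All the distributions and numbers are made up for illustration.)

```python
# g-formula sketch for the DAG z -> b -> a with z -> a, so pa(b) = {z}.
# All numbers are illustrative.

p_z = {0: 0.6, 1: 0.4}                                   # p(z)
p_b_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}       # p(b | z), outer key z
p_a_bz = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.5, 1: 0.5},
          (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.2, 1: 0.8}}  # p(a | b, z)

def p_a_do_b(a, b):
    # g-formula: drop p(b | z) from the factorization, fix b,
    # and marginalize out everything except a.
    return sum(p_a_bz[(b, z)][a] * p_z[z] for z in p_z)

def p_a_given_b(a, b):
    # Ordinary conditioning, for contrast: p(a | b) = p(a, b) / p(b).
    num = sum(p_a_bz[(b, z)][a] * p_b_z[z][b] * p_z[z] for z in p_z)
    den = sum(p_b_z[z][b] * p_z[z] for z in p_z)
    return num / den

print(p_a_do_b(1, 1))     # 0.68 -- interventional
print(p_a_given_b(1, 1))  # 0.74 -- observational
```

The two printed values differ precisely because z confounds b and a: plain conditioning on b carries information backward through z, while do(b) severs that path.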
If your causal DAG has hidden variables, there is sometimes no way to express p(a | do(b)) as a function of the observed marginal, and sometimes there is. You can read my thesis or Judea’s book for details if you are curious. For example if your causal DAG is:
b → c → a with a hidden common cause h of b and a, then
p(a | do(b)) = sum{c} p(c | b) sum{b’} p(a | c, b’) p(b’)
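(A quick numeric check of this formula, under an assumed generating model over the hidden h. The model is used only to produce the observed joint p(b, c, a) and to verify the answer; the formula itself touches observed quantities only. All numbers are illustrative.)

```python
# Check the identification formula above for b -> c -> a with hidden h -> b, h -> a.

from itertools import product

p_h = {0: 0.5, 1: 0.5}                                   # p(h), hidden
p_b_h = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}       # p(b | h)
p_c_b = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.25, 1: 0.75}}     # p(c | b)
p_a_ch = {(0, 0): {0: 0.8, 1: 0.2}, (0, 1): {0: 0.4, 1: 0.6},
          (1, 0): {0: 0.5, 1: 0.5}, (1, 1): {0: 0.1, 1: 0.9}}  # p(a | c, h)

# Observed joint: p(b, c, a) = sum_h p(h) p(b | h) p(c | b) p(a | c, h).
joint = {(b, c, a): sum(p_h[h] * p_b_h[h][b] * p_c_b[b][c] * p_a_ch[(c, h)][a]
                        for h in p_h)
         for b, c, a in product((0, 1), repeat=3)}

def p_b(b):
    return sum(joint[(b, c, a)] for c in (0, 1) for a in (0, 1))

def p_c_given_b(c, b):
    return sum(joint[(b, c, a)] for a in (0, 1)) / p_b(b)

def p_a_given_cb(a, c, b):
    return joint[(b, c, a)] / sum(joint[(b, c, x)] for x in (0, 1))

def p_a_do_b(a, b):
    # sum{c} p(c | b) sum{b'} p(a | c, b') p(b'), observed quantities only.
    return sum(p_c_given_b(c, b) *
               sum(p_a_given_cb(a, c, b2) * p_b(b2) for b2 in (0, 1))
               for c in (0, 1))

def p_a_do_b_truth(a, b):
    # Ground truth, computable only because we know h: sum_{c,h} p(h) p(c|b) p(a|c,h).
    return sum(p_h[h] * p_c_b[b][c] * p_a_ch[(c, h)][a]
               for h in p_h for c in (0, 1))

print(p_a_do_b(1, 1), p_a_do_b_truth(1, 1))  # both 0.625 -- they agree
```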
If you forget about causality, and view the g-formula rules above as a statistical calculus, you get something interesting, but that’s a separate story :).
What is pa(X)?
It doesn’t look to me like you’re doing EDT with a causal model. It looks to me like you’re redefining | so that CDT is expressed with the symbols normally used to accept EDT.
I am doing CDT. I wouldn’t dream of doing EDT because EDT is busted :).
In the Wikipedia article on CDT:
http://en.wikipedia.org/wiki/Causal_decision_theory
p(A > Oj) refers to p(Oj | do(A)).
The notation p(a | do(b)) is due to Pearl, and it does redefine what the conditioning bar means, although the notation is not really ambiguous.(*) You can also do things like p(a | do(b), c) = p(a,c | do(b)) / p(c | do(b)). Lauritzen writes p(a | do(b)) as p(a || b). Robins writes p(a | do(b)) as p(a | g = b) (actually Robins was first, so it’s more fair to say Pearl writes the latter as the former). The potential outcome people write p(a | do(b)) as p(A_b = a) or p(A(b) = a).
The point is, do(.) and conditioning aren’t the same.
(*) The problem with the do(.) notation is you cannot express things like p(A(b) | B = b’), which is known in some circles as “the effect of treatment on the (un)treated,” and more general kinds of counterfactuals, but this is a discussion for another time. I prefer the potential outcome notation myself.
The OP implied that EDT becomes CDT if a certain model is used.
What do you mean by “busted”? It lets you get $1,000,000 in Newcomb’s problem, which is $999,000 more than CDT gets you.
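(For concreteness, the arithmetic behind that figure, assuming a perfect predictor and the standard payoffs; this is a sketch, not anyone’s canonical formulation of either theory.)

```python
# Expected dollars in Newcomb's problem under EDT (condition on the action)
# vs CDT (intervene on the action). Payoffs and accuracy are assumptions.

ACCURACY = 1.0  # p(prediction matches your actual choice); perfect predictor

def payoff(action, box_filled):
    million = 1_000_000 if box_filled else 0        # opaque box
    thousand = 1_000 if action == "two-box" else 0  # transparent box
    return million + thousand

def edt_value(action):
    # EDT conditions on the action: p(box filled | action) tracks accuracy.
    p_filled = ACCURACY if action == "one-box" else 1 - ACCURACY
    return p_filled * payoff(action, True) + (1 - p_filled) * payoff(action, False)

def cdt_value(action, p_filled):
    # CDT intervenes: do(action) cannot affect the already-made prediction,
    # so p(box filled) is the same whatever you do.
    return p_filled * payoff(action, True) + (1 - p_filled) * payoff(action, False)

print(edt_value("one-box"), edt_value("two-box"))             # 1000000.0 1000.0
print(cdt_value("two-box", 0.5) - cdt_value("one-box", 0.5))  # 1000.0, any prior
```

Under these assumptions EDT ranks one-boxing at $1,000,000 against $1,000 for two-boxing, while CDT ranks two-boxing higher by exactly $1,000 no matter what probability it assigns to the box being full.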
Yes. I think the OP is “wrong.” Or rather, the OP makes the distinction between EDT and CDT meaningless.
I mean that it doesn’t work properly, much like a stopped clock.
Wasn’t the OP saying that there wasn’t a distinction between EDT and CDT?
If you want to get money when you encounter Newcomb’s problem, you get more if you use EDT than CDT. Doesn’t this imply that EDT works better?
Sure, in the same sense that a stopped clock pointing to 12 is better than a running clock that is five minutes fast, when it is midnight.
From past comments on the subject by this user it roughly translates to “CDT is rational. We evaluate decision theories based on whether they are rational. EDT does not produce the same results as CDT therefore EDT is busted.”
“Busted” = “does the wrong thing.”
If this is what you got from my comments on EDT and CDT, you really haven’t been paying attention.