p(a | do(b)) = p(a) if b is not an ancestor of a in a causal graph.
p(a | do(b)) = sum{pa(b)} p(a | b, pa(b)) p(pa(b)) if b is an ancestor of a in a causal DAG (pa(b) are the parents/direct causes of b in the same graph). The idea is that p(b | pa(b)) represents how b varies based on its direct causes pa(b). An intervention do(b) tells b to ignore its causes and become just a value we set. So we drop p(b | pa(b)) from the factorization, and marginalize out everything except a (with b fixed to the value we set). This is called “truncated factorization” or the “g-formula.”
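(To make the truncated factorization concrete, here is a minimal numeric sketch in Python, assuming the simplest confounded DAG z → b → a with an extra edge z → a, so that pa(b) = {z}. All the distributions and numbers are made up for illustration.)

```python
# g-formula sketch for the DAG z -> b -> a with z -> a, so pa(b) = {z}.
# All numbers are illustrative.

p_z = {0: 0.6, 1: 0.4}                                   # p(z)
p_b_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}       # p(b | z), outer key z
p_a_bz = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.5, 1: 0.5},
          (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.2, 1: 0.8}}  # p(a | b, z)

def p_a_do_b(a, b):
    # g-formula: drop p(b | z) from the factorization, fix b,
    # and marginalize out everything except a.
    return sum(p_a_bz[(b, z)][a] * p_z[z] for z in p_z)

def p_a_given_b(a, b):
    # Ordinary conditioning, for contrast: p(a | b) = p(a, b) / p(b).
    num = sum(p_a_bz[(b, z)][a] * p_b_z[z][b] * p_z[z] for z in p_z)
    den = sum(p_b_z[z][b] * p_z[z] for z in p_z)
    return num / den

print(p_a_do_b(1, 1))     # 0.68 -- interventional
print(p_a_given_b(1, 1))  # 0.74 -- observational
```

The two printed values differ precisely because z confounds b and a: plain conditioning on b carries information backward through z, while do(b) severs that path.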
If your causal DAG has hidden variables, there is sometimes no way to express p(a | do(b)) as a function of the observed marginal, and sometimes there is. You can read my thesis or Judea’s book for details if you are curious. For example if your causal DAG is:
b → c → a with a hidden common cause h of b and a, then
p(a | do(b)) = sum{c} p(c | b) sum{b’} p(a | c, b’) p(b’)
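(A quick numeric check of this formula, under an assumed generating model over the hidden h. The model is used only to produce the observed joint p(b, c, a) and to verify the answer; the formula itself touches observed quantities only. All numbers are illustrative.)

```python
# Check the identification formula above for b -> c -> a with hidden h -> b, h -> a.

from itertools import product

p_h = {0: 0.5, 1: 0.5}                                   # p(h), hidden
p_b_h = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}       # p(b | h)
p_c_b = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.25, 1: 0.75}}     # p(c | b)
p_a_ch = {(0, 0): {0: 0.8, 1: 0.2}, (0, 1): {0: 0.4, 1: 0.6},
          (1, 0): {0: 0.5, 1: 0.5}, (1, 1): {0: 0.1, 1: 0.9}}  # p(a | c, h)

# Observed joint: p(b, c, a) = sum_h p(h) p(b | h) p(c | b) p(a | c, h).
joint = {(b, c, a): sum(p_h[h] * p_b_h[h][b] * p_c_b[b][c] * p_a_ch[(c, h)][a]
                        for h in p_h)
         for b, c, a in product((0, 1), repeat=3)}

def p_b(b):
    return sum(joint[(b, c, a)] for c in (0, 1) for a in (0, 1))

def p_c_given_b(c, b):
    return sum(joint[(b, c, a)] for a in (0, 1)) / p_b(b)

def p_a_given_cb(a, c, b):
    return joint[(b, c, a)] / sum(joint[(b, c, x)] for x in (0, 1))

def p_a_do_b(a, b):
    # sum{c} p(c | b) sum{b'} p(a | c, b') p(b'), observed quantities only.
    return sum(p_c_given_b(c, b) *
               sum(p_a_given_cb(a, c, b2) * p_b(b2) for b2 in (0, 1))
               for c in (0, 1))

def p_a_do_b_truth(a, b):
    # Ground truth, computable only because we know h: sum_{c,h} p(h) p(c|b) p(a|c,h).
    return sum(p_h[h] * p_c_b[b][c] * p_a_ch[(c, h)][a]
               for h in p_h for c in (0, 1))

print(p_a_do_b(1, 1), p_a_do_b_truth(1, 1))  # both 0.625 -- they agree
```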
If you forget about causality, and view the g-formula rules above as a statistical calculus, you get something interesting, but that’s a separate story :).
What is pa(X)?
It doesn’t look to me like you’re doing EDT with a causal model. It looks to me like you’re redefining | so that CDT is expressed with the symbols normally used to accept EDT.
I am doing CDT. I wouldn’t dream of doing EDT because EDT is busted :).
In the Wikipedia article on CDT:
http://en.wikipedia.org/wiki/Causal_decision_theory
p(A > Oj) refers to p(Oj | do(A)).
The notation p(a | do(b)) is due to Pearl, and it does redefine what the conditioning bar means, although the notation is not really ambiguous.(*) You can also do things like p(a | do(b), c) = p(a,c | do(b)) / p(c | do(b)). Lauritzen writes p(a | do(b)) as p(a || b). Robins writes p(a | do(b)) as p(a | g = b) (actually Robins was first, so it’s more fair to say Pearl writes the latter as the former). The potential outcome people write p(a | do(b)) as p(A_b = a) or p(A(b) = a).
The point is, do(.) and conditioning aren’t the same.
(*) The problem with the do(.) notation is you cannot express things like p(A(b) | B = b’), which is known in some circles as “the effect of treatment on the (un)treated,” and more general kinds of counterfactuals, but this is a discussion for another time. I prefer the potential outcome notation myself.
The OP implied that EDT becomes CDT if a certain model is used.
What do you mean by “busted”? It lets you get $1,000,000 in Newcomb’s problem, which is $999,000 more than CDT gets you.
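(For concreteness, the arithmetic behind that figure, assuming a perfect predictor and the standard payoffs; this is a sketch, not anyone’s canonical formulation of either theory.)

```python
# Expected dollars in Newcomb's problem under EDT (condition on the action)
# vs CDT (intervene on the action). Payoffs and accuracy are assumptions.

ACCURACY = 1.0  # p(prediction matches your actual choice); perfect predictor

def payoff(action, box_filled):
    million = 1_000_000 if box_filled else 0        # opaque box
    thousand = 1_000 if action == "two-box" else 0  # transparent box
    return million + thousand

def edt_value(action):
    # EDT conditions on the action: p(box filled | action) tracks accuracy.
    p_filled = ACCURACY if action == "one-box" else 1 - ACCURACY
    return p_filled * payoff(action, True) + (1 - p_filled) * payoff(action, False)

def cdt_value(action, p_filled):
    # CDT intervenes: do(action) cannot affect the already-made prediction,
    # so p(box filled) is the same whatever you do.
    return p_filled * payoff(action, True) + (1 - p_filled) * payoff(action, False)

print(edt_value("one-box"), edt_value("two-box"))             # 1000000.0 1000.0
print(cdt_value("two-box", 0.5) - cdt_value("one-box", 0.5))  # 1000.0, any prior
```

Under these assumptions EDT ranks one-boxing at $1,000,000 against $1,000 for two-boxing, while CDT ranks two-boxing higher by exactly $1,000 no matter what probability it assigns to the box being full.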
Yes. I think the OP is “wrong.” Or rather, the OP makes the distinction between EDT and CDT meaningless.
I mean that it doesn’t work properly, much like a stopped clock.
Wasn’t the OP saying that there wasn’t a distinction between EDT and CDT?
If you want to get money when you encounter Newcomb’s problem, you get more if you use EDT than CDT. Doesn’t this imply that EDT works better?
Sure, in the same sense that a stopped clock pointing to 12 is better than a running clock that is five minutes fast, when it is midnight.
From past comments on the subject by this user it roughly translates to “CDT is rational. We evaluate decision theories based on whether they are rational. EDT does not produce the same results as CDT therefore EDT is busted.”
“Busted” = “does the wrong thing.”
If this is what you got from my comments on EDT and CDT, you really haven’t been paying attention.