Mikhail Samin comments on Causal Diagrams and Causal Models

Mikhail Samin 25 Oct 2023 21:48 UTC
1 point
I think I spotted a mistake. I’m surprised no one noticed it earlier.
Eliezer says the data shows that Overweight and Internet both make exercise less likely.
That would imply that P(O|I & not E) should be less than P(O|not E); and that P(I|O & not E) should be less than P(I|not E). But actually, they’re approximately equal!^[1]
P(not E) 0.6039286373
P(not E & O) 0.1683130609
P(not E & I) 0.09519044713
P(I & O & not E) 0.02630682544
P(I|not E) 0.1576187007
P(I|not E & O) 0.1562969938
P(O|not E) 0.2786969362
P(O|not E & I) 0.2763599314
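The conditional probabilities in the table follow from the four joint probabilities by the definition P(A|B) = P(A & B) / P(B). Here is a minimal Python sketch of that arithmetic, using only the numbers listed above (E = exercises, O = overweight, I = internet); the variable names are illustrative.

```python
# Joint probabilities from the table above (E = exercises, O = overweight, I = internet).
p_not_e = 0.6039286373
p_not_e_and_o = 0.1683130609
p_not_e_and_i = 0.09519044713
p_not_e_and_o_and_i = 0.02630682544

# Conditional probability: P(A | B) = P(A & B) / P(B)
p_i_given_not_e = p_not_e_and_i / p_not_e
p_i_given_not_e_and_o = p_not_e_and_o_and_i / p_not_e_and_o
p_o_given_not_e = p_not_e_and_o / p_not_e
p_o_given_not_e_and_i = p_not_e_and_o_and_i / p_not_e_and_i

print(f"P(I|not E) = {p_i_given_not_e:.4f}, P(I|not E & O) = {p_i_given_not_e_and_o:.4f}")
print(f"P(O|not E) = {p_o_given_not_e:.4f}, P(O|not E & I) = {p_o_given_not_e_and_i:.4f}")
```

The ratios reproduce the table's conditional probabilities, and each pair differs only in the third decimal place.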
(From the current data, P(O|E & I) is less than P(O|E): 5% vs 10%; P(I|E & O) is less than P(I|E): 6% vs 11%; which is what you’d expect from Overweight and Internet both making exercise less likely, but it’s not all that O->E<-I implies.)
Edit: note that inferring the causal graph exactly the way it’s done in the alarms/recessions/burglars example fails here due to the extra conditional independences. The data is generated in a way that doesn’t show a correspondence to this graph if you follow a procedure identical to the one described in the post.
1. Google Sheet with calculations
- DanielFilan 25 Nov 2023 19:15 UTC
  2 points
  The data is generated in a way that doesn’t show a correspondence to this graph if you follow a procedure identical to the one described in the post
  When I calculate the conditional probability tables you get from the table of numbers in the post and multiply them out to get the joint distribution, my answers basically match with the table of numbers in the post.
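A minimal sketch of the check described here, assuming binary O, I, E and the graph O→E←I: factor a joint distribution into P(O), P(I) and P(E|O,I), multiply the factors back together, and measure the largest discrepancy. The helper names (`make_joint`, `max_rebuild_error`) and all numbers below are hypothetical, not the post's table; the post's eight joint probabilities would be plugged in the same way.

```python
from itertools import product

def make_joint(p_oi, p_not_e):
    """Joint P(O, I, E) from P(O, I) and P(E = 0 | O, I); keys are (o, i, e)."""
    joint = {}
    for (o, i), p in p_oi.items():
        joint[o, i, 0] = p * p_not_e[o, i]
        joint[o, i, 1] = p * (1 - p_not_e[o, i])
    return joint

def max_rebuild_error(joint):
    """Largest gap between the joint and P(O) * P(I) * P(E | O, I) computed from it."""
    p_o = {o: sum(joint[o, i, e] for i in (0, 1) for e in (0, 1)) for o in (0, 1)}
    p_i = {i: sum(joint[o, i, e] for o in (0, 1) for e in (0, 1)) for i in (0, 1)}
    worst = 0.0
    for o, i, e in product((0, 1), repeat=3):
        p_e_given_oi = joint[o, i, e] / (joint[o, i, 0] + joint[o, i, 1])
        rebuilt = p_o[o] * p_i[i] * p_e_given_oi
        worst = max(worst, abs(joint[o, i, e] - rebuilt))
    return worst

# Hypothetical CPT P(E = 0 | O, I), shared by both examples.
p_not_e = {(0, 0): 0.54, (0, 1): 0.60, (1, 0): 0.81, (1, 1): 0.90}
# Hypothetical P(O, I): one with O and I independent, one with them correlated.
independent = {(o, i): (0.3 if o else 0.7) * (0.2 if i else 0.8)
               for o, i in product((0, 1), repeat=2)}
correlated = {(0, 0): 0.6, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.2}

print(max_rebuild_error(make_joint(independent, p_not_e)))  # ~0: fits O -> E <- I
print(max_rebuild_error(make_joint(correlated, p_not_e)))   # clearly above 0
```

For this graph the rebuild is exact precisely when O and I are marginally independent, since that is the only constraint O→E←I places on the joint distribution.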
  - Mikhail Samin 25 Nov 2023 19:20 UTC
    1 point
    The point is, these probabilities don’t really correspond to that causal graph in the way described in the post. A script that simulates the causal graph: https://colab.research.google.com/drive/18pIMfKJpvlOZ213APeFrHNiqKiS5B5ve?usp=sharing
    - DanielFilan 26 Nov 2023 22:56 UTC
      2 points
      I think they do correspond to the causal graph in the way described in the post. Your script simulates something more specific than the causal graph: a distribution can fit the causal graph without being one that your script can generate.
      - Mikhail Samin 26 Nov 2023 23:11 UTC
        1 point
        I think I don’t buy the story of a correct causal structure generating the data here in a way that supports the point of the post. If two variables, I and O, both make one value of E more likely than the other, that means the probability of I conditional on some value of E is different from the probability of I, because I explains some of that value of E; but if you also know O, then this explains some of that value of E as well, and so P(I|E=x, O) should be different.
        
        The post describes this example:
        
        This may seem a bit clearer by considering the scenario B->A<-E, where burglars and earthquakes both cause alarms. If we’re told the value of the bottom node, that there was an alarm, the probability of there being a burglar is not independent of whether we’re told there was an earthquake—the two top nodes are not conditionally independent once we condition on the bottom node
        
        And if you apply this procedure to “not exercising”, we don’t see that absence of conditional independence once we condition on the bottom node, which means that “not exercising” is not at all explained away by internet (or by being overweight).
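The quoted burglar/earthquake passage can be made concrete with a hypothetical noisy-OR alarm; the noisy-OR form and every parameter below are assumptions, not from the post, and the earthquake is called Q here to avoid a clash with E (exercise). Conditioning on the alarm ringing shows explaining away. Conditioning on the alarm not ringing leaves burglar and earthquake independent, because under noisy-OR P(no alarm | B, Q) is a product of one factor per cause, which is the same kind of value-specific independence discussed in this thread.

```python
from itertools import product

P_B, P_Q = 0.01, 0.02              # hypothetical priors: burglar, earthquake
LEAK, W_B, W_Q = 0.001, 0.95, 0.3  # hypothetical noisy-OR parameters

def p_no_alarm(b, q):
    # Noisy-OR: the alarm stays silent only if the leak and every present cause all fail.
    return (1 - LEAK) * (1 - W_B) ** b * (1 - W_Q) ** q

# Joint P(B, Q, A) with keys (b, q, a); B and Q are independent a priori.
joint = {}
for b, q in product((0, 1), repeat=2):
    prior = (P_B if b else 1 - P_B) * (P_Q if q else 1 - P_Q)
    joint[b, q, 0] = prior * p_no_alarm(b, q)
    joint[b, q, 1] = prior * (1 - p_no_alarm(b, q))

def p_burglar(alarm, quake=None):
    """P(B = 1 | A = alarm), or P(B = 1 | A = alarm, Q = quake) when quake is given."""
    keys = [k for k in joint if k[2] == alarm and (quake is None or k[1] == quake)]
    total = sum(joint[k] for k in keys)
    return sum(joint[k] for k in keys if k[0] == 1) / total

print(p_burglar(1), p_burglar(1, quake=1))  # explaining away: the second is smaller
print(p_burglar(0), p_burglar(0, quake=1))  # equal up to rounding under noisy-OR
```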
        DanielFilan 27 Nov 2023 19:32 UTC
        2 points
        
        If two variables, I and O, both make one value of E more likely than the other, that means the probability of I conditional on some value of E is different from the probability of I because I explains some of that value of E
        
        That’s correct
        
        but if you also know O, then this explains some of that value of E as well, and so P(I|E=x, O) should be different.
        
        This is generically true but doesn’t have to be true for every value of x.
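One way this can happen with a binary E, sketched with made-up numbers rather than the post's data: if P(E = no | O, I) is a product a(O)·b(I) of a factor depending only on O and a factor depending only on I, then P(O, I, E = no) = P(O)·a(O)·P(I)·b(I) splits into an O part and an I part, so I and O are independent given E = no. P(E = yes | O, I) = 1 − a(O)·b(I) does not split that way, so they stay dependent given E = yes. The factors `a`, `b` and the priors below are hypothetical.

```python
from itertools import product

P_O1, P_I1 = 0.3, 0.2   # hypothetical P(O = 1), P(I = 1); O and I independent
a = {0: 0.6, 1: 0.9}    # hypothetical O-factor of P(E = 0 | O, I)
b = {0: 0.9, 1: 1.0}    # hypothetical I-factor of P(E = 0 | O, I)

# Joint P(O, I, E) with keys (o, i, e).
joint = {}
for o, i in product((0, 1), repeat=2):
    prior = (P_O1 if o else 1 - P_O1) * (P_I1 if i else 1 - P_I1)
    p_no_exercise = a[o] * b[i]  # multiplicative: no O-I interaction in this factor
    joint[o, i, 0] = prior * p_no_exercise
    joint[o, i, 1] = prior * (1 - p_no_exercise)

def p_i_given(e, o=None):
    """P(I = 1 | E = e), or P(I = 1 | E = e, O = o) when o is given."""
    keys = [k for k in joint if k[2] == e and (o is None or k[0] == o)]
    return sum(joint[k] for k in keys if k[1] == 1) / sum(joint[k] for k in keys)

print("given E = no: ", p_i_given(0), p_i_given(0, o=1))  # equal up to rounding
print("given E = yes:", p_i_given(1), p_i_given(1, o=1))  # O explains away some of E = yes... for I
```

Both causes raise the chance of not exercising here, yet only E = yes shows the explaining-away pattern.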
        
        Here’s one way to see why the graph in the post is right: look at all other causal graphs, and you will see that they either fail to imply that I and O are independent (which our graph does imply) or imply independences or conditional independences that don’t exist in the data.
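A sketch of that exercise for three binary variables, assuming faithfulness and assuming, as the graph in the post encodes, that the only independence in the data is I independent of O, with every other pair dependent both marginally and given the third variable. It enumerates all 25 DAGs on {O, I, E}, reads off the independences each one implies via d-separation, and keeps those that match. D-separation concerns conditioning on E as a whole, so the independence that holds only given E = no does not enter this check. All function names are illustrative.

```python
from itertools import product

NODES = ("O", "I", "E")
PAIRS = [("O", "I"), ("O", "E"), ("I", "E")]

def is_acyclic(edges):
    # Repeatedly strip nodes with no incoming edge from the remaining nodes.
    remaining = set(NODES)
    while remaining:
        sources = [n for n in remaining
                   if not any(dst == n and src in remaining for src, dst in edges)]
        if not sources:
            return False
        remaining -= set(sources)
    return True

def all_dags():
    """Yield every DAG on the three nodes as a frozenset of directed edges."""
    for choice in product((None, "fwd", "back"), repeat=len(PAIRS)):
        edges = set()
        for (x, y), c in zip(PAIRS, choice):
            if c == "fwd":
                edges.add((x, y))
            elif c == "back":
                edges.add((y, x))
        if is_acyclic(edges):
            yield frozenset(edges)

def adjacent(edges, x, y):
    return (x, y) in edges or (y, x) in edges

def implied_independences(edges):
    """Independences (x, y, cond) implied by d-separation; cond is None or the third node."""
    result = set()
    for x, y in PAIRS:
        (z,) = set(NODES) - {x, y}
        if adjacent(edges, x, y):
            continue  # a direct edge makes x and y dependent in every case
        if adjacent(edges, x, z) and adjacent(edges, y, z):
            collider = (x, z) in edges and (y, z) in edges
            # A collider blocks x - z - y unless z is conditioned on;
            # a chain or fork blocks it only when z is conditioned on.
            result.add((x, y, None) if collider else (x, y, z))
        else:
            # No path between x and y at all.
            result.add((x, y, None))
            result.add((x, y, z))
    return result

observed = {("O", "I", None)}  # assumed: I and O independent, nothing else
matches = [sorted(d) for d in all_dags() if implied_independences(d) == observed]
print(matches)  # [[('I', 'E'), ('O', 'E')]], i.e. only O -> E <- I
```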
        Mikhail Samin 28 Nov 2023 0:06 UTC
        1 point
        I don’t really get how this can be true for some values of x but not others if the variable is binary.
        DanielFilan 28 Nov 2023 6:32 UTC
        2 points
        The post gives one example of how it can be true: the probabilities are compatible with the causal graph, I is independent of O given E = no, but I is not independent of O given E = yes.
        
        Here’s one way to see why the graph in the post is right: look at all other causal graphs, and you will see that they either fail to imply that I and O are independent (which our graph does imply) or imply independences or conditional independences that don’t exist in the data.
        
        Have you tried this exercise?
- DanielFilan 25 Nov 2023 18:55 UTC
  2 points
  
  Eliezer says the data shows that Overweight and Internet both make exercise less likely.
  
  I don’t think he actually says that? He just says they both causally affect exercise:
  
  Which says that weight and Internet use exert causal effects on exercise, but exercise doesn’t causally affect either.
  
  [EDIT: I’m totally wrong; he does say “we now realize that being overweight and spending time on the Internet both cause you to exercise less”]
  
  Regarding your second point:
  
  That would imply that P(O|I & not E) should be less than P(O|not E); and that P(I|O & not E) should be less than P(I|not E).
  
  FWIW, I don’t take the sentence “overweight and internet both make exercise less likely” to imply that—just to imply that p(E | O) < p(E | not O) and p(E | I) < p(E | not I). The interaction terms could be complicated.
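To illustrate that p(E | O) < p(E | not O) and p(E | I) < p(E | not I) do not pin down what happens given not E, here is a sketch with hypothetical numbers in which both inequalities hold and O and I are independent, yet among non-exercisers learning I makes O more likely, the reverse of explaining away. The interaction term sits in the hypothetical table `p_not_e`.

```python
from itertools import product

P_O1, P_I1 = 0.5, 0.5  # hypothetical priors; O and I independent
# Hypothetical P(E = 0 | O = o, I = i), with a strong O-I interaction.
p_not_e = {(0, 0): 0.50, (0, 1): 0.55, (1, 0): 0.55, (1, 1): 0.95}

joint = {}
for o, i in product((0, 1), repeat=2):
    prior = (P_O1 if o else 1 - P_O1) * (P_I1 if i else 1 - P_I1)
    joint[o, i, 0] = prior * p_not_e[o, i]
    joint[o, i, 1] = prior * (1 - p_not_e[o, i])

def prob(o=None, i=None, e=None):
    """Probability of the event fixed by the arguments that are not None."""
    return sum(p for (ko, ki, ke), p in joint.items()
               if (o is None or ko == o) and (i is None or ki == i)
               and (e is None or ke == e))

# Each factor on its own makes exercise less likely...
p_e_given_o, p_e_given_not_o = prob(o=1, e=1) / prob(o=1), prob(o=0, e=1) / prob(o=0)
p_e_given_i, p_e_given_not_i = prob(i=1, e=1) / prob(i=1), prob(i=0, e=1) / prob(i=0)

# ...but among non-exercisers, learning I raises the probability of O.
p_o_given_not_e = prob(o=1, e=0) / prob(e=0)
p_o_given_not_e_and_i = prob(o=1, i=1, e=0) / prob(i=1, e=0)

print(p_e_given_o, p_e_given_not_o)
print(p_o_given_not_e, p_o_given_not_e_and_i)
```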
  - DanielFilan 25 Nov 2023 19:23 UTC
    2 points
    It is weird that I is independent of O given not E, since from that graph you wouldn’t expect conditional independence; but I is not independent of O given E, so that’s OK and consistent with the graph.
