I think I spotted a mistake. I’m surprised no one noticed it earlier.
Eliezer says the data shows that Overweight and Internet both make exercise less likely.
That would imply that P(O|I & not E) should be less than P(O|not E); and that P(I|O & not E) should be less than P(I|not E). But actually, they’re approximately equal![1]
P(not E) = 0.6039286373
P(not E & O) = 0.1683130609
P(not E & I) = 0.09519044713
P(I & O & not E) = 0.02630682544
P(I|not E) = 0.1576187007
P(I|not E & O) = 0.1562969938
P(O|not E) = 0.2786969362
P(O|not E & I) = 0.2763599314
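For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the four conditional probabilities from the four joint probabilities listed above. The variable names and the `conditional` helper are my own, not from the post or the spreadsheet.

```python
# Joint probabilities copied from the figures listed above.
p_notE = 0.6039286373        # P(not E)
p_notE_O = 0.1683130609      # P(not E & O)
p_notE_I = 0.09519044713     # P(not E & I)
p_notE_I_O = 0.02630682544   # P(I & O & not E)


def conditional(p_joint, p_given):
    """Return P(A | B) = P(A & B) / P(B)."""
    return p_joint / p_given


# Hypothetical variable names, chosen for readability.
p_I_given_notE = conditional(p_notE_I, p_notE)
p_I_given_notE_O = conditional(p_notE_I_O, p_notE_O)
p_O_given_notE = conditional(p_notE_O, p_notE)
p_O_given_notE_I = conditional(p_notE_I_O, p_notE_I)

print(f"P(I|not E)     = {p_I_given_notE:.4f}")
print(f"P(I|not E & O) = {p_I_given_notE_O:.4f}")
print(f"P(O|not E)     = {p_O_given_notE:.4f}")
print(f"P(O|not E & I) = {p_O_given_notE_I:.4f}")
```

Each pair differs only in the third decimal place, which is the "approximately equal" observation.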
(From the current data, P(O|E & I) is less than P(O|E): 5% vs 10%; P(I|E & O) is less than P(I|E): 6% vs 11%; which is what you’d expect from Overweight and Internet both making exercise less likely, but it’s not all that O->E<-I implies.)
Edit: note that inferring the causal graph exactly the way it’s done in the alarms/recessions/burglars example fails here due to the extra conditional independences. The data is generated in a way that doesn’t show a correspondence to this graph if you follow a procedure identical to the one described in the post.
The data is generated in a way that doesn’t show a correspondence to this graph if you follow a procedure identical to the one described in the post
When I calculate the conditional probability tables you get from the table of numbers in the post and multiply them out to get the joint distribution, my answers basically match with the table of numbers in the post.
I think they do correspond to the causal graph in the way described in the post. Your script simulates something more specific than the causal graph: a distribution can fit the causal graph without being one your script could generate.
I think I don’t buy the story of a correct causal structure generating the data here in a way that supports the point of the post. If two variables, I and O, both make one value of E more likely than the other, that means the probability of I conditional on some value of E is different from the probability of I, because I explains some of that value of E; but if you also know O, then this explains some of that value of E as well, and so P(I|E=x, O) should be different.
The post describes this example:
This may seem a bit clearer by considering the scenario B->A<-E, where burglars and earthquakes both cause alarms. If we’re told the value of the bottom node, that there was an alarm, the probability of there being a burglar is not independent of whether we’re told there was an earthquake—the two top nodes are not conditionally independent once we condition on the bottom node
And if you apply this procedure to “not exercising”, we don’t see that absence of conditional independence once we condition on the bottom node, which means that “not exercising” is not at all explained away by Internet (or being overweight).
If two variables, I and O, both make one value of E more likely than the other, that means the probability of I conditional on some value of E is different from the probability of I because I explains some of that value of E
That’s correct
but if you also know O, then this explains some of that value of E as well, and so P(I|E=x, O) should be different.
This is generically true but doesn’t have to be true for every value of x.
Here’s one way to see why the graph in the post is right: look at all other causal graphs, and you will see they either fail to imply that I and O are independent (as our graph does), or imply independences or conditional independences that don’t exist in the data.
The post gives one example of how it can be true: the probabilities are compatible with the causal graph, I is independent of O given E = no, but I is not independent of O given E = yes.
Eliezer says the data shows that Overweight and Internet both make exercise less likely.
I don’t think he actually says that? He just says they both causally affect exercise:
Which says that weight and Internet use exert causal effects on exercise, but exercise doesn’t causally affect either.
[EDIT: I’m totally wrong, he does say “we now realize that being overweight and spending time on the Internet both cause you to exercise less”]
Regarding your second point:
That would imply that P(O|I & not E) should be less than P(O|not E); and that P(I|O & not E) should be less than P(I|not E).
FWIW, I don’t take the sentence “overweight and internet both make exercise less likely” to imply that—just to imply that p(E | O) < p(E | not O) and p(E | I) < p(E | not I). The interaction terms could be complicated.
It is weird that I is independent of O given not E, given that from that graph you wouldn’t expect conditional independence, but I is not independent of O given E, so that’s OK and consistent with the graph.
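To make that concrete, here is a rough sketch of a toy O->E<-I model in which I and O are independent given E = no but not given E = yes. All numbers are hypothetical and are not the post's data. The trick, which is my own assumption about one way to get this pattern, is to let P(not E | I, O) factor as f(I) * g(O): the factorization survives conditioning on E = no, but 1 - f(I) * g(O) does not factor, so conditioning on E = yes couples I and O.

```python
# Hypothetical parameters for a collider O -> E <- I (not the post's numbers).
p_I = 0.15                 # P(I = 1), chosen for illustration
p_O = 0.25                 # P(O = 1), chosen for illustration
f = {1: 0.90, 0: 0.70}     # Internet factor in P(not E | I, O)
g = {1: 0.95, 0: 0.80}     # Overweight factor in P(not E | I, O)


def p_not_e(i, o):
    """P(E = no | I = i, O = o), assumed to factor as f(i) * g(o)."""
    return f[i] * g[o]


def joint(i, o, e):
    """P(I = i, O = o, E = e), with I and O marginally independent."""
    prior = (p_I if i else 1 - p_I) * (p_O if o else 1 - p_O)
    return prior * ((1 - p_not_e(i, o)) if e else p_not_e(i, o))


def p_I_given(e, o=None):
    """P(I = 1 | E = e), or P(I = 1 | E = e, O = o) when o is given."""
    o_values = (0, 1) if o is None else (o,)
    num = sum(joint(1, ov, e) for ov in o_values)
    den = sum(joint(iv, ov, e) for iv in (0, 1) for ov in o_values)
    return num / den


print("Given E = no: ", p_I_given(0), p_I_given(0, o=1), p_I_given(0, o=0))
print("Given E = yes:", p_I_given(1), p_I_given(1, o=1), p_I_given(1, o=0))
```

In this toy model both I and O lower the probability of exercise, knowing O tells you nothing more about I once you know E = no, and knowing O does shift P(I) once you know E = yes. So the pattern in the data is at least possible under the graph, even if it needs a fairly special conditional probability table.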
[1] Google sheet with calculations.
The point is, these probabilities don’t really correspond to that causal graph in a way described in the post. A script that simulates the causal graph: https://colab.research.google.com/drive/18pIMfKJpvlOZ213APeFrHNiqKiS5B5ve?usp=sharing
I don’t really get how this can be true for some values of x but not others if the variable is binary
Have you tried this exercise?