What about Milton Friedman’s thermostat?
Computation, Causation, & Discovery starts with an overview chapter by Gregory Cooper.
The hope that no correlation implies no causation is referred to as the “causal faithfulness assumption”.
While the faithfulness assumption is plausible in many circumstances, there are circumstances in which it is invalid.
Cooper discusses deterministic relationships and goal-oriented systems as two examples where it fails.
I think the causal discovery literature is aware of Milton Friedman’s thermostat and knows it under the name “failure of causal faithfulness in goal-oriented systems”.
That post is slow to reach its point and kind of abrasive. Here’s a summary with a different flavor.
Output is set by some Stuff and a Control signal. Agent with full power over Control and accurate models of Output and Stuff can negate the influence of Stuff, making Output whatever it wants, within the range of possible Outputs given Stuff. Intuitively Agent is setting Output via Control, even though neither Control nor Stuff will be correlated with Output if Agent is keeping Output constant. I’m not so sure whether it still makes sense to say, even intuitively, that Stuff is a causal parent of Output when the agent trumps it.
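To make the vanishing correlations concrete, here is a minimal sketch of the simplest linear version of that story (assuming Output = Stuff + Control and a perfectly compensating Agent; the tiny noise term is only there so the correlations are well defined):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
target = 5.0

stuff = rng.normal(0.0, 1.0, n)                      # everything else that moves Output
control = target - stuff                             # Agent compensates perfectly
output = stuff + control + rng.normal(0.0, 1e-3, n)  # Output held at target, up to tiny noise

print(np.corrcoef(stuff, output)[0, 1])    # ~0: Stuff looks causally irrelevant
print(np.corrcoef(control, output)[0, 1])  # ~0: so does Control
print(np.corrcoef(stuff, control)[0, 1])   # ~-1: the two causes cancel each other exactly
```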
Then we break the situation a little. Suppose a driver is keeping a car’s speed constant with a gas pedal. You can make the Agent’s beliefs inaccurate (directly, by showing the driver a video of an upcoming hill when there is none in front of the car, or by intervening on Stuff, like introducing a gust of wind the driver can’t see, and then just not updating Agent’s belief). Likewise you can make Agent partially impotent (push down the driver’s leg on the gas pedal, give them a seizure, replace them with an octopus). Finally you can change what apparent values and causal relations the agent wants to enforce (“Please go faster”).
And maybe those are how you test for consequentialist confounding in real life? You can set environment variables if the agent doesn’t anticipate you, or you can find that agent and make them believe noise, break their grasp on your precious variables, or change their desires.
“Milton Friedman’s thermostat” is an excellent article (although most of the comments are clueless). But some things about it bear emphasising.
Output is set by some Stuff and a Control signal.
Yes.
Agent with full power over Control and accurate models of Output and Stuff can negate the influence of Stuff
No. Control systems do not work like that.
All the Agent needs to know is how to vary the Output to bring the thing to be controlled towards its desired value. It need not even be aware of any of the Stuff. It might or might not be helpful, but it is not necessary. The room thermostat does not: it simply turns the heating on when the temperature is below the set point and off when it is above. It neither knows nor cares what the ambient temperature is outside, whether the sun is shining on the building, how many people are in the room, or anything at all except the sensed temperature and the reference temperature.
You can make the Agent’s beliefs inaccurate (directly, by showing the driver a video of an upcoming hill when there is none in front of the car, or by intervening on Stuff, like introducing a gust of wind the driver can’t see, and then just not updating Agent’s belief).
If you try to keep the speed of your car constant by deliberately compensating for the disturbances you can see, you will do a poor job of it. The Agent does not need to anticipate hills, and wind is invisible from inside a car. Instead all you have to do—and all that an automatic cruise control does—is measure the actual speed, compare it with the speed you want, and vary the accelerator pedal accordingly. The cruise control does not sense the gradient, head winds, tail winds, a dragging brake, or the number of passengers in the car. It doesn’t need to. All it needs to do is sense the actual and desired speeds, and know how to vary the flow of fuel to bring the former closer to the latter. A simple PID controller is enough to do that.
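For illustration, a toy cruise control along those lines: a proportional-integral loop with made-up plant dynamics, gains, and disturbances. The controller reads only the measured speed and the set point, never the grade or the wind, and still holds the speed close to the target.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, steps = 0.1, 2000
v_set, v = 30.0, 20.0                  # desired and current speed, m/s
kp, ki, integral = 0.5, 0.1, 0.0       # made-up PI gains

speeds = []
for t in range(steps):
    # The controller sees only the measured speed and the set point.
    error = v_set - v
    integral += error * dt
    throttle = float(np.clip(kp * error + ki * integral, 0.0, 1.0))

    # Disturbances the controller never senses: hills and gusts of wind (made up).
    hill = 1.0 * np.sin(2 * np.pi * t * dt / 60.0)   # slowly varying grade, +/- 1 m/s^2
    wind = rng.normal(0.0, 0.3)                      # random gusts
    v += (5.0 * throttle - 0.1 * v - hill + wind) * dt
    speeds.append(v)

print(np.mean(speeds[500:]), np.std(speeds[500:]))   # hovers close to v_set = 30
```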
This concept is absolutely fundamental to control systems. The controller can function, and function well, while knowing almost nothing. While you can design control systems that do—or attempt to do—the things you mention, sensing disturbances and computing the outputs required to counteract them, none of that is a prerequisite for control. Most control systems do without such refinements.
I’m familiar with feedback control and I’ve used PID controllers in the design of servo-hydraulic systems. That wasn’t the situation the blog post described. If you have delays, or hysteresis, or any other reason for a non-zero impulse response, you lose the vanishing correlations which made the problem interesting.
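(For what it’s worth, the same toy model as in the sketch above shows this: give the Agent’s compensation a one-step lag and the correlations come back.)

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
target = 5.0

stuff = rng.normal(0.0, 1.0, n)
control = np.empty(n)
control[0] = target
control[1:] = target - stuff[:-1]          # compensates for the *previous* step's Stuff
output = stuff + control

print(np.corrcoef(control, output)[0, 1])  # ~0.7: no longer vanishing
print(np.corrcoef(stuff, output)[0, 1])    # ~0.7
```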
There are two issues with it.
You cannot figure out how something works by looking at only one aspect of it. Think of the story of the blind men and the elephant.
But it still has a point: once a system contains a subsystem that makes predictions, understanding the whole system by pure observation becomes impossible.
Good point. And here’s a made-up parallel example, about weight and exercise:
Suppose the level of exercise can influence weight (E → W), and that being underfed reduces weight (U → W) directly but will also reduce the amount of exercise people do (U → E), by an amount such that the effect of the reduced exercise on weight exactly cancels out the direct weight reduction.
Suppose also there is no random variation in amount of exercise, so it’s purely a function of being underfed.
If we look at data generated in that situation, we would find no correlation between exercise and weight. Examining only those two variables we might assume no causal relation.
Adding in the third variable, we would find a perfect correlation between (lack of) exercise and underfeeding. Implications of finding this perfect correlation: you can’t tell whether the causal relation between them should be E → U or U → E. And even if you somehow know the graph is (E → W), (U → E) and (U → W), there is no data on what happens to W for an underfed person who exercises, or a well-fed person who doesn’t exercise, so you can’t predict the effect of modifying E.
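A quick simulation of this made-up example, with illustrative coefficients chosen so the two paths cancel exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

U = rng.uniform(0.0, 1.0, n)                            # degree of underfeeding
E = 10.0 - 2.0 * U                                      # U → E, no independent variation in exercise
W = 70.0 - 1.0 * E - 2.0 * U + rng.normal(0.0, 0.1, n)  # E → W and U → W
# Total effect of U on W: direct (-2) plus via E ((-2) * (-1) = +2) = 0.

print(np.corrcoef(E, W)[0, 1])  # ~0, despite E → W
print(np.corrcoef(U, W)[0, 1])  # ~0, despite U → W
print(np.corrcoef(U, E)[0, 1])  # -1: (lack of) exercise perfectly tracks underfeeding
```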
but will also reduce the amount of exercise people do (U → E), by an amount such that the effect of the reduced exercise on weight exactly cancels out the direct weight reduction.
It’s unlikely that two effects will randomly cancel out unless the situation is the result of some optimizing process. This is the case in Milton Friedman’s thermostat but doesn’t appear to be the case in your example.
It wouldn’t be random. It would be an optimising process, tuned by evolution (another well known optimising process). If you have less food than needed to maintain your current weight, expend less energy (on activities other than trying to find more food). For most of our evolution, losing weight was a personal existential risk.
I had meant to suggest some sort of unintelligent feedback system. Not coincidence, but also not an intelligent optimisation, so still not an exact parallel to his thermostat.
The thermostat was created by an intelligent human.
I never said the optimizing process had to be that intelligent, i.e., the blind-idiot-god counts.
Studies can always have confounding factors, of course. And I wrote “falsification” but could have more accurately said something about reducing the posterior probability. Lack of correlation (e.g. with speed) would sharply reduce the p.p. of a simple model with one input (e.g. gas pedal), but only reduce the p.p. of a model with multiple inputs (e.g. gas pedal + hilly terrain) to a weaker extent.
By the way, you can still learn structure from data in the presence of unobserved confounders. The problem becomes very interesting indeed, then.
Oh, awesome. Can you provide a link / reference / name of what I should Google?
http://www.hss.cmu.edu/philosophy/spirtes/n-oracle.ps (FCI algorithm)
http://www.cs.huji.ac.il/~nir/Abstracts/Fr2.html (structural EM)
http://www.stat.washington.edu/tsr/uai-causal-structure-learning-workshop/ (look for “parameter and structure learning in nested Markov models”)