(summary)
Correlation does not imply causation,
but
causation implies correlation,
and therefore
no correlation implies no causation
...which permits the falsification of some causal theories based on the absence of certain correlations.
What about Milton Friedman’s thermostat?
Computation, Causation, & Discovery starts with an overview chapter provided by Gregory Cooper.
The hope that no correlation implies no causation is referred to as the “causal faithfulness assumption”. While the faithfulness assumption is plausible in many circumstances, there are circumstances in which it is invalid. Cooper discusses Deterministic relationships and Goal-oriented systems as two examples where it is invalid.
I think the causal discovery literature is aware of Milton Friedman’s thermostat and knows it by the name “Failure of causal faithfulness in goal-oriented systems”.
That post is slow to reach its point and kind of abrasive. Here’s a summary with a different flavor.
Output is set by some Stuff and a Control signal. Agent with full power over Control and accurate models of Output and Stuff can negate the influence of Stuff, making Output whatever it wants, within the range of possible Outputs given Stuff. Intuitively Agent is setting Output via Control, even though there won’t be a correlation if Agent is keeping Output constant. I’m not so sure whether it still makes sense to say, even intuitively, that Stuff is a causal parent of Output when the agent trumps it.
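To make the vanishing correlation concrete, here is a minimal NumPy sketch of the toy setup just described (all names are hypothetical, and a tiny bit of actuator noise is assumed so the statistics stay well-defined):

```python
import numpy as np

# Toy model of the setup above: Output = Stuff + Control, and the Agent sets Control
# to cancel Stuff, up to a small amount of actuator noise (an assumption of this sketch).
rng = np.random.default_rng(0)
n = 100_000
target = 5.0                                     # the Output value the Agent wants

stuff = rng.normal(0.0, 1.0, n)                  # disturbances the Agent observes and models
control = target - stuff + rng.normal(0.0, 0.01, n)
output = stuff + control                         # Output is caused by both Stuff and Control

print(output.std())                              # ~0.01: Output is held essentially constant
print(np.corrcoef(stuff, output)[0, 1])          # ~0: no correlation with one genuine cause
print(np.corrcoef(control, output)[0, 1])        # ~0: and essentially none with the other
```

Both causes of Output show essentially zero correlation with it, which is exactly the faithfulness violation being discussed.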
Then we break the situation a little. Suppose a driver is keeping a car’s speed constant with a gas pedal. You can make the Agent’s beliefs inaccurate (directly, by showing the driver a video of an upcoming hill when there is none in front of the car, or by intervening on Stuff, like introducing a gust of wind the driver can’t see, and then just not updating Agent’s belief). Likewise you can make Agent partially impotent (push down the driver’s leg on the gas pedal, give them a seizure, replace them with an octopus). Finally you can change what apparent values and causal relations the agent wants to enforce (“Please go faster”).
And those are maybe how you test for consequentialist confounding in real life? You can set environment variables if the agent doesn’t anticipate you, or you can find that agent and make them believe noise, break their grasp on your precious variables, or change their desires.
“Milton Friedman’s thermostat” is an excellent article (although most of the comments are clueless). But some things about it bear emphasising.
Output is set by some Stuff and a Control signal.
Yes.
Agent with full power over Control and accurate models of Output and Stuff can negate the influence of Stuff
No. Control systems do not work like that.
All the Agent needs to know is how to vary the Output to bring the thing to be controlled towards its desired value. It need not even be aware of any of the Stuff. It might or might not be helpful, but it is not necessary. The room thermostat does not: it simply turns the heating on when the temperature is below the set point and off when it is above. It neither knows nor cares what the ambient temperature is outside, whether the sun is shining on the building, how many people are in the room, or anything at all except the sensed temperature and the reference temperature.
You can make the Agent’s beliefs inaccurate (directly, by showing the driver a video of an upcoming hill when there is none in front of the car, or by intervening on Stuff, like introducing a gust of wind the driver can’t see, and then just not updating Agent’s belief).
If you try to keep the speed of your car constant by deliberately compensating for the disturbances you can see, you will do a poor job of it. The Agent does not need to anticipate hills, and wind is invisible from inside a car. Instead all you have to do—and all that an automatic cruise control does—is measure the actual speed, compare it with the speed you want, and vary the accelerator pedal accordingly. The cruise control does not sense the gradient, head winds, tail winds, a dragging brake, or the number of passengers in the car. It doesn’t need to. All it needs to do is sense the actual and desired speeds, and know how to vary the flow of fuel to bring the former closer to the latter. A simple PID controller is enough to do that.
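As a minimal sketch of that last sentence (nobody’s production controller; the plant dynamics, gains, and hill profile below are invented for illustration), here is a proportional-integral cruise control that senses only actual and desired speed and never sees the gradient it is fighting:

```python
import numpy as np

dt, steps = 0.1, 5000
setpoint = 30.0                      # desired speed; illustrative units
kp, ki = 0.8, 0.2                    # made-up PI gains, not tuned for any real car
v, integral = 0.0, 0.0
speeds, throttles = [], []

rng = np.random.default_rng(1)
hill = np.cumsum(rng.normal(0.0, 0.02, steps))   # slowly drifting gradient, never sensed

for t in range(steps):
    error = setpoint - v                      # the ONLY inputs: actual speed and desired speed
    integral += error * dt
    u = kp * error + ki * integral            # throttle command
    v += dt * (0.5 * u - 0.1 * v - hill[t])   # toy plant: throttle vs. drag vs. gradient
    speeds.append(v)
    throttles.append(u)

print(np.mean(speeds[-1000:]))                               # settles near the setpoint
print(np.corrcoef(throttles[-3000:], hill[-3000:])[0, 1])    # throttle ends up tracking the unseen hill
```

The controller rejects a disturbance it never measures, purely by feeding back the error.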
This concept is absolutely fundamental to control systems. The controller can function, and function well, while knowing almost nothing. While you can design control systems that do—or attempt to do—the things you mention, sensing disturbances and computing the outputs required to counteract them, none of that is a prerequisite for control. Most control systems do without such refinements.
I’m familiar with feedback control and I’ve used PID controllers in the design of servo-hydraulic systems. That wasn’t the situation the blog post described. If you have delays, or hysteresis, or any other reason for a non-zero impulse response, you lose the vanishing correlations which made the problem interesting.
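A quick sketch of that point (same hypothetical toy as before): give the Agent a one-step delay, so it cancels the previous period’s Stuff instead of the current one, and the correlations come straight back.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
target = 5.0

stuff = rng.normal(0.0, 1.0, n)
control = np.empty(n)
control[0] = target
control[1:] = target - stuff[:-1]   # one-step delay: the Agent cancels *yesterday's* Stuff
output = stuff + control            # no longer constant

print(np.corrcoef(stuff, output)[0, 1])     # ~0.71: correlation with the disturbance reappears
print(np.corrcoef(control, output)[0, 1])   # ~0.71: and with the control signal too
```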
There are two issues with it.
You cannot figure out how something works by looking at only one aspect of it. Think of the story of the blind men and the elephant.
But it still has a point: with a subsystem that makes predictions, understanding the system by pure observation becomes impossible.
Good point. And here’s a made-up parallel example to that, about weight and exercise:
Suppose level of exercise can influence weight (E → W), and that being underfed reduces weight directly (U → W) but also reduces the amount of exercise people do (U → E), by an amount where the effect of the reduced exercise on weight exactly cancels out the direct weight reduction.
Suppose also there is no random variation in amount of exercise, so it’s purely a function of being underfed.
If we look at data generated in that situation, we would find no correlation between exercise and weight. Examining only those two variables we might assume no causal relation.
Adding in the third variable, we would find a perfect correlation between (lack of) exercise and underfeeding. Implications of finding this perfect correlation: you can’t tell whether the causal relation between them should be E → U or U → E. And even if you somehow know the graph is (E → W), (U → E) and (U → W), there is no data on what happens to W for an underfed person who exercises, or a well-fed person who doesn’t, so you can’t predict the effect of modifying E.
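Here is a small simulation of that story (coefficients invented so that the two paths cancel exactly, as stipulated): observationally E and W are uncorrelated, yet intervening on E would change W.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
u = rng.normal(size=n)           # degree of underfeeding
e = -u                           # exercise is purely a function of underfeeding (no noise)
w = -e - u + rng.normal(size=n)  # exercise lowers weight; underfeeding lowers weight directly

print(np.corrcoef(e, w)[0, 1])   # ~0: no correlation between exercise and weight
print(np.corrcoef(e, u)[0, 1])   # -1: perfect (anti-)correlation between exercise and underfeeding

# What an intervention on E would do (randomize exercise, holding the rest of the model fixed):
e_int = rng.normal(size=n)
w_int = -e_int - u + rng.normal(size=n)
print(np.corrcoef(e_int, w_int)[0, 1])   # clearly negative: modifying E does move W
```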
It’s unlikely that two effects will randomly cancel out unless the situation is the result of some optimizing process. This is the case in Milton Friedman’s thermostat but doesn’t appear to be the case in your example.
It wouldn’t be random. It would be an optimising process, tuned by evolution (another well known optimising process). If you have less food than needed to maintain your current weight, expend less energy (on activities other than trying to find more food). For most of our evolution, losing weight was a personal existential risk.
I had meant to suggest some sort of unintelligent feedback system. Not coincidence, but also not an intelligent optimisation, so still not an exact parallel to his thermostat.
The thermostat was created by an intelligent human.
I never said the optimizing process had to be that intelligent, i.e., the blind-idiot-god counts.
Studies can always have confounding factors, of course. And I wrote “falsification” but could have more accurately said something about reducing the posterior probability. Lack of correlation (e.g. with speed) would sharply reduce the p.p. of a simple model with one input (e.g. gas pedal), but only reduce the p.p. of a model with multiple inputs (e.g. gas pedal + hilly terrain) to a weaker extent.
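A toy version of that arithmetic (every number below is invented purely for illustration):

```python
# Model A: speed depends only on the gas pedal -> "no correlation" would be very surprising.
# Model B: speed depends on gas pedal + terrain -> cancellation can hide the correlation.
p_nocorr_given_A = 0.05   # assumed likelihood of observing no correlation under Model A
p_nocorr_given_B = 0.50   # assumed likelihood under Model B
prior_A = prior_B = 0.5   # equal priors, also an assumption

post_A = prior_A * p_nocorr_given_A
post_B = prior_B * p_nocorr_given_B
total = post_A + post_B
print(post_A / total, post_B / total)   # ~0.09 vs ~0.91: the one-input model is hit much harder
```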
By the way, you can still learn structure from data in the presence of unobserved confounders. The problem becomes very interesting indeed, then.
Oh, awesome. Can you provide a link / reference / name of what I should Google?
http://www.hss.cmu.edu/philosophy/spirtes/n-oracle.ps
(FCI algorithm)
http://www.cs.huji.ac.il/~nir/Abstracts/Fr2.html
(structural EM)
http://www.stat.washington.edu/tsr/uai-causal-structure-learning-workshop/
(look for “parameter and structure learning in nested Markov models”)
No correlation only implies no causation if a certain assumption called “faithfulness” is true, not in general.
Might you be willing to explain for the rest of us what the “faithfulness assumption” is, and why it’s needed for “no correlation” to imply “no causation”? I’d appreciate it.
Absolutely! In a typical Bayesian network, we represent a set of probability distributions by a directed acyclic graph, such that any distribution in the set can be written as
p[x_1, \ldots, x_n] = \prod_i p[x_i \mid \mathrm{pa}[x_i]]
in other words, for every random variable in the distribution we associate a node in the graph, and the distribution can be factorized into a product of conditional densities, where each conditional is a variable (node) conditional on variables corresponding to parents of the node.
This implies that if certain types of paths in the graph from a node set X to a node set Y are “blocked” in a particular way (e.g. d-separated) by a third set Z, then in all densities that factorize as above, X is independent of Y conditional on Z. Note that this implication is one way. In particular, we can still have some conditional independence that just happens to hold because the numbers in the distribution lined up just right, and the graph does not in fact advertise this independence via d-separation. When this happens, we say the distribution is unfaithful to the graph.
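A tiny linear example of that one-way implication (a sketch added here, not part of the comment): in a chain X → Y → Z the graph d-separates X and Z given Y, and sampled data shows marginal dependence but essentially zero partial correlation.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)        # X -> Y
z = -1.5 * y + rng.normal(size=n)       # Y -> Z

print(np.corrcoef(x, z)[0, 1])          # clearly non-zero: X and Z are marginally dependent

# Partial correlation of X and Z given Y: regress each on Y and correlate the residuals.
rx = x - np.polyval(np.polyfit(y, x, 1), y)
rz = z - np.polyval(np.polyfit(y, z, 1), y)
print(np.corrcoef(rx, rz)[0, 1])        # ~0: the conditional independence the graph advertises
```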
If we pick the parameters of a distribution that factorizes according to the graph at random, then almost all parameter picks will be faithful to the graph. However, lots of distributions are “nearly unfaithful”: they are faithful, but it’s hard to tell with limited samples, and we can’t tell in advance how many samples we would need. Also, it’s easy to construct faithfulness violations, and they do occur in practice. For example, we may have an AIDS drug that suppresses HIV (so it really does help!), but the drug is very nasty, with lots of side effects and so on, so doctors usually wait until the patient is already very sick before giving the drug. If we then look at the association between use of this drug and death, we may well find that those who take the drug either die more often (positive association of drug with death!) or don’t die less often than those who don’t take the drug (no association of drug with death!).
Does this then mean the drug has a bad effect or no effect? No! It just means there is an obvious confounder of health status that we aren’t recording. In the second case, this confounder is causing the distribution over drug and death to be “unfaithful”: there is an arrow from drug to death, but there is no dependence of death on drug. And yet there is still a causal effect.
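For concreteness, here is a simulation of that drug story with invented numbers: the drug lowers the death rate by 0.2 inside every health stratum, yet the raw comparison makes it look harmful.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
sick = rng.random(n) < 0.5                         # health status, ignored in the naive analysis
drug = rng.random(n) < np.where(sick, 0.8, 0.2)    # doctors give the drug mostly to the sick
death = rng.random(n) < (np.where(sick, 0.6, 0.2) - 0.2 * drug)   # the drug genuinely helps

print(death[drug].mean(), death[~drug].mean())     # ~0.32 vs ~0.28: the drug *looks* harmful
```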
Note: I am glossing over some distinctions between a Bayesian network and a causal model in order to not muddy the discussion. What is important to note is that: (a) A Bayesian network is not a graphical causal model, but (b) a graphical causal model induces a Bayesian network on the observable data. Faithfulness (or lack of it) applies to the network appearing due to (b), and thus affects causal reasoning in the underlying causal model.
Thanks!
This seems like a big problem for inferring “no causation” from “no correlation.” Is there a standard methodological solution? And, do researchers often just choose to infer “no causation” from “no correlation” and hope for the best, or do they avoid inferring “no causation” from “no correlation” due to the fact that they can’t tell whether the faithfulness assumption holds?
Well, in some sense this is why causal inference is hard. Most of the time if you see independence that really does mean there is nothing there. The reasonable default is the null hypothesis: there is no causal effect. However, if you are poking around because you suspect there is something there, then not seeing any correlations does not mean you should give up. What it does mean is you should think about causal structure and specifically about confounders.
What people do about confounders is:
(a) Try to measure them somehow (epidemiology, medicine). If you can measure confounders you can adjust for them, and then the effect cancellation will go away.
(b) Try to find an instrumental variable (econometrics). If you can find a good instrument, you can get a causal effect with some parametric assumptions, even if there are unmeasured confounders.
(c) Try to randomize (statistics). This explicitly cuts out all confounding.
(d) You can sometimes get around unmeasured confounders by using strong mediating variables, via “front-door”-type methods. These methods aren’t really well known, and aren’t commonly used.
There is no royal road: getting rid of confounders is the entire point of causal inference. People have been thinking of clever ways to do it for close to a hundred years now. If you have infinite samples, and know where unobserved confounding is, there is an algorithm for getting the causal effect from observational data by being sneaky. This algorithm only succeeds sometimes, and if it doesn’t, there is no other way in general to do it (i.e. it’s “complete”). More in my thesis, if you are curious.
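To make (a) concrete, here is the same invented drug example again, this time adjusting for the measured confounder (health status); the protective effect reappears.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
sick = rng.random(n) < 0.5
drug = rng.random(n) < np.where(sick, 0.8, 0.2)
death = rng.random(n) < (np.where(sick, 0.6, 0.2) - 0.2 * drug)

naive = death[drug].mean() - death[~drug].mean()
adjusted = sum(
    (death[(sick == s) & drug].mean() - death[(sick == s) & ~drug].mean()) * np.mean(sick == s)
    for s in (True, False)        # within-stratum contrasts, weighted by stratum probability
)
print(naive)      # ~ +0.04: unadjusted, the drug looks slightly harmful
print(adjusted)   # ~ -0.20: back-door adjustment recovers the true protective effect
```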
Thanks again.
One more question, since this is your field. Do you happen to know of an instance where some new causal effect was discovered from observational data via causal modeling, and this cause was later confirmed by an RCT?
Well, I think smoking/cancer was first established in case control studies. In general people move up the “hierarchy of evidence” Kawoomba mentioned. At the end of the day, people only trust RCTs (and they are right, other methods rely on more assumptions). There is another good example, but let me double check before posting.
With case control studies you have the additional problem of selection bias, on top of confounding.
I thought there were still no actual RCTs of smoking in humans.
Right, you can’t always RCT in humans. But a causal mechanism + RCTs in animals biologically close to humans is convincing for something like lung cancer where minor differences among mammals shouldn’t matter much (although e.g. bears have evolved some crazy stuff to deal with all that fat they eat before hibernating).
where minor differences among mammals shouldn’t matter much
I think you are being entirely too optimistic. I recently pointed out that the research indicates that animal studies routinely (probably usually) do not transfer, and as it happens, animal smoking studies are an example of this, according to Hanson. So the differences are often far from minor, and even if there were cancer in the animal studies, we could infer very little from it.
Out of curiosity, do you smoke?
No.
I find much to agree with in Hanson’s writings, but in this case I just don’t find him convincing. One issue is that cancer is a scourge of a long-living animal. One hypothesis is that smoking causes long term cumulative damage, and you might not see effects in mice or dogs because they die too soon regardless. There is also the issue that we have a fair idea of the carcinogenic mechanism now, so if you think smoking does not cause harm, there also needs to be a story how that mechanism is foiled in humans.
I find much to agree with in Hanson’s writings, but in this case I just don’t find him convincing.
His interpretation, or his evidence? I point this out because it looks to me like your position has shifted from “the smoking / lung cancer link is established by RCTs in animals” to “even though RCTs don’t establish the smoking / lung cancer link for animals, we have other reasons to believe in the smoking / lung cancer link for humans.”
One hypothesis is that smoking causes long term cumulative damage, and you might not see effects in mice or dogs because they die too soon regardless.
So: heads I win, tails you lose? If the studies had found smoking caused cancer in animals, well, that proves it! And if they don’t, well, that just means they didn’t run long enough so we can ignore them and say we “just don’t find them convincing”...
There is also the issue that we have a fair idea of the carcinogenic mechanism now, so if you think smoking does not cause harm, there also needs to be a story how that mechanism is foiled in humans.
You don’t think there were plenty of ‘fair ideas’ of mechanisms floating around in the thousands of animal studies and interventions covered in my animal studies link? Any researcher worth his degree can come up with a plausible ex post explanation.
Your thesis deals only with acyclic causal graphs. What is the current state of the art for cyclic causal graphs? You’ll know already that I’ve been looking at that, and I have various papers of other people that attempt to take steps in that direction, but my impression is that none of them actually get very far and there is nothing like a set of substantial results that one can point to. Even my own, were they in print yet, are primarily negative.
The recent stuff I have seen is negative results:
(a) Can’t assign Pearlian semantics to cyclic graphs.
(b) If you assign equilibrium semantics, you might as well use a dynamic causal Bayesian network, a cyclic graph does not buy you anything.
(c) What graph represents the Markov property of the equilibrium distribution of a Markov chain given by a causal DBN is an interesting open question. (Such a graph wouldn’t have a causal interpretation, of course.)
As far as I can tell, epidemiology and medicine are mostly doing (c) from your list (randomizing), in the form of RCTs (which are the gold standard of medical evidence, other than meta-reviews). There are other study designs, such as most variants of case-control studies and cohort studies, which do take the (a) approach (measuring confounders), but they aren’t considered to be the same level of evidence as randomized controlled trials.
Quite rightly—if we randomize, we don’t care what the underlying causal structure is, we just cut all confounding out anyways. Methods (a), (b), (d) all rely on various structural assumptions that may or may not hold. However, even given those assumptions figuring out how to do causal inference from observational data is quite difficult. The problem with RCTs is expense, ethics, and statistical power (hard to enroll a ton of people in an RCT).
Epidemiology and medicine do a lot of (a); look for the keywords “g-formula”, “g-estimation”, “inverse probability weighting”, “propensity score”, “marginal structural models”, “structural nested models”, “covariate adjustment”, “back-door criterion”, etc.
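As an illustration of one of those keywords, here is a minimal inverse-probability-weighting sketch on the same invented drug data as above (the propensity is known exactly in this toy; in practice it would be estimated):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
sick = rng.random(n) < 0.5
drug = rng.random(n) < np.where(sick, 0.8, 0.2)
death = rng.random(n) < (np.where(sick, 0.6, 0.2) - 0.2 * drug)

p_drug = np.where(sick, 0.8, 0.2)                        # propensity of treatment given health status
w = np.where(drug, 1.0 / p_drug, 1.0 / (1.0 - p_drug))   # weight by 1 / P(treatment actually received)

ipw_treated = np.sum(w * drug * death) / np.sum(w * drug)
ipw_control = np.sum(w * ~drug * death) / np.sum(w * ~drug)
print(ipw_treated - ipw_control)                         # ~ -0.20, matching the adjusted estimate
```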
People talk about “controlling for other factors” when discussing associations all the time, even in non-technical press coverage. They are talking about (a).
True, true. “Gold standard” or “preferred level of evidence” versus “what’s mostly conducted given the funding limitations”. However, to make it into a guideline, there are often RCT follow-ups for hopeful associations uncovered by the lesser study designs.
look for the keywords “g-formula”, “g-estimation”, “inverse probability weighting”, “propensity score”, “marginal structural models”, “structural nested models”, “covariate adjustment”, “back-door criterion”, etc.
I, of course, know all of those. The letters, I mean.
“No subtle confounders” and “increasing sample size (decreases relevance and likelihood of such special cases)” would have m-answered your previous z-comments. (SCNR)
That only works if by correlation you mean any kind of statistical dependence. Pearson’s correlation coefficient can vanish for certain non-monotonic relationships even when the variables are dependent.
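A quick check of that caveat (a sketch, with made-up data): Y below is a deterministic function of X, yet the Pearson correlation is essentially zero because the relationship is non-monotonic.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=100_000)
y = x ** 2                                  # fully determined by x, but non-monotonic

print(np.corrcoef(x, y)[0, 1])              # ~0: Pearson correlation misses the dependence
print(np.corrcoef(np.abs(x), y)[0, 1])      # close to 1: the dependence is there all along
```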