Ok, when you say “correct” you mean you try to discover as many hidden variables in your DAG as possible and collect data on them so that they become observed. When you say “control” you mean a particular implementation of the adjustment formula: p(y | do(a)) = sum_c p(y | a, c) p(c), where a is the treatment, y is the outcome, and c is the set of measured covariates. (Note: “independent/dependent variable” is not the right terminology here, because those variables are not guaranteed to stand in the causal relationship you want: an effect can be the independent variable and a cause can be the dependent one.)
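To make the adjustment formula concrete, here is a minimal sketch, assuming binary a, c, y and an invented joint distribution (the numbers mean nothing; only the computation matters):

```python
import numpy as np

# Toy joint distribution p(a, c, y) over binary treatment A,
# covariate C, and outcome Y; the numbers are invented.
# Axis order: [a, c, y].
p = np.array([
    [[0.10, 0.05], [0.08, 0.07]],   # a = 0
    [[0.04, 0.16], [0.10, 0.40]],   # a = 1
])
assert np.isclose(p.sum(), 1.0)

p_c = p.sum(axis=(0, 2))                          # p(c)
p_y_given_ac = p / p.sum(axis=2, keepdims=True)   # p(y | a, c)

def p_y_do_a(a, y):
    # Adjustment formula: p(y | do(a)) = sum_c p(y | a, c) p(c)
    return sum(p_y_given_ac[a, c, y] * p_c[c] for c in range(2))

print(p_y_do_a(a=1, y=1))
```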
The point of some of the work in causal inference, including the paper I linked, is that in some cases you don’t need to either “correct” or “control” in the senses of the words you are using. For example, if your graph is:
A → W → Y, and there is an unobserved common cause U of A and Y, then you don’t need to “correct” for the presence of this U by trying to measure it, nor can you “control” for U, since you cannot measure it. What you can do is use the following formula (the front-door adjustment): p(y | do(a)) = sum_w p(w | a) sum_{a’} p(y | w, a’) p(a’).
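A minimal sketch of evaluating this formula, assuming binary a, w, y and an invented observed joint distribution; note that U never appears in the code, since it is unobserved:

```python
import numpy as np

# Toy observed joint p(a, w, y) for binary A, W, Y; U is unobserved
# and so never appears in the data. Numbers are invented.
# Axis order: [a, w, y].
p = np.array([
    [[0.20, 0.02], [0.05, 0.03]],   # a = 0
    [[0.08, 0.02], [0.15, 0.45]],   # a = 1
])
assert np.isclose(p.sum(), 1.0)

p_a = p.sum(axis=(1, 2))                          # p(a)
p_w_given_a = p.sum(axis=2) / p_a[:, None]        # p(w | a)
p_y_given_aw = p / p.sum(axis=2, keepdims=True)   # p(y | a, w)

def p_y_do_a(a, y):
    # Front-door formula:
    # p(y | do(a)) = sum_w p(w | a) sum_{a'} p(y | w, a') p(a')
    return sum(p_w_given_a[a, w] *
               sum(p_y_given_aw[a2, w, y] * p_a[a2] for a2 in range(2))
               for w in range(2))

print(p_y_do_a(a=1, y=1))
```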
There are more complex versions of the same trick discussed in great detail in the paper I linked.
It is the independent variable in a controlled study because the study makes that variable independent of all other variables. It doesn’t matter if normally U->A; in the controlled study, A is determined by sorting into groups. Instead of being observed, A is decided by fiat.
The formulae only work if you have a graph of what you believe the causal chain might be, and gather data for each step that you have. If, for example, you think the chain is A->W->Y, with a potential U->A and U->Y, but the actual structure is U->Y, A->W, and A->Y, with no edge from U into W and none from W into Y, then you provide bad advice to people who want Y (or not-Y) and are deciding on A.
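To make the failure mode concrete, here is a sketch under the “actual” graph just described (all probabilities invented): data generated with a direct A->Y effect and no W->Y edge is fed into the front-door formula, which then reports no effect at all.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# Generate data from the "actual" graph above: U -> Y, A -> W, A -> Y,
# and, crucially, no edge from W to Y. All probabilities are invented.
u = rng.random(n) < 0.5
a = rng.random(n) < 0.4
w = rng.random(n) < 0.2 + 0.6 * a
y = rng.random(n) < 0.1 + 0.3 * a + 0.4 * u   # true effect of A: +0.30

# Empirical ingredients of the front-door formula, which wrongly
# assumes all of A's influence on Y is routed through W.
p_a = np.array([(a == i).mean() for i in range(2)])
p_w_given_a = np.array([[(w[a == i] == j).mean() for j in range(2)]
                        for i in range(2)])
p_y1_given_aw = np.array([[y[(a == i) & (w == j)].mean()
                           for j in range(2)] for i in range(2)])

def front_door(a_val):
    return sum(p_w_given_a[a_val, j] *
               sum(p_y1_given_aw[i, j] * p_a[i] for i in range(2))
               for j in range(2))

print(front_door(1) - front_door(0))   # ~0.00, while the truth is +0.30
```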
“Independent/dependent” variables are used when talking about functions and regression models, even when those functions and regression models are not causal. For this reason, I believe it is confusing usage. Ordinary statistical regressions are invertible; causal regressions are not.
The formulae are correct iff the graph is correct; that is true. I am not sure what you are trying to say. If your assumptions are wrong, your entire analysis is garbage, and that is true of any analysis. Are you saying anything beyond this? Please clarify what you mean.
With controlled experimentation, one can be almost certain that the measured effect is due to the variable modified. It doesn’t matter whether you have a correct graph of the confounding factors, because you balance them against each other.
What you are doing is measuring the combined strength of all chains of the type A->?->Y
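To illustrate, a quick simulation sketch with invented structural equations; assignment by coin flip stands in for the controlled experiment, and the point is that the experimental contrast recovers the combined effect along all A->...->Y paths while the naive observational contrast does not:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Invented structural equations: the total effect of A on Y, summed
# over every A -> ... -> Y path, is +2.0, and U confounds A and Y.
def outcome(a, u, noise):
    return 2.0 * a + 3.0 * u + noise

u = rng.normal(size=n)
noise = rng.normal(size=n)

# Observational world: U pushes units toward treatment (U -> A),
# so the naive contrast mixes A's effect with U's.
a_obs = (u + rng.normal(size=n) > 0).astype(float)
y_obs = outcome(a_obs, u, noise)
naive = y_obs[a_obs == 1].mean() - y_obs[a_obs == 0].mean()

# Experimental world: A is assigned by coin flip, severing U -> A.
a_exp = rng.integers(0, 2, size=n).astype(float)
y_exp = outcome(a_exp, u, noise)
exp_contrast = y_exp[a_exp == 1].mean() - y_exp[a_exp == 0].mean()

print(f"naive observational contrast: {naive:.2f}")   # ~5.4, inflated
print(f"experimental contrast: {exp_contrast:.2f}")   # ~2.0
```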
Even in randomized trials you need to worry about assumptions. For example, you have to worry that your sample represents the general population. You have to worry that the actual random assignment among the people in your study is a good approximation of the ideal random assignment in an infinite population. You then have to worry about modeling assumptions if you are doing statistical modeling on top of that. It is true that you don’t need assumptions linking observational and interventional quantities if you randomize.
“What you are doing is measuring the combined strength of all chains of the type A->?->Y”
If the graph is as I described, then that is exactly what you want: it is the causal effect, i.e. the variation in Y under randomizing A.
I don’t do random assignment. I divide the sample set into two or more groups that are as close to identical as possible, including in their prior variation along A. Figuring out whether one split is closer than another is nontrivial.
The only random decision is which group gets which A.
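As a sketch of how “which split is closer” might be scored, here is one arbitrary choice of balance metric, the maximum standardized mean difference across covariates (nothing above commits to this particular metric):

```python
import numpy as np

def balance_score(x, in_group1):
    # Worst standardized mean difference across covariates;
    # lower means the two groups are more nearly identical.
    g1, g0 = x[in_group1], x[~in_group1]
    pooled_sd = np.sqrt((g1.var(axis=0) + g0.var(axis=0)) / 2)
    smd = np.abs(g1.mean(axis=0) - g0.mean(axis=0)) / pooled_sd
    return smd.max()

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 5))   # invented covariates, incl. prior A

# Search many candidate 50/50 splits and keep the most balanced one;
# only afterwards is a coin flipped for which group gets which A.
best_split = min(
    (rng.permutation(100) < 50 for _ in range(1000)),
    key=lambda g: balance_score(x, g),
)
print(balance_score(x, best_split))
```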