(1) I just think calling a nonrandomized study a correlational study is weird.
(2) I meant to say effect, not study; fixed.
(3) If something is caused by a confounding variable, then the independent variable may have no relationship with the dependent variable. You seem to be using correlation to mean the result of an analysis, but I’m thinking of it as the actual real relationship which is distinct from causation. So y=x does not mean y causes x or that x causes y.
I don’t understand what you mean by “real relationship”. I suggest tabooing the terms “real relationship” and “no relationship”.
I am using the word “correlation” to discuss whether the observed variable X predicts the observed variable Y in the (hypothetical?) superpopulation from which the sample was drawn. Such a correlation can exist even if neither variable causes the other.
If X predicts Y in the superpopulation (regardless of causality), the correlation will indeed be real. The only possible definition I can think of for a “false” correlation is one that does not exist in the superpopulation, but which appears in your sample due to sampling variability. Statistical methodology is in general more than adequate to discuss whether the appearance of correlation in your sample is due to real correlation in the superpopulation. You do not need causal inference to reason about this question. Moreover, confounding is not relevant.
Confounding and causal inference are only relevant if you want to know whether the correlation in the superpopulation is due to the causal effect of X on Y. You can certainly define the causal effect as the “actual real relationship”, but then I don’t understand how it is distinct from causation.
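The superpopulation point can be sketched in a short simulation (my example, not from the thread): X and Y share a common cause Z, neither causes the other, and the correlation between them is nonetheless perfectly real.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)          # common cause
x = z + rng.normal(size=n)      # X <- Z
y = z + rng.normal(size=n)      # Y <- Z; no arrow between X and Y

r = np.corrcoef(x, y)[0, 1]
print(round(r, 2))              # close to 0.5: a real, non-causal correlation
```

The same correlation would appear in any sufficiently large sample from this superpopulation, so no amount of statistical care makes it go away; it is simply not a causal effect.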
The only possible definition I can think of for a “false” correlation is one that does not exist in the superpopulation, but which appears in your sample due to sampling variability.
Right. That is the problem randomization attempts to correct for, and I think of it as a separate problem from causation.
Intersample variability is a type of confound. Increasing sample size is another method for reducing confounding due to intersample variability. Maybe you meant intrasample variability, but that doesn’t make much sense to me in context. Maybe you think of intersample variability as sampling error? Or maybe you have a weird definition of confounding?
Either way, confounding is a separate problem from causation. You can isolate the confounding variables from the independent variable to determine the correlation between x and y without determining a causal relationship. You can also determine the presence of a causal relationship without isolating the independent variable from possible confounding variables.
The nonrandomized studies are determining causality; they’re just doing a worse job at isolating the independent variable, which is what gwern appears to be talking about here.
Or maybe you have a weird definition of confounding?
I use the standard definition of confounding, based on whether E(Y | X = x) = E(Y | do(X = x)), and think about it in terms of whether there exists a backdoor path between X and Y.
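A hedged illustration of that definition (the simulation setup is my own): in a world where X has no causal effect on Y but a backdoor path X <- Z -> Y exists, E(Y | X = x) and E(Y | do(X = x)) come apart.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
z = rng.normal(size=n)                    # confounder on the backdoor path
x = z + rng.normal(size=n)                # X <- Z
y = z + rng.normal(size=n)                # Y <- Z; note: no X -> Y arrow

# Observational regression of Y on X picks up the backdoor path: slope ~ 0.5.
obs_slope = np.cov(x, y)[0, 1] / np.var(x)

# Under do(X = 1) the Z -> X arrow is cut, so Y is generated exactly as
# before no matter what value we force X to take: E(Y | do(X = 1)) ~ 0.
y_do = z + rng.normal(size=n)
print(round(obs_slope, 2), round(y_do.mean(), 2))
```

The gap between the two quantities (about 0.5 per unit of x here versus zero) is exactly what "confounded" means under this definition.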
Either way, confounding is a separate problem from causation.
The concept of confounding is defined relative to the causal query of interest. If you don’t believe me, try to come up with a coherent definition of confounding that does not depend on the causal question.
You can isolate the confounding variables from the independent variable to determine the correlation between x and y without determining a causal relationship.
With standard statistical techniques you will be able to determine the correlation between X and Y. You will also be able to determine the correlation between X and Y conditional on Z. These are both valid questions, and both are true correlations. Whether either of those correlations is interesting depends on your causal question and on whether Z is a confounder for that particular query.
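A small sketch of that point (example assumptions mine): both the marginal correlation and the correlation within a level of Z are well-defined, true quantities; they simply answer different questions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
z = rng.binomial(1, 0.5, size=n)          # a binary covariate
x = 2.0 * z + rng.normal(size=n)
y = 2.0 * z + rng.normal(size=n)

marginal = np.corrcoef(x, y)[0, 1]                    # ~0.5, driven by Z
within_z0 = np.corrcoef(x[z == 0], y[z == 0])[0, 1]   # ~0.0 within a Z level
print(round(marginal, 2), round(within_z0, 2))
```

Neither number is "false"; whether you want the 0.5 or the 0.0 depends on whether your causal query requires adjusting for Z.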
You can also determine the presence of a causal relationship without isolating the independent variable from possible confounding variables.
No, you can't. (Unless you have an instrumental variable, in which case you have to make the assumption that the instrument is unconfounded instead of the treatment of interest.)
(re: last sentence, also have to assume no direct effect of instrument, but I am sure you knew that, just emphasizing the confounding assumption since discussion is about confounding).
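A toy version of the instrumental-variable route (the variable names and data-generating process are my assumptions): with an instrument W that moves X, is itself unconfounded, and has no direct effect on Y, the Wald ratio recovers the causal effect even though an unobserved U confounds X and Y.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
u = rng.normal(size=n)                   # unobserved confounder of X and Y
w = rng.normal(size=n)                   # instrument, independent of U
x = w + u + rng.normal(size=n)           # W and U both move X
y = 1.0 * x + u + rng.normal(size=n)     # true causal effect of X on Y = 1.0

naive = np.cov(x, y)[0, 1] / np.var(x)           # biased upward by U (~1.33)
iv = np.cov(w, y)[0, 1] / np.cov(w, x)[0, 1]     # Wald estimator (~1.0)
print(round(naive, 2), round(iv, 2))
```

Both IV assumptions flagged in the thread matter here: if W were correlated with U, or had its own arrow into Y, the Wald ratio would no longer equal the causal effect.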
Grandparent's attitude is precisely what is wrong with LW culture's complete and utter lack of epistemic/social humility (which I think they inherited from Yudkowsky and his planet-sized ego). Him telling you of all people that you are using a weird definition of confounding is incredibly amusing.
Right. That is the problem randomization attempts to correct for, and I think of it as a separate problem from causation.
No. Randomization abolishes confounding, not sampling variability.
If your problem is sampling variability, the answer is to increase the power.
If your problem is confounding, the ideal answer is randomization and the second-best answer is modern causality theory.
Statisticians study the first problem; causal inference people study the second.
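That division of labor can be sketched as follows (simulation assumptions are mine): growing the sample shrinks sampling variability but leaves confounding bias untouched, while randomizing X removes the bias.

```python
import numpy as np

rng = np.random.default_rng(4)

def slope(n, randomized):
    """Estimated regression slope of Y on X; the true causal effect is 0."""
    z = rng.normal(size=n)                 # confounder
    if randomized:
        x = rng.normal(size=n)             # X assigned at random
    else:
        x = z + rng.normal(size=n)         # X <- Z
    y = z + rng.normal(size=n)             # Y <- Z only
    return np.cov(x, y)[0, 1] / np.var(x)

obs = slope(1_000_000, randomized=False)   # ~0.5: bias persists at huge n
rct = slope(1_000_000, randomized=True)    # ~0.0: randomization removes it
print(round(obs, 2), round(rct, 2))
```

Even at a million observations the observational slope stays near 0.5; power fixes noise, not bias.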
The nonrandomized studies are determining causality; they’re just doing a worse job at isolating the independent variable, which is what gwern appears to be talking about here.
No, it isn't.
Anders_H, you are much more patient than I am!