> I think when you break it into two separate problems like that, you miss the point.
I am pretty sure I am not, but let’s see. What you are basically saying is “analysis ⇒ synthesis doesn’t work.”
> Combining two RCTs is reasonably well-solved by multilevel random effects models.
Hierarchical models are a particular parametric modeling approach for data drawn from multiple sources. People use this type of stuff to good effect, but saying it “solves the problem” here is sort of like saying linear regression “solves” RCTs. What if the modeling assumptions are wrong? What if you are not sure what the model should be?
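To make the "particular parametric modeling approach" concrete, here is a minimal sketch of one common random-effects pooling method, the DerSimonian-Laird estimator. The function name and the inputs (per-study effect estimates and standard errors) are assumptions for illustration, not something proposed in the thread. Note what it bakes in: each study's true effect is drawn from a common distribution with between-study variance `tau2`, and the studies' sampling variances are treated as known.

```python
from math import sqrt

def dersimonian_laird(effects, std_errors):
    """Pool per-study effect estimates under a random-effects model.

    Assumes at least two studies and that each study's variance is known.
    Returns (pooled effect, its standard error, between-study variance).
    """
    k = len(effects)
    if k < 2 or k != len(std_errors):
        raise ValueError("need matching lists with at least two studies")
    v = [se ** 2 for se in std_errors]
    w = [1.0 / vi for vi in v]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted spread of study effects around the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # method-of-moments estimate, truncated at 0
    w_star = [1.0 / (vi + tau2) for vi in v]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    pooled_se = sqrt(1.0 / sum(w_star))
    return pooled, pooled_se, tau2

pooled, pooled_se, tau2 = dersimonian_laird([0.0, 1.0], [0.1, 0.1])
print(pooled, pooled_se, tau2)
```

If the studies' effects are not exchangeable draws from one distribution, the pooled number is still computable, but it is unclear what it estimates.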
> I’m also not trying to solve the problem of inferring from a correlational dataset to specific causal models, which seems well in hand by Pearlean approaches.
Let’s call them “interventionist approaches.” Pearl is just the guy people here read. People have been doing causal analysis from observational data since at least the 70s, probably earlier in certain special cases.
> I’m trying to bridge between the two: assume a specific generative model for correlation vs causation and then infer the distribution.
But this is exactly the problem! Apparently, there is no meaningful ‘average causal effect’ between correlational and causational studies.
This is what we should talk about.
If there is one RCT, we have a treatment A (with two levels, a and a’) and outcome Y. Of interest is the outcome under a hypothetical assignment of the treatment to a value, which we write Y(a) or Y(a’). The “average causal effect” is E[Y(a)] - E[Y(a’)]. So far so good.
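As a minimal sketch of the single-RCT case, the simulation below randomizes A by coin flip and estimates E[Y(a)] - E[Y(a’)] by the difference of arm means. The data-generating process (a constant additive effect plus Gaussian noise) and all names are assumptions made up for illustration.

```python
import random
from statistics import mean

def simulate_rct(n, effect, seed=0):
    """Simulate an RCT: A is assigned by coin flip, independently of the outcomes."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = rng.random() < 0.5            # randomized treatment assignment
        y_control = rng.gauss(0.0, 1.0)   # potential outcome Y(a')
        y_treated = y_control + effect    # potential outcome Y(a)
        # Only the potential outcome matching the assigned arm is observed.
        rows.append((a, y_treated if a else y_control))
    return rows

def difference_in_means(rows):
    """Estimate E[Y(a)] - E[Y(a')] by the difference of observed arm means."""
    treated = [y for a, y in rows if a]
    control = [y for a, y in rows if not a]
    return mean(treated) - mean(control)

rows = simulate_rct(n=20000, effect=2.0)
ace_hat = difference_in_means(rows)
print(round(ace_hat, 2))
```

Because assignment is randomized, the arm means are unbiased for E[Y(a)] and E[Y(a’)] without any model of how A relates to Y.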
If there is one observational study, say A is assigned based on C, and C affects Y, what is of interest is still Y(a) or Y(a’). Interventionist methods would give you a formula for E[Y(a)] - E[Y(a’)] in terms of p(A,C,Y). You can then construct an estimator for that formula, and life is good. So far so good.
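For the setup just described, and assuming C is the only common cause of A and Y, the formula is the standard adjustment formula: E[Y(a)] = Σ_c E[Y | a, c] p(c). The sketch below simulates such a study and compares the naive arm difference with the adjusted estimate. The simulation parameters and function names are illustrative assumptions. Stratum means are used only because C is binary here; any regression for E[Y | A, C] could be substituted.

```python
import random
from statistics import mean

def simulate_observational(n, effect, seed=1):
    """Simulate a study where binary C drives treatment A and also affects Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = rng.random() < 0.5                        # confounder C
        a = rng.random() < (0.8 if c else 0.2)        # A is assigned based on C
        y = effect * a + 3.0 * c + rng.gauss(0.0, 1.0)  # C also affects Y
        rows.append((a, c, y))
    return rows

def naive_difference(rows):
    """E[Y | a] - E[Y | a'], ignoring C (confounded)."""
    return mean(y for a, c, y in rows if a) - mean(y for a, c, y in rows if not a)

def adjusted_difference(rows):
    """sum_c (E[Y | a, c] - E[Y | a', c]) p(c), the adjustment formula."""
    total = 0.0
    for c_val in (False, True):
        stratum = [(a, y) for a, c, y in rows if c == c_val]
        p_c = len(stratum) / len(rows)
        treated = mean(y for a, y in stratum if a)
        control = mean(y for a, y in stratum if not a)
        total += (treated - control) * p_c
    return total

obs = simulate_observational(n=40000, effect=1.0)
print(round(naive_difference(obs), 2), round(adjusted_difference(obs), 2))
```

In this simulation the naive difference is inflated by the C-to-Y path, while the adjusted estimate should land near the simulated effect of 1.0.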
Note that so far I have made no modeling assumptions on the relationship of A and Y at all. It’s all completely unrestricted by choice of statistical model. I can use a crazy non-parametric random forest to model the relationship of A and Y if I want. I can do linear regression. I can do whatever. This is important: people often smuggle in modeling assumptions “too soon.” When we are talking about prediction problems like in machine learning, that’s ok. We don’t care about modeling too much; we just want good predictive performance. When we care about effects, the model is important. This is because if the effect is not strong and your model is garbage, it can mislead you.
If there are two RCTs, we have two sets of outcomes: Y1(a), Y1(a’) and Y2(a), Y2(a’). Even here, there is no one causal effect so far. We need to make some sort of assumption on how to combine these. For example, we may try to generalize regression models, and say that a lot of the way A affects Y is the same regression across the two studies, but some of the regression terms are allowed to differ to model population heterogeneity. This is what hierarchical models do.
In general we have E[f(Y1(a), Y2(a))] - E[f(Y1(a’),Y2(a’))], for some f(.,.) that we should justify. At this level, things are completely non-parametric. We can model the relationship of A and Y1,Y2 however we want. We can model f however we want.
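As one purely illustrative choice of f (the weights w_1 and w_2 are an assumption, not something fixed by the discussion), take a fixed-weight average. By linearity of expectation, the combined contrast is then a weighted average of the two study-specific average causal effects:

```latex
f(y_1, y_2) = w_1 y_1 + w_2 y_2, \qquad w_1 + w_2 = 1
\\[4pt]
E[f(Y_1(a), Y_2(a))] - E[f(Y_1(a'), Y_2(a'))]
  = w_1 \bigl( E[Y_1(a)] - E[Y_1(a')] \bigr)
  + w_2 \bigl( E[Y_2(a)] - E[Y_2(a')] \bigr)
```

The choice of weights (by sample size, by precision, by relevance of each population) is exactly the part that needs justifying.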
If we have one RCT and one observational study, we still have Y1(a), Y1(a’) for the RCT, and Y2(a), Y2(a’) for the observational study. To determine the latter we use “interventionist approaches” to express them in terms of observational data. We then combine things using f(.,.) as before. As before we should justify all the modeling we are doing.
I am pretty sure Bareinboim thought about this stuff (but he doesn’t do statistical inference, just the general setup).
> What you are basically saying is “analysis ⇒ synthesis doesn’t work.”
I am pretty sure it is not going to let you take an effect size and a standard error from a correlation study and get out an accurate posterior distribution of the causal effect without doing something similar to what I’m proposing.
> If there are two RCTs, we have two sets of outcomes: Y1(a), Y1(a’) and Y2(a), Y2(a’). Even here, there is no one causal effect so far. We need to make some sort of assumption on how to combine these. For example, we may try to generalize regression models, and say that a lot of the way A affects Y is the same regression across the two studies, but some of the regression terms are allowed to differ to model population heterogeneity. This is what hierarchical models do. In general we have E[f(Y1(a), Y2(a))] - E[f(Y1(a’),Y2(a’))], for some f(.,.) that we should justify. At this level, things are completely non-parametric. We can model the relationship of A and Y1,Y2 however we want. We can model f however we want.
Ok, and how do we model them? I am proposing a multilevel mixture model to compare them.
> If we have one RCT and one observational study, we still have Y1(a), Y1(a’) for the RCT, and Y2(a), Y2(a’) for the observational study. To determine the latter we use “interventionist approaches” to express them in terms of observational data. We then combine things using f(.,.) as before. As before we should justify all the modeling we are doing.
Which is not going to work, since in most, if not all, of these studies the original patient-level data is not going to be available, and you’re not even going to get a correlation matrix out of the published paper. I haven’t seen any intervention-style algorithms which work with just the effect sizes, which is what is on offer.
To work with the sparse data that is available, you are going to have to do something in between a meta-analysis and an interventionist analysis.
> I am proposing a multilevel mixture model to compare them.
Ok. You can use whatever statistical model you want, as long as we are clear what the underlying object is you are dealing with. The difficulty here isn’t the statistical modeling, but being clear about what it is that is being estimated (in other words the interpretation of the parameters of the model). This is why I don’t talk about statistical modeling at first.
> haven’t seen any intervention-style algorithms which work with just the effect sizes which is what is on offer.
If all you have is reported effect sizes you won’t get anything good out. You need the data they used.
Is there anyone you would recommend studying in addition?
Depends on what you want. It doesn’t matter “who has priority” when it comes to learning the subject. Pearl’s book is good, but one big disadvantage of reading just Pearl is Pearl does not deal with the statistical inference end of causal inference very much (by choice). Actually, I heard Pearl has a new book in the works, more suitable for teaching.
But ultimately we must draw causal conclusions from actual data, so statistical inference is important. Some big names that combine causal and statistical inference: Jamie Robins, Miguel Hernán, Eric Tchetgen Tchetgen, Tyler VanderWeele (Harvard causal group), Mark van der Laan (Berkeley), Donald Rubin et al. (Harvard), Frangakis, Rosenblum, Scharfstein, etc. (Johns Hopkins causal group), Andrea Rotnitzky (Harvard), Susan Murphy (Michigan), Thomas Richardson (UW), Philip Dawid (Cambridge, but retired, incidentally the inventor of conditional independence notation). Lots of others.
I believe Stephen Cole posts here, and he does this stuff also (http://sph.unc.edu/adv_profile/stephen-r-cole-phd/).
Miguel Hernan and Jamie Robins are working on a new causal inference book that is more statistical, might be worth a look. Drafts available online: