I am not using Bayesian inference, and neither are Pearl and Bareinboim.
Ok, I think that’s the main issue here. As a criticism of Pearl and Bareinboim, I agree this is basically valid. That said, I’d still say that throwing out DAGs is a terrible way to handle the issue—Bayesian inference with DAGs is the right approach for this sort of problem.
The whole argument about identification problems then becomes irrelevant, as it should. Sometimes the true model of reality is not identifiable. This is not a problem with reality, and pretending some other model generates reality is not the way to fix it. The way to fix it is to use an inference procedure which does not assume identifiability.
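To make this concrete, here is a toy sketch (my own example, not one from this discussion) of Bayesian inference on a deliberately non-identified model: the data only pin down the sum a + b, yet the posterior is perfectly well-defined, tight along the identified direction and wide (prior-dominated) along the unidentified one.

```python
import numpy as np

# Toy non-identified model: we observe y ~ Normal(a + b, 1), so only the
# sum a + b is identified from the data; a and b individually are not.
rng = np.random.default_rng(0)
true_a, true_b = 1.0, 2.0
y = rng.normal(true_a + true_b, 1.0, size=200)
n, ybar = len(y), y.mean()

# Grid posterior with independent Normal(0, sd=2) priors on a and b.
# The likelihood depends on (a, b) only through a + b.
grid = np.linspace(-5.0, 5.0, 401)
A, B = np.meshgrid(grid, grid, indexing="ij")
log_post = -(A**2 + B**2) / (2 * 2.0**2) - 0.5 * n * (ybar - (A + B)) ** 2
post = np.exp(log_post - log_post.max())
post /= post.sum()

def post_mean_sd(f):
    """Posterior mean and sd of a function f(a, b) tabulated on the grid."""
    m = (post * f).sum()
    return m, np.sqrt((post * (f - m) ** 2).sum())

mean_sum, sd_sum = post_mean_sd(A + B)  # identified: tight posterior
mean_a, sd_a = post_mean_sd(A)          # not identified: stays wide
print(f"a+b: {mean_sum:.2f} +/- {sd_sum:.2f}")
print(f"a  : {mean_a:.2f} +/- {sd_a:.2f}")
```

Nothing breaks when identifiability fails; the posterior for a simply remains spread out, honestly reporting that the data cannot distinguish a from b.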
Any attempt to extrapolate from Russia to Norway is going to depend on a background belief that some aspect of the data-generating structure is shared between the countries. In the case of Russian roulette, I argue that the natural mathematical object on which to hang our claims of structural equality is the parameter that takes the value 5/6 in both countries.
The equality of this parameter is not sufficient to make the prediction we want to make—the counterfactual is still underspecified. The survival ratio calculation will only be correct if a particular DAG and counterfactual apply, and will be incorrect otherwise. Without a DAG, it becomes unclear what assumptions we are even making: it is not at all clear which counterfactual we are using, or whether there is even a well-defined counterfactual for which the calculation is correct.
Ok, I think that’s the main issue here. As a criticism of Pearl and Bareinboim, I agree this is basically valid. That said, I’d still say that throwing out DAGs is a terrible way to handle the issue—Bayesian inference with DAGs is the right approach for this sort of problem.
I am not throwing out DAGs. I am just claiming that the particular aspect of reality that I think justifies extrapolation cannot be represented in a standard DAG. While I formalized my causal model for these aspects of reality without using graphs, I am confident that the same structural constraints can be represented in some extension of the DAG formalism; it is just that nobody has worked this out yet.
As for combining Bayesian inference and DAGs: This is one of those ideas that sounds great in principle, but where the details get very messy. I don’t have a good enough understanding of Bayesian statistics to make the argument in full, but I do know that very smart people have tried to combine it with causal models and concluded that it doesn’t work. Bayesianism therefore plays essentially no role in the causal modelling literature. If you believe you have an obvious solution to this, I recommend you write it up and submit it to a journal, because you will get a very impactful publication out of it.
The equality of this parameter is not sufficient to make the prediction we want to make—the counterfactual is still underspecified. The survival ratio calculation will only be correct if a particular DAG and counterfactual apply, and will be incorrect otherwise.
In a country where nobody plays Russian roulette, you have valid data on the distribution of outcomes under the scenario where nobody plays Russian roulette (due to simple consistency). In combination with knowledge about the survival ratio, this is sufficient to make a prediction for the distribution of outcomes in a counterfactual where everybody plays Russian roulette.
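As I read it, the calculation being described is just the following. A minimal sketch, where the 5/6 survival ratio is from the discussion and the baseline survival number is made up for illustration:

```python
# Hypothetical numbers for illustration: baseline survival is invented;
# the 5/6 survival ratio is the parameter from the discussion.
p_survive_no_roulette = 0.99   # observed in a country where nobody plays
survival_ratio = 5 / 6         # chance of surviving one pull of the trigger

# Counterfactual prediction for "everybody plays one round", assuming the
# survival ratio transports between countries and that death by roulette
# is independent of the other causes of death.
p_survive_all_play = p_survive_no_roulette * survival_ratio
print(f"{p_survive_all_play:.4f}")  # 0.99 * 5/6 = 0.8250
```

The multiplication itself is trivial; the substantive assumptions are the two named in the comments (transportability of the ratio, and independence from other causes of death).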
I don’t have a good enough understanding of Bayesian statistics to make the argument in full, but I do know that very smart people have tried to combine it with causal models and concluded that it doesn’t work.
Do you know of any references on the problems people have run into? I’ve used Bayesian inference on causal models in my own day-to-day work quite a bit without running into any fundamental issues (other than computational difficulty), and what I’ve read of people using them in neuroscience and ML generally seems to match that. So it sounds like knowledge has failed to diffuse—either the folks using this stuff haven’t heard about some class of problems with it, or the causal modelling folks are insufficiently steeped in Bayesian inference to handle the tricky bits.
I don’t have a great reference for this.
A place to start might be Judea Pearl’s essay “Why I’m only a half-Bayesian” at https://ftp.cs.ucla.edu/pub/stat_ser/r284-reprint.pdf . If you look at his Twitter account at @yudapearl, you will also see numerous tweets where he refers to Bayes’ theorem as a “trivial identity” and where he talks about Bayesian statistics as “spraying priors on everything”. See for example https://twitter.com/yudapearl/status/1143118757126000640 and his discussions with Frank Harrell.
Another good read may be Robins, Hernan and Wasserman’s letter to the editor at Biometrics, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4667748/ . While that letter is not about graphical models, the propensity-score and marginal-structural-model methods it discusses are mathematically very closely related. The main argument in that letter (which was originally a blog post) has been discussed on Less Wrong before; I am trying to find the discussion, it may be this link: https://www.lesswrong.com/posts/xdh5FPMYYGGX7PBKj/the-trouble-with-bayes-draft
From my perspective, as someone who is not well trained in Bayesian methods and does not pretend to understand the issue well, I just observe that methodological work on causal models very rarely uses Bayesian statistics, that I myself do not see an obvious way to integrate it, and that most of the smart people working on causal inference appear to be skeptical of such attempts.
Ok, after reading these, it’s sounding a lot more like the main problem is causal inference people not being very steeped in Bayesian inference. Robins, Hernan and Wasserman’s argument rests on a mistake that took all of ten minutes to spot: they show that a particular quantity is independent of the propensity score function when the true parameters of the model are known, then jump to the conclusion that the estimate of that quantity is independent of the propensity. In fact the estimate does depend on the propensity, because the estimates of the model parameters depend on the propensity. Pearl’s argument is more abstract and IMO stronger, but it rests on the idea that causal relationships are not statistically testable, when in fact that is basically the bread-and-butter use case for Bayesian model comparison.
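A toy illustration of the claimed mistake (my own construction, not the example from the letter): in a simple Beta-Binomial model, the propensity only determines how many treated units land in each covariate stratum, yet the posterior for E[Y(1)] still depends on it, because each stratum’s parameter estimate is shrunk toward the prior by an amount that depends on that stratum’s sample size. All names below (`posterior_mean_Y1`, the stratum counts) are hypothetical.

```python
# Two covariate strata z, Beta(1,1) priors on P(Y=1 | X=1, z). To keep the
# sketch noise-free, we feed in the *expected* success counts rather than
# simulating draws; the point survives either way.

def posterior_mean_Y1(n_treated_by_stratum, p_y_by_stratum, p_z=(0.5, 0.5)):
    """Posterior mean of E[Y(1)] under Beta(1,1) priors per stratum."""
    total = 0.0
    for n, p, pz in zip(n_treated_by_stratum, p_y_by_stratum, p_z):
        successes = n * p                        # expected successes
        total += pz * (successes + 1) / (n + 2)  # Beta posterior mean
    return total

true_p = (0.9, 0.1)  # P(Y=1 | X=1, z), identical in both scenarios below

# Same model, same true parameters, two different propensity regimes
# (balanced vs. heavily skewed allocation of 200 treated units):
est_balanced = posterior_mean_Y1((100, 100), true_p)
est_skewed = posterior_mean_Y1((190, 10), true_p)
print(est_balanced, est_skewed)  # the two estimates differ
```

The population quantity is independent of the propensity; the posterior estimate is not, which is exactly the gap between the two steps of the argument.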
Some time in the next week I’ll write up a post with a few full examples (including the one from Robins, Hernan and Wasserman), and explain in a bit more detail.
(Side note: I suspect that the reason smart people have had so much trouble here is that the previous generation was mostly introduced to Bayesian statistics by Savage or Gelman; I expect someone who started with Jaynes would have a lot less trouble here, but his main textbook is relatively recent.)
Some time in the next week I’ll write up a post with a few full examples (including the one from Robins, Hernan and Wasserman), and explain in a bit more detail.
I look forward to reading it. To be honest, knowing these authors, I’d be surprised if you have found an error that breaks their argument.
We are now discussing questions that are so far outside of my expertise that I do not have the ability to independently evaluate the arguments, so I am unlikely to contribute further to this particular subthread (i.e. to the discussion about whether there exists an obvious and superior Bayesian solution to the problem I am trying to solve).