Was the professor in question Jamie?
Did you read Jamie’s and Larry’s counterexample, where they construct a case in which the propensity score is known exactly but the treatment/baseline/outcome model is too complex to bother with likelihood methods?
https://normaldeviate.wordpress.com/2012/08/28/robins-and-wasserman-respond-to-a-nobel-prize-winner/
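For readers who haven’t clicked through, here is my rough summary of the setup (from memory, so check the post for the exact construction): we observe n iid copies of (X, R, R*Y), where X is a very high-dimensional covariate with a known distribution, R | X = x ~ Bernoulli(pi(x)) with pi(x) a known function bounded away from 0 and 1, and Y | X = x ~ Bernoulli(theta(x)) with theta(x) completely unknown; Y is seen only when R = 1. The target is the single number psi = E[theta(X)]. The Horvitz-Thompson estimator

  psi_hat = (1/n) * sum_i R_i * Y_i / pi(X_i)

is unbiased and root-n consistent uniformly over theta, precisely because it plugs in the known pi(x). The claim is that likelihood-based procedures (Bayesian ones included) whose priors ignore pi cannot match that guarantee.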
Couldn’t we extend this to longitudinal settings and just say that marginal structural models (MSMs) are better than the parametric g-formula when the models the latter requires are too complex? Would this not render the strong likelihood principle false? If you don’t think causal inference problems are in the “right magisterium” for the likelihood principle, just consider missing data problems instead (the same issues arise; in fact, their counterexample is phrased as a missing data problem).
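To spell out why the likelihood principle is on the hook here (a sketch in roughly the post’s notation, as I understand it): the likelihood contribution of one observation (x, r, r*y) factors as

  p(x) * pi(x)^r * (1 - pi(x))^(1 - r) * [theta(x)^y * (1 - theta(x))^(1 - y)]^r

Since p and pi are known, the pi factors are just constants, so the part of the likelihood that involves theta is identical whatever pi is. Any procedure that respects the strong likelihood principle (Bayesian or otherwise) therefore cannot let its inference about theta, and hence about psi = E[theta(X)], depend on pi; yet the estimators that do achieve uniform consistency, like Horvitz-Thompson, use pi explicitly.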
This is an interesting counterexample, and I agree with Larry that using priors which depend on pi(x) is really no Bayesian solution at all. But if this example is really so problematic for Bayesian inference, can one give an explicit example of a function theta(x) for which no reasonable Bayesian prior yields a consistent posterior? I would guess that only extremely pathological and unrealistic functions theta(x) would cause trouble for Bayesians. What I notice about many of these “Bayesian non-consistency” examples is that they require consistency uniformly over very large function classes; hence they shouldn’t really scare a subjective Bayesian who knows that any function you might encounter in the real world would be much better behaved.
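To be precise about the property being demanded (my formalization, not a quote from the post): uniform consistency of an estimator psi_hat of psi over a class Theta of regression functions means

  sup over theta in Theta of P_theta( |psi_hat - psi(theta)| > eps ) -> 0 as n -> infinity, for every eps > 0,

and in the counterexample Theta is essentially all measurable functions of x, with no smoothness assumed. That is exactly the “very large function class” I have in mind: for any single fixed theta, or for a suitably small class, a Bayesian with a sensible prior can still be consistent.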
In terms of practicality, it’s certainly inconvenient to have to compute a non-parametric posterior just to do inference on a single real parameter phi. To me, the two practical aspects of actually specifying priors and actually computing the posterior remain the only real weaknesses of the subjective Bayesian approach (or of the likelihood principle more generally). For contrast, the one-line frequentist calculation is sketched below.
PS: Perhaps it’s worth discussing this example as its own thread.
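A minimal sketch of that frequentist contrast (my own toy code, assuming the setup summarized above; the functions for pi(x) and theta(x) are made-up stand-ins):

import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 100                       # toy sizes; the d in the post is astronomically larger
X = rng.uniform(size=(n, d))             # covariates with a known distribution
pi_x = 0.2 + 0.6 * X[:, 0]               # stand-in for the known propensity pi(x), bounded away from 0 and 1
theta_x = 0.5 * (1.0 + np.sin(X[:, :5].sum(axis=1)))   # stand-in for the unknown theta(x) = P(Y=1 | X=x)
R = rng.binomial(1, pi_x)                # R = 1 means Y is observed
Y = rng.binomial(1, theta_x) * R         # zero out the unobserved outcomes

# Horvitz-Thompson estimate of psi = E[theta(X)]: one line, uses only the known pi(x),
# with no prior over theta(.) and no posterior computation required.
psi_hat = np.mean(R * Y / pi_x)
se_hat = np.std(R * Y / pi_x, ddof=1) / np.sqrt(n)     # rough standard error
print(psi_hat, se_hat)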
It’s not obvious to me that they got the Bayesian analysis right in that blog post. If you can have “no observation” for Y, it seems like what we actually observe is some Y’ that can take on the values {0, 1, null}, and the probability distribution over our observations of the variables (X, R, Y’) is p(X) * p(R|X) * p(Y’|X,R).
EDIT: Never mind, it’s not a problem. Even if it were, it wouldn’t have changed their case that the Bayesian update won’t give you this “uniform consistency” property, which seems like something worth looking into.
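For what it’s worth, here is why it comes out the same (as I work it out, in the notation above): since Y’ = null exactly when R = 0, p(Y’ | X, R = 0) is a point mass at null and carries no information about theta, while p(Y’ = y | X = x, R = 1) = theta(x)^y * (1 - theta(x))^(1 - y). Multiplying out p(x) * p(r|x) * p(y’|x,r) therefore gives

  p(x) * pi(x)^r * (1 - pi(x))^(1 - r) * [theta(x)^y * (1 - theta(x))^(1 - y)]^r

which is exactly the likelihood from the post, so the Bayesian update is unchanged.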
As for this “low information” bull-hockey, let us put an MML (minimum message length) prior over theta(x) and never speak of it again.