That can help in some instances, but it won’t work for everything. In particular, if the problem contains lots of parameters, some of which are of substantive interest and the rest of which are necessary for accurate modelling but are otherwise nuisances, then useful likelihood ratios don’t exist.
In what cases can “95% statistical significance” be useful while appropriately selected and specified likelihood ratios cannot be similarly useful? (Essentially I do not believe you.)
Let me clarify that I’m not defending the notion of statistical significance in data analysis—I’m merely saying that the advice to publish likelihood ratios is not a complete answer for avoiding debate over priors.
I analyzed some data using two versions of a model that had ~6000 interest parameters and ~6000 nuisance parameters. One of my goals was to determine which version was more appropriate for the problem. The strict Bayesian answer is to compare different possible models using the Bayes factor, which marginalizes over every parameter in each version of the model with respect to each version’s prior. Likelihood ratios are no help here.
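For reference, the Bayes factor mentioned above can be written out as follows. The notation is an assumption introduced only for this sketch: $y$ is the data, $M_1$ and $M_2$ are the two versions of the model, and $\theta_k$ collects all of version $k$'s parameters, interest and nuisance alike.

```latex
B_{12}
  = \frac{p(y \mid M_1)}{p(y \mid M_2)}
  = \frac{\int p(y \mid \theta_1, M_1)\, p(\theta_1 \mid M_1)\, d\theta_1}
         {\int p(y \mid \theta_2, M_2)\, p(\theta_2 \mid M_2)\, d\theta_2}
```

Each integral runs over that version's full parameter space and is weighted by that version's prior, so no single pair of parameter values enters the comparison.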
It turned out to be a lot easier and more convincing to do a simple residual plot for each version of the model. For one version the residual plot matched the model’s assumptions about the error distribution; for the other, it didn’t. This is a kind of self-consistency check: passing it doesn’t mean that the model is adequate, but failing it definitely means the model is not adequate.
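A rough numerical stand-in for that kind of residual check, on simulated data rather than the data described above, might look like the sketch below. Everything in it is an assumption made for illustration: the helper names `tail_fraction` and `student_t`, the 3-sigma threshold, the MAD-based scale estimate, and the choice of 3 degrees of freedom for the heavy-tailed case. It compares the fraction of extreme residuals with what Gaussian errors would predict.

```python
import math
import random
from statistics import NormalDist, median

def tail_fraction(residuals, threshold=3.0):
    """Fraction of residuals more than `threshold` robust sigmas from the median."""
    center = median(residuals)
    mad = median(abs(r - center) for r in residuals)
    scale = 1.4826 * mad  # converts MAD to sigma if the errors are Gaussian
    extreme = sum(abs(r - center) > threshold * scale for r in residuals)
    return extreme / len(residuals)

def student_t(rng, df):
    """Draw one Student-t variate as normal / sqrt(chi-square / df)."""
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / df)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 20000
gaussian_residuals = [rng.gauss(0.0, 1.0) for _ in range(n)]
heavy_residuals = [student_t(rng, 3) for _ in range(n)]  # hypothetical heavy tails

# Two-sided tail probability beyond 3 sigma under Gaussian errors (about 0.0027).
expected = 2.0 * (1.0 - NormalDist().cdf(3.0))
gaussian_tail = tail_fraction(gaussian_residuals)
heavy_tail = tail_fraction(heavy_residuals)

print(f"expected under Gaussian errors: {expected:.4f}")
print(f"observed, Gaussian residuals:   {gaussian_tail:.4f}")
print(f"observed, heavy-tailed:         {heavy_tail:.4f}")
```

A tail fraction far above the Gaussian expectation is a failure of the self-consistency check; a fraction near it only says that this particular check was passed.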
(BTW, the usual jargon goes, “statistically significant at the 0.05 / 0.01 / 0.001 level.”)
Your ability to distinguish them that way means that there was a large likelihood ratio from the evidence.
A large likelihood ratio? I have two likelihood functions—at what values of the parameter arguments should I evaluate them when forming the ratio? Given that one of the versions is nested in the other at the boundary of the parameter space (Gaussian errors versus Student-t errors with degrees of freedom fit to the data), what counts as a large enough likelihood ratio to prefer the more general version of the model?
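To make the question concrete, here is a small sketch of one common convention: evaluate each likelihood at its own maximum and form the ratio there. This is only one possible answer to "at what values", not a claim about what was done with the data above. The simplifications are assumptions for illustration: location fixed at zero, simulated data, a crude grid search over scale and degrees of freedom, and the hypothetical function names below.

```python
import math
import random

def gaussian_loglik(xs, sigma):
    """Log-likelihood of zero-mean Gaussian errors with standard deviation sigma."""
    n = len(xs)
    return (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
            - sum(x * x for x in xs) / (2.0 * sigma ** 2))

def student_t_loglik(xs, scale, df):
    """Log-likelihood of zero-centred Student-t errors with the given scale and df."""
    const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi) - math.log(scale))
    return sum(const - 0.5 * (df + 1) * math.log1p((x / scale) ** 2 / df)
               for x in xs)

def max_loglik_ratio_stat(xs):
    """2 * (max Student-t log-lik - max Gaussian log-lik), via a coarse grid."""
    sigma_hat = math.sqrt(sum(x * x for x in xs) / len(xs))  # Gaussian MLE
    ll_gauss = gaussian_loglik(xs, sigma_hat)
    dfs = [1, 2, 3, 5, 10, 30, 100, 1000]  # large df approximates the Gaussian
    scales = [sigma_hat * (0.30 + 0.02 * i) for i in range(46)]
    ll_t = max(student_t_loglik(xs, s, df) for df in dfs for s in scales)
    return 2.0 * (ll_t - ll_gauss)

def student_t(rng, df):
    """Draw one Student-t variate as normal / sqrt(chi-square / df)."""
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / df)

rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 2000
gaussian_data = [rng.gauss(0.0, 1.0) for _ in range(n)]
heavy_data = [student_t(rng, 3) for _ in range(n)]

stat_gauss = max_loglik_ratio_stat(gaussian_data)
stat_heavy = max_loglik_ratio_stat(heavy_data)
print(f"2 log LR, Gaussian data:     {stat_gauss:.2f}")
print(f"2 log LR, heavy-tailed data: {stat_heavy:.2f}")
```

Because the Gaussian is the limit of the Student-t as the degrees of freedom grow, the maximized ratio can essentially never favor the Gaussian version, so the number alone does not say what "large enough" is. The Gaussian case also sits at the boundary of the parameter space, where the textbook chi-square calibration for twice the log ratio is not guaranteed to apply.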
Likelihood ratios are computed at a single point in parameter space. P values are summary values computed over part of the parameter space.