Let me clarify that I’m not defending the notion of statistical significance in data analysis—I’m merely saying that the advice to publish likelihood ratios is not a complete answer for avoiding debate over priors.
I analyzed some data using two versions of a model that had ~6000 parameters of interest and ~6000 nuisance parameters. One of my goals was to determine which version was more appropriate for the problem. The strict Bayesian answer is to compare the candidate models using the Bayes factor, which marginalizes over every parameter in each version of the model with respect to each version’s prior. Likelihood ratios are no help here.
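To make concrete why the Bayes factor pulls priors into the comparison, here is a minimal sketch on a toy model (nothing like the ~6000-parameter problem above, and not the author’s actual analysis): M0 fixes a normal mean at zero, M1 puts a N(0, tau^2) prior on it, and the marginal likelihood of M1 integrates the likelihood against that prior. The resulting Bayes factor moves with tau, which is exactly the room for debate.

```python
# Toy Bayes-factor sketch (assumed example, not the model from the text).
# M0: y_i ~ N(0, 1) with no free parameters.
# M1: y_i ~ N(theta, 1) with prior theta ~ N(0, tau^2).
import numpy as np
from scipy import stats
from scipy.integrate import quad

rng = np.random.default_rng(0)
y = rng.normal(0.5, 1.0, size=50)  # data generated with a nonzero mean

def marginal_m0(y):
    # No parameters to integrate out under M0
    return np.prod(stats.norm.pdf(y, loc=0.0, scale=1.0))

def marginal_m1(y, tau):
    # Marginalize theta over its prior, as the Bayes factor requires
    def integrand(theta):
        return (np.prod(stats.norm.pdf(y, loc=theta, scale=1.0))
                * stats.norm.pdf(theta, loc=0.0, scale=tau))
    val, _ = quad(integrand, -10.0, 10.0)
    return val

for tau in (0.1, 1.0, 10.0):
    bf = marginal_m1(y, tau) / marginal_m0(y)
    print(f"tau = {tau:5.1f}   BF(M1 vs M0) = {bf:.3g}")
```

Running this shows the Bayes factor changing with the prior scale tau even though the data and likelihood are fixed, which is the sense in which “publish likelihood ratios” does not settle the prior question for model comparison.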
It turned out to be a lot easier and more convincing to do a simple residual plot for each version of the model. For one version the residual plot matched the model’s assumptions about the error distribution; for the other, it didn’t. This is a kind of self-consistency check: passing it doesn’t mean that the model is adequate, but failing it definitely means the model is not adequate.
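The check described above can be sketched in a few lines on a toy regression (again an assumed example, not the author’s model): generate heavy-tailed errors, fit under a Gaussian error assumption, and test the standardized residuals against that assumption. Failing the test rules the Gaussian version out; passing it would not have proven adequacy.

```python
# Residual self-consistency check, sketched on a toy linear regression
# with Student-t errors (an illustrative setup, not the original model).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(-1.0, 1.0, n)
errors = stats.t.rvs(df=2, size=n, random_state=rng)  # heavy-tailed truth
y = 2.0 + 3.0 * x + errors

# Fit by ordinary least squares, i.e. under the Gaussian-error version
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
z = (resid - resid.mean()) / resid.std(ddof=2)

# Kolmogorov-Smirnov test of standardized residuals against N(0, 1):
# a tiny p-value means the Gaussian version fails its own error
# assumption, which is the "failing means not adequate" direction.
stat, pval = stats.kstest(z, "norm")
print(f"KS statistic = {stat:.3f}, p-value = {pval:.3g}")
```

In practice one would look at the residual plot itself (or a Q-Q plot) rather than a single test statistic; the point is only that the check compares the residuals to the error distribution the model assumed.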
(BTW, the usual jargon goes, “statistically significant at the 0.05 / 0.01 / 0.001 level.”)
“Your ability to distinguish them that way means that there was a large likelihood ratio from the evidence.”
A large likelihood ratio? I have two likelihood functions—at what values of the parameter arguments should I evaluate them when forming the ratio? Given that one of the versions is nested in the other at the boundary of the parameter space (Gaussian errors versus Student-t errors with degrees of freedom fit to the data), what counts as a large enough likelihood ratio to prefer the more general version of the model?
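To illustrate the difficulty, here is a sketch of the nesting in question on toy data (an assumed example, not the original analysis): the Gaussian error model is the df → ∞ boundary of the Student-t error model, so one can maximize both likelihoods and form the ratio, but the usual chi-square calibration assumes the null is interior to the parameter space and does not apply at the boundary.

```python
# Gaussian-vs-Student-t likelihood ratio on toy data (illustrative only).
# The Gaussian model is nested in the t model at the boundary 1/df = 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y = stats.t.rvs(df=4, loc=0.0, scale=1.0, size=1000, random_state=rng)

# Gaussian MLE: sample mean and (maximum-likelihood, ddof=0) std
mu, sigma = y.mean(), y.std()
loglik_norm = stats.norm.logpdf(y, mu, sigma).sum()

# Student-t MLE over (df, loc, scale) via scipy's built-in fitter
df_hat, loc_hat, scale_hat = stats.t.fit(y)
loglik_t = stats.t.logpdf(y, df_hat, loc_hat, scale_hat).sum()

# Twice the maximized log likelihood ratio. Under regular asymptotics
# this would be referred to chi-square(1); with the null on the boundary
# of the parameter space, that reference distribution is wrong, which is
# the calibration question raised in the text.
lr = 2.0 * (loglik_t - loglik_norm)
print(f"fitted df = {df_hat:.1f}, 2 * log LR = {lr:.2f}")
```

The ratio here is evaluated at each model’s maximum likelihood estimates, which is one answer to “at what parameter values?”, but it leaves open what threshold counts as large, since the boundary breaks the standard chi-square answer.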