“Usefulness” certainly isn’t the orthodox Bayesian phrasing. I call myself a Bayesian because I recognize that Bayes’s Rule is the right thing to use in these situations. Whether or not the probabilities assigned to hypotheses “actually are” probabilities (whatever that means), they should obey the same mathematical rules of calculation as probabilities.
But precisely because only the manipulation rules matter, I’m not sure it is worth emphasizing that “to be a good Bayesian” you must accord these probabilities the same status as other probabilities. A hardcore Frequentist is not going to be comfortable doing that. Heck, I’m not sure I’m comfortable doing that. Data and event probabilities are things that can eventually be “resolved” to true or false by checking after the fact. Probability as plausibility makes sense for these things.
But for hypotheses and models, I ask myself “plausibility of what? Being true?” Almost certainly, the “real” model (when that even makes sense) isn’t in our space of models. For example, a common, almost necessary, assumption is exchangeability: any given ordering of the data is equally likely, so that, in effect, every data point is treated as drawn from the same distribution. Data often doesn’t behave like that; it drifts over time. Coins develop wear as they are tossed, and cards get bent as they are shuffled and dealt.
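To pin down the assumption being leaned on (standard definition, my notation): exchangeability says the joint distribution of the observations is unchanged by reordering them,

```latex
% Exchangeability: permuting the observations does not change their joint distribution.
\[
  p(x_1, \ldots, x_n) \;=\; p(x_{\sigma(1)}, \ldots, x_{\sigma(n)})
  \qquad \text{for every permutation } \sigma \text{ of } \{1, \ldots, n\}.
\]
```

A coin whose heads-probability slowly decays with wear makes later tosses distributed differently from earlier ones, which breaks exactly this symmetry.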
I really do prefer to think of some models as being more or less useful. Of course, following this path shades into decision theory: we might want to assign priors according to how “tractable” the models are, both in specification (stupid models that just specify what the data will be take a lot of specification, so they should have lower initial probabilities) and in computation. Models that take longer to compute data probabilities should similarly have a probability penalty, not simply because they’re implausible, but because we don’t want to use them unless the data force us to.
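One rough way to make the specification-cost half of this precise (a sketch in the MDL/Solomonoff spirit, not a worked-out proposal) is a description-length prior: if L(M), in my notation, is the number of bits needed to write model M down, take

```latex
% Description-length prior: longer specifications get exponentially less prior mass.
\[
  P(M) \;\propto\; 2^{-L(M)}.
\]
```

A stupid model that simply lists what the data will be needs roughly as many bits as the data itself, so it starts out correspondingly improbable. The computation-time half is harder to pin down in the same way.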
...shades into decision theory...Models that take longer to compute data probabilities should similarly have a probability penalty, not simply because they’re implausible, but because we don’t want to use them unless the data force us to.
Whoa! That sounds dangerous! Why not keep the beliefs and costs separate and only apply this penalty at the decision-theory stage?
Well, I did say it shades into decision theory...
Yes, it absolutely is dangerous, and thinking about it more I agree it should not be done this way. Probability penalties do not scale correctly with the data collected: they’re essentially just a fixed offset. The utility cost of using a particular method really is a different thing. If a method is unusable, we shouldn’t use it, and trade-offs between accuracy and manageability should be decided at that level, once we can actually judge the accuracy, not earlier.
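To spell out the scaling problem: in the log-posterior the prior enters once, while the likelihood accumulates a term for every observation,

```latex
% By Bayes's rule and the chain rule for the likelihood:
\[
  \log P(M \mid D)
  \;=\; \log P(M)
  \;+\; \sum_{i=1}^{n} \log P(d_i \mid d_1, \ldots, d_{i-1}, M)
  \;-\; \log P(D).
\]
```

Any finite penalty folded into log P(M) is a constant that the growing sum of likelihood terms eventually swamps, whereas the cost of actually running an expensive model is paid again every time we use it. The two things scale differently and shouldn’t be conflated.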
EDIT: I suppose I was hoping for a valid way of justifying the fact that we throw out models that are too hard to use or analyze—they never make it into our set of hypotheses in the first place. It’s amazing how often conjugate priors “just happen” to be chosen...
But for hypotheses and models, I ask myself “plausibility of what? Being true?”
Plausibility of being true given the prior information. Just as Aristotelian logic gives valid arguments (but not necessarily sound ones), Bayes’s theorem gives valid but not necessarily sound plausibility assessments.
following this path shades into decision theory
That’s pretty much why I wanted to make the distinction between plausibility and usefulness. One of the things I like about the Cox-Jaynes approach is that it cleanly splits inference and decision-making apart.
Plausibility of being true given the prior information.
Okay, sure, we can go back to the Bayesian mantra of “all probabilities are conditional probabilities”. But our prior information effectively includes the statement that one of our models is the “true one”. And that’s never the actual case, so our arguments are never sound in this sense, because we are forced to work from prior information that isn’t true. This isn’t a huge problem, but it does in some sense undermine the motivation for finding these probabilities and treating them seriously: they’re conditional probabilities applied in a case where we know that what is being conditioned on is false. What grounds them in our actual situation? I like to take the stance that in practice this is still useful, as an approximation procedure for sorting through models that are approximately right.
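Concretely, the posterior probability of a model is always relative to whatever list M_1, ..., M_K we happened to write down:

```latex
% Posterior model probability over a fixed, assumed-exhaustive list of models.
\[
  P(M_k \mid D)
  \;=\;
  \frac{P(D \mid M_k)\, P(M_k)}{\sum_{j=1}^{K} P(D \mid M_j)\, P(M_j)}.
\]
```

The normalization in the denominator is exactly the step that encodes “one of these K models is true”. When none of them is, the number is still perfectly well defined, but it measures relative fit within the list rather than plausibility of being true.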
And that’s never the actual case, so our arguments are never sound in this sense, because we are forced to work from prior information that isn’t true.
One does generally resort to non-Bayesian model-checking methods. Andrew Gelman likes to include such checks under the rubric of “Bayesian data analysis”; he calls the computing of posterior probabilities and densities “Bayesian inference”, a preceding subcomponent of Bayesian data analysis. This makes for sensible statistical practice, but the underpinnings aren’t strong. One might consider the whole practice an attempt to approximate the Solomonoff prior.
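For what it’s worth, here is a minimal sketch of that kind of check, assuming a toy Beta-Binomial coin model and a test statistic picked purely for illustration (the number of switches between heads and tails, which is sensitive to the sort of time drift an exchangeable model can’t express):

```python
# Minimal posterior predictive check for a toy Beta-Binomial coin model.
# The model treats tosses as exchangeable; the test statistic (number of
# switches between heads and tails) is sensitive to time drift, which the
# model has no way to express.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toss sequence, for illustration only.
data = np.array([1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0])

def n_switches(x):
    """Count how often consecutive tosses differ."""
    return int(np.sum(x[1:] != x[:-1]))

# Conjugate update: with a Beta(1, 1) prior, the posterior over the heads
# probability is Beta(1 + heads, 1 + tails).
heads = int(data.sum())
tails = len(data) - heads
theta_draws = rng.beta(1 + heads, 1 + tails, size=5000)

# Draw replicated datasets from the posterior predictive distribution and
# compare the observed statistic against its replicated distribution.
replicated = np.array([n_switches(rng.binomial(1, t, size=len(data)))
                       for t in theta_draws])
p_value = float(np.mean(replicated <= n_switches(data)))
print(f"observed switches: {n_switches(data)}")
print(f"posterior predictive p-value (replicated <= observed): {p_value:.2f}")
```

A suspiciously extreme p-value here is the cue to step outside the formalism and revise the model, which is the non-Bayesian step in question.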
Yes, in practice people resort to less well-motivated methods that work well.
I’d really like to see some principled answer that has the same feel as Bayesianism, though. As it stands, I have no problem using Bayesian methods for parameter estimation. This is natural because we really are getting pdf(parameters | data, model). But for model selection and evaluation (i.e. non-parametric Bayes) I always feel that I need an “escape hatch” to include new models that the Bayes formalism simply doesn’t have any place for.
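The asymmetry can be written out explicitly (standard formulas, my notation): within a fixed model the posterior over the parameters is unambiguous, while comparison across models only ever happens over whatever candidates we enumerated,

```latex
% Parameter estimation within a model vs. comparison across an enumerated set of models.
\[
  p(\theta \mid D, M) \;=\; \frac{p(D \mid \theta, M)\, p(\theta \mid M)}{p(D \mid M)},
  \qquad
  P(M \mid D) \;\propto\; P(M) \int p(D \mid \theta, M)\, p(\theta \mid M)\, d\theta .
\]
```

The first expression stays meaningful even when M is only an approximation; the second is defined only once the set of candidate models has been fixed, and a genuinely new model has nowhere to enter except by hand.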
Models that take longer to compute data probabilities should similarly have a probability penalty, not simply because they’re implausible, but because we don’t want to use them unless the data force us to.
I am much more comfortable leaving probability as it is but using a different term for usefulness.
I feel the same way.