Ignoring the practical problems of Bayesian model averaging, isn't assuming that one of M1, M2, or M3 is true better than assuming that some single model M is true? So Bayesian model averaging is always better, right (if it is practically possible)?
If there are 3 competing models, then ideally you can make a single larger model in which each submodel is realized by a specific parameter combination.
If M2 is simply M1 with an extra parameter b2, then you should use a prior that concentrates b2 near zero; if M3 is M1 with one parameter transformed, then you should add a parameter that interpolates between the original and transformed versions, so that you can learn, for example, that 40–90% interpolation describes the data better.
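For concreteness, here is a minimal sketch of this kind of continuous model expansion. All specifics are my own illustrative choices, not from the discussion above: M1 is taken to be a simple linear regression, M2 adds a quadratic coefficient b2, and M3 replaces the predictor x with log(x).

```python
import numpy as np
from scipy import stats

def log_posterior(params, x, y):
    """Encompassing model containing M1, M2, M3 as special cases.

    a, b1:     baseline linear model (M1)
    b2:        extra quadratic term; b2 = 0 recovers M1 (M2's extension)
    w:         interpolates between x (w = 0) and log(x) (w = 1), i.e. M3's transform
    log_sigma: log of the noise standard deviation
    Assumes x > 0 so that log(x) is defined.
    """
    a, b1, b2, w, log_sigma = params
    if not 0.0 <= w <= 1.0:
        return -np.inf
    sigma = np.exp(log_sigma)
    x_eff = (1 - w) * x + w * np.log(x)      # interpolated predictor
    mu = a + b1 * x_eff + b2 * x ** 2
    log_lik = stats.norm.logpdf(y, mu, sigma).sum()
    log_prior = (
        stats.norm.logpdf(a, 0, 10)
        + stats.norm.logpdf(b1, 0, 10)
        + stats.norm.logpdf(b2, 0, 0.1)      # tight prior shrinks b2 toward 0 (favours M1)
        + stats.beta.logpdf(w, 1, 1)         # uniform prior on the interpolation weight
        + stats.norm.logpdf(log_sigma, 0, 2)
    )
    return log_lik + log_prior
```

Running any standard MCMC sampler on this log posterior then gives you a posterior over w and b2 directly, instead of posterior weights over three separate models.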
If it's impossible to translate between models like this, then you can fall back on model averaging, but it's a sign that you don't fully understand your data.
Yes, this is usually the right approach—use a single, more complex model that has the various models you were considering as special cases. It's likely that the best parameters of this extended model won't actually turn out to be one of the special cases. (But note that this approach doesn't necessarily eliminate the need for careful consideration of the prior, since unwise priors for a single complex model can also cause problems.)
However, there are some situations where discrete models make sense. For instance, you might be analysing old Roman coins, and be unsure whether they were all produced at one mint, or at two (or three, …) different mints. There aren't really any intermediate possibilities between one mint and two. Or you might be studying inheritance of two genes, and be considering two models in which they are either on the same chromosome or on different chromosomes.
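For a genuinely discrete case like the two-gene example, comparing marginal likelihoods is straightforward. The sketch below is hypothetical (the counts and the Uniform(0, 0.5) prior are my own choices): under model D the genes are on different chromosomes, so the recombinant fraction is exactly 1/2 by independent assortment, while under model S they are linked, with an unknown recombination fraction below 1/2.

```python
from scipy import stats
from scipy.integrate import quad

k, n = 40, 100   # hypothetical data: 40 recombinant offspring out of 100

# Model D: different chromosomes -> recombination fraction is exactly 1/2
marg_D = stats.binom.pmf(k, n, 0.5)

# Model S: same chromosome -> unknown fraction theta ~ Uniform(0, 0.5);
# the marginal likelihood integrates the binomial likelihood against that
# prior (density 2 on the interval (0, 0.5))
marg_S, _ = quad(lambda t: stats.binom.pmf(k, n, t) * 2.0, 0.0, 0.5)

# Posterior probability of linkage, with equal prior model probabilities
post_S = marg_S / (marg_S + marg_D)
print(f"Bayes factor S vs D: {marg_S / marg_D:.2f}, P(same chromosome) = {post_S:.3f}")
```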
Good points, but can't you still solve the discrete problem with a single model and a stick-breaking prior on the number of mints?
If you’re thinking of a stick-breaking prior such as a Dirichlet process mixture model, they typically produce an infinite number of components (which would be mints, in this case), though of course only a finite number will be represented in your finite data set. But we know that the number of mints producing coins in the Roman Empire was finite. So that’s not a reasonable prior (though of course you might sometimes be able to get away with using it anyway).
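To see why, here is a small sketch of the stick-breaking construction behind a Dirichlet process prior (truncated for computation; the concentration alpha = 2 and the truncation level are arbitrary choices of mine). It generates an unbounded sequence of component weights, so the prior puts probability one on there being infinitely many mints, even though only a few components ever receive appreciable weight.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, truncation = 2.0, 50   # concentration parameter; truncation level for the sketch

# Stick-breaking: v_k ~ Beta(1, alpha), w_k = v_k * prod_{j<k} (1 - v_j)
v = rng.beta(1.0, alpha, size=truncation)
w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))

print("largest weights:", np.round(np.sort(w)[::-1][:5], 3))
print("weight beyond the first 10 components:", 1.0 - w[:10].sum())
# The leftover mass is positive at every finite truncation: under this prior
# there is always some probability of a coin coming from a brand-new "mint".
```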
Ahhh… that makes a lot of sense. Thank you!