I am a little confused by what x is in your statement, and by why you think we can’t compute the likelihood or the posterior predictive. In most real problems we can’t compute the posterior in closed form, but we can draw from it and thus approximate it via MCMC.
Sorry! Bad notation… What I meant was that we can’t compute the conditional posterior predictive density $p(\tilde{y} \mid \tilde{x}, D)$, where $D = \{(x_1, y_1), \ldots, (x_n, y_n)\}$. We can compute $p(\tilde{y} \mid \tilde{x}, D, M)$, where $M$ is some model, approximately using MCMC by drawing samples from the parameter space of $M$, i.e. we can approximate the integral below:
$$p(\tilde{y} \mid \tilde{x}, D, M) = \int_{\theta \in \Theta} p(\tilde{y} \mid \tilde{x}, M, \theta)\, p(\theta \mid M, D)\, d\theta$$
where $\Theta$ is the parameter space of $M$. But the quantity that we are interested in is $p(\tilde{y} \mid \tilde{x}, D)$, not $p(\tilde{y} \mid \tilde{x}, D, M)$ for one specific model, i.e. we need to marginalise over the unknown model. How can we do this?
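For the model-specific density, here is a minimal sketch of the Monte Carlo approximation I have in mind, assuming a hypothetical linear regression $y \sim N(a + bx, \sigma)$ as $M$ and posterior draws already obtained by MCMC:

```python
from scipy import stats

# Hypothetical model M: y ~ N(a + b*x, sigma).
# `draws` holds S posterior samples of (a, b, sigma), e.g. from MCMC, so the
# integral is approximated by the Monte Carlo average
#     p(y_new | x_new, D, M) ≈ (1/S) * sum_s p(y_new | x_new, M, theta_s)
def posterior_predictive_density(y_new, x_new, draws):
    a, b, sigma = draws["a"], draws["b"], draws["sigma"]  # arrays of length S
    dens = stats.norm.pdf(y_new, loc=a + b * x_new, scale=sigma)
    return dens.mean()
```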
You are correct, we have to assume a model, just like we have to assume a prior. And strictly speaking the model is wrong and the prior is wrong :). But we can check how well the posterior predictive describes the data, to get a feel for how bad our model is :)
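As a minimal sketch of such a check, reusing the hypothetical regression model and `draws` from above (the test statistic is just an illustrative choice):

```python
import numpy as np

# Simulate replicated data sets from the posterior predictive and compare a
# test statistic (here: the standard deviation of y) with its observed value.
# A tail probability near 0 or 1 suggests the model misfits that aspect.
def ppc_pvalue(x, y, draws, seed=0):
    rng = np.random.default_rng(seed)
    a, b, sigma = draws["a"], draws["b"], draws["sigma"]
    t_obs = y.std()
    t_rep = np.array([rng.normal(a[s] + b[s] * x, sigma[s]).std()
                      for s in range(len(a))])
    return (t_rep >= t_obs).mean()
```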
Ignoring the practical problems of Bayesian model averaging, isn’t assuming that either M1, M2, or M3 is true better than assuming that one particular model M is true? So Bayesian model averaging is always better, right (if it is practically feasible)?
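For concreteness, by model averaging I mean computing

$$p(\tilde{y} \mid \tilde{x}, D) = \sum_{k=1}^{3} p(\tilde{y} \mid \tilde{x}, D, M_k)\, p(M_k \mid D),$$

with the posterior model probabilities $p(M_k \mid D)$ obtained from the marginal likelihoods.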
If there are 3 competing models, then ideally you can make a larger model where each submodel is realized by a specific parameter combination.

If M2 is simply M1 with an extra parameter b2, then you should give b2 a strong prior concentrated at zero, so that M1 is recovered as a special case. If M3 is M1 with one parameter transformed, then you should add a parameter that interpolates between the two parameterisations, so you can learn, for example, that an interpolation weight between 40% and 90% describes the data better than either endpoint (see the sketch below).

If it’s impossible to translate between models like this, then you can still do model averaging, but it’s a sign that you don’t understand your data.
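A minimal sketch of such a nesting, under assumed forms for the submodels (the quadratic term for M2 and the log transform for M3 are purely illustrative):

```python
import numpy as np
from scipy import stats

# Hypothetical submodels, all nested in one supermodel:
#   M1: y ~ N(a + b1*x, sigma)
#   M2: M1 plus an extra term b2*x^2   -> recovered when b2 = 0
#   M3: M1 with x log-transformed      -> recovered when lam = 1 (needs x > 0)
def log_posterior(params, x, y):
    a, b1, b2, lam, log_sigma = params
    sigma = np.exp(log_sigma)
    x_mix = (1 - lam) * x + lam * np.log(x)           # interpolates M1 <-> M3
    mu = a + b1 * x_mix + b2 * x ** 2
    log_lik = stats.norm.logpdf(y, mu, sigma).sum()
    log_prior = (stats.norm.logpdf(b2, 0.0, 0.1)      # strong prior: b2 near 0
                 + stats.beta.logpdf(lam, 2.0, 2.0))  # interpolation weight
    return log_lik + log_prior
```

Running MCMC on this log posterior then lets the data tell you how close the fit sits to each special case.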
Yes, this is usually the right approach—use a single, more complex, model that has the various models you were considering as special cases. It’s likely that the best parameters of this extended model won’t actually turn out to be one of the special cases. (But note that this approach doesn’t necessarily eliminate the need for careful consideration of the prior, since unwise priors for a single complex model can also cause problems.)
However, there are some situations where discrete models make sense. For instance, you might be analysing old Roman coins, and be unsure whether they were all minted in one mint, or in two (or three, …) different mints. There aren’t really any intermediate possibilities between one mint and two. Or you might be studying inheritance of two genes, and be considering two models in which they are either on the same chromosome or on different chromosomes.
Good points, but can’t you still solve the discrete problem with a single model and a stick-breaking prior on the number of mints?
If you’re thinking of a stick-breaking prior such as a Dirichlet process mixture model, they typically produce an infinite number of components (which would be mints, in this case), though of course only a finite number will be represented in your finite data set. But we know that the number of mints producing coins in the Roman Empire was finite. So that’s not a reasonable prior (though of course you might sometimes be able to get away with using it anyway).
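One alternative is to keep the prior on the number of mints $K$ explicitly finite. A minimal sketch, assuming the marginal likelihood $p(D \mid K)$ of each finite mixture can be computed or approximated (e.g. by bridge sampling):

```python
import numpy as np

# log_marglik[k] = log p(D | K = k+1) for K = 1..K_max mints (hypothetical
# inputs), log_prior[k] = log p(K = k+1) for a finite prior on K.
def posterior_over_K(log_marglik, log_prior):
    log_post = np.asarray(log_marglik) + np.asarray(log_prior)
    log_post -= log_post.max()      # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()        # p(K | D) over the finite grid
```

The predictive for a new coin is then the $p(K \mid D)$-weighted average of the per-$K$ predictives, i.e. exactly the discrete model averaging discussed above.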
Ahhh… that makes a lot of sense. Thank you!