Monteith et al. (2011) (linked in the OP) is an interesting read on the subject. They discuss a puzzle: why does the theoretically optimal Bayesian method for dealing with multiple models (that is, Bayesian model averaging) tend to underperform ad-hoc methods (e.g. “bagging” and “boosting”) in empirical tests? It turns out that “Bayesian model averaging struggles in practice because it accounts for uncertainty about which model is correct but still operates under the assumption that only one of them is.” The solution is simply to modify the Bayesian model averaging process so that it integrates over combinations of models rather than over individual models. (They call this Bayesian model combination, to distinguish it from “normal” Bayesian model averaging.) In their tests, Bayesian model combination beats out bagging, boosting, and “normal” Bayesian model averaging.
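To make the distinction concrete, here is a toy sketch (my own construction, not the paper’s algorithm): BMA keeps a posterior over which single component model is correct, while BMC keeps a posterior over mixture weights across the components. The three fixed models, the coin-flip data, and the Dirichlet prior over weights below are all illustrative assumptions.

```python
# Toy sketch, not Monteith et al.'s algorithm: three fixed "models" of a coin,
# each asserting a different P(heads); the true P(heads) is not in the set.
import numpy as np

rng = np.random.default_rng(0)
model_probs = np.array([0.2, 0.5, 0.8])   # each component model's P(heads)
flips = rng.random(200) < 0.65            # data from a coin none of the models match

def log_lik(p_heads, flips):
    """Log-likelihood of the observed flips under a fixed P(heads)."""
    heads = flips.sum()
    return heads * np.log(p_heads) + (len(flips) - heads) * np.log(1.0 - p_heads)

# Bayesian model averaging: posterior over which *single* model is correct
# (uniform prior over the three models).
log_post = np.array([log_lik(p, flips) for p in model_probs])
post = np.exp(log_post - log_post.max())
post /= post.sum()
bma_pred = post @ model_probs

# Bayesian model combination: posterior over *mixture weights* w on the simplex,
# approximated here by sampling w from a uniform Dirichlet prior.
W = rng.dirichlet(np.ones(3), size=5000)
log_post_w = np.array([log_lik(w @ model_probs, flips) for w in W])
post_w = np.exp(log_post_w - log_post_w.max())
post_w /= post_w.sum()
bmc_pred = post_w @ (W @ model_probs)

print(f"BMA predictive P(heads): {bma_pred:.3f}")  # pulled toward a single model
print(f"BMC predictive P(heads): {bmc_pred:.3f}")  # can settle near the true 0.65
```

With the true coin outside the component set, the posterior over single models piles onto the nearest one, while the posterior over weights can land on mixtures that actually match the data.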
Bayesian model averaging struggles in practice because it accounts for uncertainty about which model is correct but still operates under the assumption that only one of them is.
Wait, what? That sounds significant. What does more than one model being correct mean?
Speculation before I read the paper:
I guess that’s like modelling a process as the superposition of sub-processes? That would give the model more degrees of freedom with which to fit the data. Would we expect that to do strictly better than the mutual exclusion assumption, or would it require more data to overcome the extra degrees of freedom?
If a single theory is correct, the mutex assumption will update toward it faster because it gives it a higher prior; the probability-distribution-over-averages would get there more slowly, but it still assigns a substantial prior to theories close to the true one.
On the other hand, if a combination is a better model, either because the true process is a superposition, or we are modelling something outside of our model-space, then a combination will be better able to express it. So the mutex assumption will be forced to put all weight on a bad nearby theory, effectively updating in the wrong direction, whereas the combination won’t lose as much because it contains more accurate models. I wonder if averaging over combinations will beat the mutex assumption at every step?
Also interesting to note that the mutex assumption’s model space is a subset of the combination assumption’s model space, so if you are unsure which is correct, you can just add more weight to the mutex models in the combination prior and use that.
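To make that nesting concrete (my notation, not the paper’s): write the component models as m_1, …, m_K and a combination as a weight vector w on the probability simplex; each mutex hypothesis is then just a vertex of that simplex.

```latex
% A combination is a point $w$ on the simplex $\Delta^{K-1}$:
\[
  p(y \mid w, D) \;=\; \sum_{k=1}^{K} w_k \, p(y \mid m_k, D)
\]
% "Model $k$ alone is correct" is the vertex $w = e_k$, so the mutex hypotheses
% are exactly the $K$ corners of the simplex that the combination integrates over.
% A prior such as
\[
  \pi(w) \;=\; \lambda \sum_{k=1}^{K} \tfrac{1}{K}\, \delta_{e_k}(w)
  \;+\; (1 - \lambda)\, \mathrm{Dirichlet}(w \mid \alpha)
\]
% puts mass $\lambda$ on "exactly one model is correct" and $1-\lambda$ on genuine
% mixtures, i.e. the "add more weight to the mutex models in the prior" idea.
```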
Now I’ll read the paper. Let’s see how I did.

…when the Data Generating Model (DGM) is not one of the component models in the ensemble, BMA tends to converge to the model closest to the DGM rather than to the combination closest to the DGM [9]. He also empirically noted that, in the cases he studied, when the DGM is not one of the component models of an ensemble, there usually existed a combination of models that could more closely replicate the behavior of the DGM than could any individual model on their own.
Yup. Exactly what I thought.
Versus my
if a combination is a better model, either because the true process is a superposition, or we are modelling something outside of our model-space, then a combination will be better able to express it. So the mutex assumption will be forced to put all weight on a bad nearby theory,
“What does more than one model being correct mean?”
Maybe something like string theory? The 5 lesser theories look totally different... and then turn out to transform into one another when you fiddle with the coupling constant.
Seeing the words “string” and “fiddle” on top of each other primed me to think of their literal meanings, which I wouldn’t otherwise have consciously thought of.
“Bayesian model averaging struggles in practice because it accounts for uncertainty about which model is correct but still operates under the assumption that only one of them is.”
Perhaps they should say “the assumption that exactly one model is perfectly correct”?
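That reading matches where the assumption enters the math (standard BMA bookkeeping, in my notation): the sum rule behind model averaging is only exact if the candidate models are mutually exclusive and exhaustive.

```latex
% BMA's predictive distribution:
\[
  p(y \mid D) \;=\; \sum_{k=1}^{K} p(y \mid m_k, D)\, p(m_k \mid D)
\]
% The identity treats $m_1, \dots, m_K$ as mutually exclusive and exhaustive, so
% $p(m_k \mid D)$ is the probability that $m_k$ is *the* true model -- "exactly
% one model is perfectly correct."  When none of them is, that posterior simply
% concentrates on whichever single model comes closest.
```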
It’s an interesting exercise to look for the Bayes structure in this (and other) advice.
At least I find it helpful to tie things down to the underlying theory. Otherwise I find it easy to misinterpret things.
Good article.
Yup! Practical advice is best when it’s backed by deep theories.