when the Data Generating Model (DGM) is not
one of the component models in the ensemble, BMA tends to
converge to the model closest to the DGM rather than to the
combination closest to the DGM [9]. He also empirically
noted that, in the cases he studied, when the DGM is not
one of the component models of an ensemble, there usually
existed a combination of models that could more closely
replicate the behavior of the DGM than could any individual
model on its own.
Versus my
If a combination is a better model, either because the true process is a superposition or because we are modelling something outside of our model space, then a combination will be better able to express it. So the mutex assumption will be forced to put all the weight on a bad nearby theory.
Yup. Exactly what I thought.
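A toy check of that point, as a rough sketch rather than anything from the paper: the DGM here is assumed to be a 60/40 mixture of two unit-variance Gaussians, and the two component models are the individual Gaussians. All names (`sample_dgm`, `bma_posterior`, `combination_loglik`) and the numbers are made up for illustration. BMA with a uniform prior treats the models as mutually exclusive, so its posterior should pile onto one of them, while a simple grid search over mixing weights should find a combination that fits much better.

```python
import math
import random


def normal_logpdf(x, mu, sigma=1.0):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def sample_dgm(n, rng):
    """Assumed DGM: 60% N(-2, 1) and 40% N(+2, 1). Not in either model alone."""
    return [rng.gauss(-2.0, 1.0) if rng.random() < 0.6 else rng.gauss(2.0, 1.0)
            for _ in range(n)]


def bma_posterior(data, mus):
    """Posterior over mutually exclusive models N(mu, 1), uniform prior."""
    loglik = [sum(normal_logpdf(x, mu) for x in data) for mu in mus]
    top = max(loglik)
    unnorm = [math.exp(l - top) for l in loglik]  # subtract max for stability
    total = sum(unnorm)
    return [u / total for u in unnorm], loglik


def combination_loglik(data, mus, w):
    """Log-likelihood of the combined density w*N(mus[0],1) + (1-w)*N(mus[1],1)."""
    return sum(
        math.log(w * math.exp(normal_logpdf(x, mus[0]))
                 + (1 - w) * math.exp(normal_logpdf(x, mus[1])))
        for x in data
    )


rng = random.Random(0)
mus = [-2.0, 2.0]
data = sample_dgm(500, rng)

posterior, single_loglik = bma_posterior(data, mus)

# Grid search over combination weights in (0, 1).
grid = [i / 100 for i in range(1, 100)]
best_w = max(grid, key=lambda w: combination_loglik(data, mus, w))
best_combo_loglik = combination_loglik(data, mus, best_w)

print("BMA posterior over single models:", posterior)
print("best single-model log-likelihood:", max(single_loglik))
print("best combination weight:", best_w)
print("combination log-likelihood:", best_combo_loglik)
```

The grid search is only a stand-in for a proper search over combinations; the point is just that the single-model posterior collapses onto the nearest component while the weighted mixture gets closer to the DGM.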
Versus my