It seems to me that there are two distinct issues here: estimating a model's error on future data, and model comparison.
1. It would be useful to know the most likely value of the error on future data before we actually use the model; but is that what test-set error represents, namely the most likely value of the error on future data?
2. Why do we use techniques like WAIC and PSIS-LOO when we can (and should?) simply use p(M|D), i.e. Bayes factors, Ockham factors, model evidence, etc.? These approaches seem to handle over-fitting well (see image below). Once we have found the more plausible model, we use it to make predictions; a concrete sketch of the two quantities I am comparing follows below.
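For concreteness, here is a minimal sketch of the two quantities in question, computed for a toy known-variance normal model with a conjugate N(0, 1) prior on the mean. Everything in it (the data, the prior, the `waic` helper) is just an assumption for illustration, not anyone's reference implementation; in a real problem the log-likelihood draws would come from MCMC rather than this conjugate shortcut.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def waic(log_lik):
    """WAIC (deviance scale) from an (S, N) matrix of pointwise
    log-likelihoods: S posterior draws, N observations."""
    S = log_lik.shape[0]
    # lppd_i = log( (1/S) * sum_s p(y_i | theta_s) )
    lppd = logsumexp(log_lik, axis=0) - np.log(S)
    # p_waic_i = posterior variance of log p(y_i | theta), the effective
    # number of parameters penalty
    p_waic = np.var(log_lik, axis=0, ddof=1)
    return -2 * np.sum(lppd - p_waic)  # lower is better

rng = np.random.default_rng(0)
y = rng.normal(0.5, 1.0, size=40)      # toy observed data
sigma = 1.0                            # residual sd assumed known
n = len(y)

# Conjugate posterior for the mean under a N(0, 1) prior, so we can
# draw from it directly instead of running MCMC.
post_mean = y.sum() / (n + 1.0)
post_sd = np.sqrt(1.0 / (n + 1.0))
mu_draws = rng.normal(post_mean, post_sd, size=2000)

# (S, N) pointwise log-likelihoods under each posterior draw of mu.
log_lik = (-0.5 * np.log(2 * np.pi * sigma**2)
           - (y[None, :] - mu_draws[:, None])**2 / (2 * sigma**2))
print("WAIC:", waic(log_lik))

# The model evidence p(D|M) is analytic here: integrating out mu gives
# y ~ N(0, sigma^2 * I + tau^2 * J) with prior variance tau^2 = 1.
cov = sigma**2 * np.eye(n) + np.ones((n, n))
print("log p(D|M):", multivariate_normal(mean=np.zeros(n), cov=cov).logpdf(y))
```

My question is essentially why the first number (an estimate of out-of-sample predictive accuracy) is used for choosing between models instead of the second (the marginal likelihood that a Bayes factor would be built from).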