This is a bit of a sidebar:
I’m curious what you make of the following argument.
When an infinite sequence is sampled from a true model μ0, there is likely to be another treacherous model μ1, which will probably end up with greater posterior weight than an honest model ^μ0, and greater still than the posterior weight on the true model μ0.
If the sequence were sampled from μ1 instead, the eventual posterior weight on μ1 would probably be at least as high.
When an infinite sequence is sampled from the true model μ1, there is likely to be another treacherous model μ2, which will probably end up with greater posterior weight than an honest model ^μ1, and greater still than the posterior weight on the true model μ1.
And so on.
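To make "greater posterior weight" concrete, here's the standard bookkeeping (the weight notation w(·) is mine, not from the thread): in a Bayesian mixture over models ν with prior weights w(ν), the posterior weight after observing x_{1:n} is

$$w_n(\nu) = \frac{w(\nu)\,\nu(x_{1:n})}{\sum_\rho w(\rho)\,\rho(x_{1:n})},$$

so μ1 ends up ahead of the honest model ^μ0 exactly when, for all large n,

$$\log w(\mu_1) - \log w(\hat\mu_0) \;>\; \log \hat\mu_0(x_{1:n}) - \log \mu_1(x_{1:n}),$$

i.e. when μ1's cumulative extra log-loss stays below its prior-weight advantage.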
The first treacherous model works by replacing the bad simplicity prior with a better prior, and then using the better prior to infer the true model more quickly. There's no reason for the same thing to happen a second time.
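As a sanity check on that mechanism, here's a minimal simulation sketch. Everything numeric in it is an assumption for illustration: Bernoulli data, a made-up 40-bit description cost for the true model versus 10 bits for a "meta" model that re-infers the bias online with Laplace's rule, standing in for "using the better prior to infer the true model more quickly":

```python
import numpy as np

rng = np.random.default_rng(0)
theta0 = 0.7                      # true Bernoulli bias (assumed for the toy)
n = 100_000
x = rng.random(n) < theta0        # the observed sequence, truncated at n

# "Honest"/true hypothesis: hard-codes theta0, but pays a large (assumed)
# description-length penalty under the bad simplicity prior: 40 bits.
log_prior_true = -40 * np.log(2)
loglik_true = np.cumsum(np.where(x, np.log(theta0), np.log(1 - theta0)))

# "Meta" hypothesis: doesn't know theta0; it re-infers the bias online with
# Laplace's rule (Beta(1,1) prior) and is assumed cheaper to describe: 10 bits.
log_prior_meta = -10 * np.log(2)
heads_before = np.concatenate(([0], np.cumsum(x)[:-1]))
p_head = (heads_before + 1) / (np.arange(n) + 2)   # rule of succession
loglik_meta = np.cumsum(np.where(x, np.log(p_head), np.log(1 - p_head)))

# Posterior log-odds of meta vs. true: the prior gap plus the (negative)
# regret, which only grows like 0.5 * log(n) nats.
log_odds = (log_prior_meta - log_prior_true) + (loglik_meta - loglik_true)
print(f"posterior log-odds (meta vs true) at n={n}: {log_odds[-1]:.1f} nats")
```

Since the meta model's regret against the truth grows only like 0.5·log(n) nats while its assumed prior advantage is a constant 30 bits (≈ 20.8 nats), it stays ahead at every sequence length here, which is the dynamic the argument leans on.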
(Well, I guess the argument works if you push out to longer and longer sequence lengths: a treacherous model will beat the true model on sequences of length a billion, then for sequences of length a trillion a different treacherous model will win, and for sequences of length a quadrillion a still different one will win. And that's before even considering that each particular treacherous model will in fact defect at some point and drop out of the posterior when it does.)
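On the defection point: if a treacherous model μ1 defects at step T by confidently predicting a continuation that doesn't occur, its posterior log-odds against any model that kept tracking μ0 take a hit of

$$\log \frac{\mu_1(x_T \mid x_{<T})}{\mu_0(x_T \mid x_{<T})} \ll 0,$$

which is what "drop out of the posterior" cashes out to (again my formalization, under the usual sequential-prediction setup).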
Does it make sense to talk about ˇμ1, which is like μ1 in being treacherous, but uses the true model μ0 instead of the honest model ^μ0? I guess you would expect ˇμ1 to have a lower posterior weight than μ0?