>But the greatest merit of Occamian prior is that it vaguely resembles the Lazy prior.
...
>With that in mind, I asked what prior would serve this purpose even better and arrived at Lazy prior. The idea of encoding these considerations in a prior may seem like an error of some kind, but the choice of a prior is subjective by definition, so it should be fine.
Encoding convenience * probability into some kind of pseudo-prior, such that the expected-utility maximizer is the maximum-likelihood model with respect to the pseudo-prior, does seem like a really useful computational trick, and you are right that terminology should reflect this. You are also right that the Occam prior has the nice property that weight-by-bit-count is often close to convenience, which makes the wrong naive approach somewhat acceptable in practice: just taking the max-likelihood model with respect to bit-count should often be a good approximation for weight-by-bit-count * convenience (which is the same as using weight-by-bit-count as the probability and then maximizing expected utility).
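To make the trick concrete, here is a minimal sketch. It assumes that picking a model pays off its convenience if that model is true and nothing otherwise, so expected utility is simply weight * convenience. The model names, bit counts and convenience values are made up for illustration.

```python
# Hypothetical candidates: (name, description length in bits, convenience).
# All numbers are invented for illustration only.
models = [
    ("short_but_opaque", 10, 0.1),
    ("medium_and_handy", 12, 1.0),
    ("long_and_handy", 20, 1.2),
]

def occam_weight(bits):
    """Unnormalized Occam weight 2^-bits."""
    return 2.0 ** -bits

def pseudo_prior(bits, convenience):
    """Pseudo-prior weight: Occam weight times convenience."""
    return occam_weight(bits) * convenience

# Naive approach: take the model with the highest Occam weight.
best_occam = max(models, key=lambda m: occam_weight(m[1]))[0]

# Pseudo-prior approach: same argmax as maximizing expected utility
# under the payoff assumption stated above.
best_pseudo = max(models, key=lambda m: pseudo_prior(m[1], m[2]))[0]

print(best_occam, best_pseudo)
```

In this toy setup the two choices differ only because the shortest model was made deliberately inconvenient; when bit count tracks convenience, both pick the same model.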
In cases where we know the utility, we can regenerate probabilities afterwards. So I would now be really interested in some informal study of how well Occam actually performs in practice, after controlling for utility: you are right that the empirical success of Occam might be due only to the implicit inclusion of convenience (succinct-by-bit-count models are often convenient) when doing the (wrong!) max-likelihood inference. I had not considered this, so thanks also for your post; we both learned something today.
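Regenerating probabilities from a pseudo-prior could look roughly like this, assuming the pseudo-weight is exactly probability * utility and every utility is nonzero. The function name and the numbers are hypothetical.

```python
def recover_probabilities(pseudo_weights, utilities):
    """Divide out the (nonzero) utilities and renormalize to sum to 1."""
    raw = [q / u for q, u in zip(pseudo_weights, utilities)]
    total = sum(raw)
    return [r / total for r in raw]

# Made-up probabilities and utilities, used only to build a pseudo-prior.
true_probs = [0.5, 0.3, 0.2]
utilities = [0.1, 1.0, 2.0]
pseudo = [p * u for p, u in zip(true_probs, utilities)]

recovered = recover_probabilities(pseudo, utilities)
print(recovered)
```

The renormalization step means the pseudo-prior itself never needs to be normalized.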
I’d also reiterate a point in favor of the Lazy prior: the really terrible parts of working with Occam (short descriptions that are hard to reason about, aka the halting problem) get cancelled out in the utility maximization anyway. Lazy avoids invoking the halting-problem oracle in your basement for computing these terms (which is where the main differences between Occam and Lazy lie). So you are right after all: outside of theoretical discussion we should all stop using probabilities and Occam and switch to some kind of Lazy pseudo-prior. Thanks!
That being said, we all appear to agree that Occam is quite nice as an abstract tool, even if somewhat naive in practice.
A different point in favor of Occam is “political objectivity”: it is hard to fudge in motivated reasoning. Just as the “naive frequentist” viewpoint sometimes wins over Bayes by avoiding politically charged discussions of priors, Occam defends against “witchcraft appears natural to my mind, and the historical record suggests that humans have evolved hardware acceleration for reasoning about witchcraft; so, considering the Lazy prior, we conclude that witches did it”. (Occam + utility maximization rather suggests the more palatable formulation “hence it is useful to frame these natural processes in terms of Moloch, Azathoth and Cthulhu battling it out”, which ends up with the same intuitions and models but, imho, better mental hygiene.)