I have a feeling that you mix probability and decision theory. Given some observations, there are two separate questions when considering possible explanations / models:
1. What probability to assign to each model?
2. Which model to use?
Now, our toy model of perfect rationality would use some prior, e.g. the bit-counting universal/Kolmogorov/Occam one, and a Bayesian update to answer (1), i.e. compute the posterior distribution. Then, it would weight these models by “convenience of working with them”, which goes into our expected utility maximization for answering (2), since we only have finite computational resources after all. In many cases we will be willing to work with known wrong-but-pretty-good models like Newtonian gravity, just because they are so much more convenient and good enough.
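A minimal sketch of the two questions, in case it helps to see them kept apart. Everything here is an illustrative assumption rather than anything from the post: the `Model` class, the description lengths, the likelihoods and the convenience scores are all made up, and "weighting by convenience" is simplified to multiplying the posterior by one number per model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    bits: int           # description length in bits (made-up number)
    likelihood: float   # P(observations | model) (made-up number)
    convenience: float  # utility of working with the model (made-up number)


def posterior(models):
    """Question (1): Bayesian update from a bit-counting prior 2**-bits."""
    weights = {m.name: 2.0 ** -m.bits * m.likelihood for m in models}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def choose(models):
    """Question (2): pick the model that maximizes posterior * convenience."""
    post = posterior(models)
    return max(models, key=lambda m: post[m.name] * m.convenience)


# Hypothetical toy data: the relativistic model fits better, but the
# Newtonian one is far more convenient to work with.
MODELS = [
    Model("newtonian", bits=10, likelihood=0.1, convenience=1.0),
    Model("relativistic", bits=12, likelihood=0.9, convenience=0.2),
]

post = posterior(MODELS)
most_probable = max(post, key=post.get)
adopted = choose(MODELS).name
```

With these numbers the answer to (1) favors the relativistic model, while the answer to (2) is still the Newtonian one, which is the "known wrong but good enough" situation described above.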
I have a feeling that you correctly intuit that convenience should enter the question of which model to adopt, but misattribute it to the probability. Which model to adopt should formally be Bayesian update + utility maximization (taking convenience and bounded computational resources into account), and definitely not “Bayesian update only”. The latter is what leads you to the (imho questionable) conclusion that the universal / Kolmogorov / Occam prior is flawed for computing probability.
On the other hand, you are right that the above toy model of perfect rationality is computationally bad: computing the posterior distribution from some prior and then weighting by utility/convenience is kind of stupid if directly computing prior * convenience is cheaper than computing prior and convenience separately and then multiplying. More generally, probability is a nice concept for human minds to reason about reasoning, but we ultimately care about decision theory only.
Always combining probability and utility might be a more correct model, but it is often conceptually more complex to my mind, which is why I don’t try to always adopt it ;)
You are correct that Lazy prior largely encodes considerations of utility maximization. My core point isn’t that Lazy prior is some profound idea.
Instead my core point is that the Occamian prior is not profound either. It has only a few real merits. One minor merit is that it is simple to describe and to reason about, which makes it a high-utility choice of a prior, at least for theoretical discussions.
But the greatest merit of Occamian prior is that it vaguely resembles the Lazy prior. That is, it also encodes some of the same considerations of utility maximization. I’m suggesting that, whenever someone talks about the power of Occam’s razor or the mysterious simplicity of nature, what is happening is in fact this: the person did not bother to do proper utility calculations, Occamian prior encoded some of those calculations by construction, and therefore the person managed to reach a high-utility result with less effort.
With that in mind, I asked what prior would serve this purpose even better and arrived at Lazy prior. The idea of encoding these considerations in a prior may seem like an error of some kind, but the choice of a prior is subjective by definition, so it should be fine.
(Thanks for the comment. I found it useful. I hadn’t explicitly considered this criticism when I wrote the post, and I feel that I now understand my own view better.)
>But the greatest merit of Occamian prior is that it vaguely resembles the Lazy prior.
...
>With that in mind, I asked what prior would serve this purpose even better and arrived at Lazy prior. The idea of encoding these considerations in a prior may seem like an error of some kind, but the choice of a prior is subjective by definition, so it should be fine.
Encoding convenience * probability into some kind of pseudo-prior, such that the expected-utility maximizer is the maximum likelihood model with respect to the pseudo-prior, does seem like a really useful computational trick, and you are right that terminology should reflect this. You are also right that the Occam prior has the nice property that weight-by-bit-count is often close to convenience, which makes the wrong naive approach somewhat acceptable in practice. That is, just taking the max likelihood model with respect to bit-count should often be a good approximation for weight-by-bit-count * convenience (which is the same as using weight-by-bit-count for probability and then maximizing expected utility).
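To make the trick concrete, here is a small sketch under the same simplification (utility is a single convenience factor per model). All names and numbers are hypothetical. It checks that taking the top model under prior * convenience gives the same choice as "update first, weight by convenience afterwards", and that dividing the convenience back out recovers the ordinary posterior when the convenience is known.

```python
def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


def two_step_choice(prior, likelihood, convenience):
    """Bayesian update first, then maximize posterior * convenience."""
    post = normalize({m: prior[m] * likelihood[m] for m in prior})
    return max(post, key=lambda m: post[m] * convenience[m])


def pseudo_prior(prior, convenience):
    """Fold convenience into the prior (no longer a plain probability)."""
    return normalize({m: prior[m] * convenience[m] for m in prior})


def top_model(prior, likelihood):
    """Model with the largest prior * likelihood weight."""
    return max(prior, key=lambda m: prior[m] * likelihood[m])


def recover_posterior(pseudo_post, convenience):
    """Divide the known convenience back out to get probabilities."""
    return normalize({m: pseudo_post[m] / convenience[m] for m in pseudo_post})


# Hypothetical toy data, chosen so that the naive bit-count choice differs.
BITS = {"short_but_opaque": 8, "newtonian": 10, "relativistic": 12}
PRIOR = normalize({m: 2.0 ** -b for m, b in BITS.items()})
LIKELIHOOD = {"short_but_opaque": 0.05, "newtonian": 0.1, "relativistic": 0.9}
CONVENIENCE = {"short_but_opaque": 0.01, "newtonian": 1.0, "relativistic": 0.2}

lazy = pseudo_prior(PRIOR, CONVENIENCE)
via_pseudo_prior = top_model(lazy, LIKELIHOOD)
via_two_steps = two_step_choice(PRIOR, LIKELIHOOD, CONVENIENCE)
naive_occam = top_model(PRIOR, LIKELIHOOD)

pseudo_post = normalize({m: lazy[m] * LIKELIHOOD[m] for m in lazy})
recovered = recover_posterior(pseudo_post, CONVENIENCE)
true_post = normalize({m: PRIOR[m] * LIKELIHOOD[m] for m in PRIOR})
```

The two routes agree by construction, since both rank models by prior * likelihood * convenience. Whether the naive bit-count choice also agrees depends entirely on the numbers; these were picked so that it does not.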
In cases where we know the utility we can regenerate probabilities afterwards. So I would now be really interested in some informal study of how well Occam actually performs in practice, after controlling for utility: You are right that the empirical success of Occam might be only due to the implicit inclusion of convenience (succinct-by-bit-count models are often convenient) when doing the (wrong!) max-likelihood inference. I had not considered this, so thanks also for your post; we both learned something today.
I’d also remark/reiterate the point in favor of the Lazy prior: The really terrible parts of working with Occam (short descriptions that are hard to reason about, aka halting problem) get cancelled out in the utility maximization anyway. Lazy avoids invoking the halting-problem oracle in your basement for computing these terms (where we have the main differences between Occam vs Lazy). So you are right after all: Outside of theoretical discussion we should all stop using probabilities and Occam and switch to some kind of Lazy pseudo-prior. Thanks!
That being said, we all appear to agree that Occam is quite nice as an abstract tool, even if somewhat naive in practice.
A different point in favor of Occam is “political objectivity”: It is hard to fudge in motivated reasoning. Just like the “naive frequentist” viewpoint sometimes wins over Bayes with respect to avoiding politically charged discussions of priors, Occam defends against “witchcraft appears natural to my mind, and the historical record suggests that humans have evolved hardware acceleration for reasoning about witchcraft; so, considering Lazy-prior, we conclude that witches did it”. (Occam + utility maximization rather suggests the more palatable formulation “hence it is useful to frame these natural processes in terms of Moloch, Azathoth and Cthulhu battling it out”, which ends up with the same intuitions and models but imho better mental hygiene.)