My own way of thinking about Occam's Razor is through model selection. Suppose you have two competing statements H1 (the witch did it) and H2 (it was chance, or possibly something other than a witch caused it, i.e. H2 = ¬H1) and some observations D (the sequence came up 0101010101). Then the preferred statement is whichever is more probable, calculated as
$$p(H \mid D) = \frac{p(H)\, p(D \mid H)}{p(D)}$$
This is simply Bayes' rule, where
$$p(D \mid H) = \int_\theta p(D \mid \theta, H)\, p(\theta \mid H)\, d\theta$$
and the model is parametrized by some parameters θ.
Now, all this is just the mathematical way of saying that a hypothesis with more parameters (or, more specifically, more possible outcomes that it predicts) will not be as strong a statement as one that predicts a smaller set of outcomes.
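To make the integral concrete, here is a minimal Python sketch; the two coin models and the uniform prior are illustrative assumptions of mine, not part of the argument above. It compares a zero-parameter "fair coin" model against a one-parameter "biased coin" model on the specific sequence 0101010101: the extra parameter spreads probability over more possible data sets, so the specific observation gets less of it.

```python
from scipy.integrate import quad

# A minimal sketch: marginal likelihood of the specific sequence
# 0101010101 (5 heads, 5 tails) under two hypothetical coin models.
heads, tails = 5, 5

# H_fair: a fair coin, no free parameters -- a sharp prediction.
p_fair = 0.5 ** (heads + tails)

# H_biased: the bias theta is a free parameter with a uniform prior,
# so p(D|H) = integral over theta of p(D|theta) p(theta) dtheta.
p_biased, _ = quad(lambda t: t**heads * (1 - t)**tails, 0, 1)

print(f"p(D | fair)   = {p_fair:.6f}")    # ~0.000977 = 2^-10
print(f"p(D | biased) = {p_biased:.6f}")  # ~0.000361 = 1/2772
```

The sharper model wins here despite having no tunable parameter, which is exactly the penalty on weak, spread-out predictions that the integral encodes.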
In the witch example this would be:
H1 = There exists an advanced intelligent being (of at least not much less than human intelligence) that can do things beyond anything that has ever been reproduced scientifically, that for some reason chooses to live on our street and act mostly like a human, and that chooses to influence my sequence of coin tosses so that it ends up in some seemingly pattern-like form
H2 = The coin tosses are ruled by chance and might end up in the set of possible outcomes that seem to form a pattern ({0101010101, 1111100000, 1100110011, …})
D=The coin toss ended up as 0101010101
The way I stated the hypotheses,
$$p(D \mid H_1) = \frac{p(D \mid H_2)}{\text{fraction of outcomes that look like a pattern}},$$
since H1 concentrates all of its probability mass on the pattern-like outcomes while H2 spreads it over every sequence.
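To spell this step out (assuming, for the derivation only, that the witch picks uniformly among the N pattern-like outcomes, an assumption of mine):

$$p(D \mid H_1) = \frac{1}{N} = \frac{2^{-10}}{N / 2^{10}} = \frac{p(D \mid H_2)}{N / 2^{10}}$$

where N/2^10 is precisely the fraction of outcomes that look like a pattern.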
Now what remains is to estimate the priors and the fraction of outcomes that look like a pattern. We can skip p(D), since we are only interested in the ratio p(H1|D) : p(H2|D).
Now, comparing the number of conditions in the hypotheses and how surprised I am by each, I would roughly estimate the ratio of the priors as something like $2^{100}$ in favor of chance: the witch hypothesis goes against many beliefs about the world that I have formed over many years, it includes weird living arrangements for this hypothetical alien entity, it picks me out as one agent of many in the neighborhood, and it singles out an arbitrary action of mine and an arbitrary set of outcomes.
For the sake of completeness: the fraction of outcomes that look like a pattern is hard to estimate exactly. However, my way of thinking about it is to ask how soon in the sequence I would postulate the specific sequence it ended up in. After 0101, I think 0101010101 is the most obvious pattern to continue with, so the remaining six tosses are predicted by the pattern; roughly, then, the fraction is about $2^{-6}$, and the observation carries six bits of evidence in favor of the witch.
In conclusion, I would say that the witch hypothesis is lacking around 94 bits (100 − 6) of evidence for me to believe it as much as the chance hypothesis.
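As a sanity check on the arithmetic, a short Python sketch using the two rough numbers from above (both are subjective estimates of mine, not measurements):

```python
# Posterior log-odds (in bits) of witch vs. chance, from the estimates above.
prior_bits = -100    # log2 of p(H1)/p(H2): priors ~2^100 in favor of chance
likelihood_bits = 6  # log2 of p(D|H1)/p(D|H2): pattern fraction ~2^-6

posterior_bits = prior_bits + likelihood_bits
print(f"log2 odds (witch : chance) = {posterior_bits}")  # -94
```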
The downside of this approach compared to Solomonoff induction and minimum message length is that it is clunkier to use, and it is easy to forget to include conditions or complexity in the priors, the same way they can be lost in the English language. The upside is that as a model it is simpler and less ad hoc: it builds directly on the product rule of probability and the fact that probabilities sum to one, and should thus be preferred by Occam's Razor ;).