The reference machine is chosen to be simple, usually by limiting its state × symbol complexity. That's not perfect, and various people have tried to come up with something better, but so far none of these efforts have succeeded. For prediction it isn't a problem, because Solomonoff's predictor converges amazingly fast: faster than 1/n, where n is the number of bits of input data (this isn't quite true, and Li and Vitányi don't quite get it right either; see Hutter, "On universal prediction and Bayesian confirmation", Theoretical Computer Science, 384:33-48, 2007, for the gory details). Anyway, the convergence is so fast that the compiler constant for a simple UTM (and we know such UTMs can be described in just a few hundred bits) quickly becomes irrelevant.
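To make that concrete, here is a toy sketch, not full Solomonoff induction: a Bayes mixture over a finite class of Bernoulli sources with prior weight 2^(-code length). The hypothesis class, the nominal code lengths, and the 50-bit "compiler constant" penalty are all made up for illustration; the point is only that charging a constant number of extra bits to the true hypothesis adds a bounded amount to the mixture's total log-loss, so the per-bit effect shrinks away as data accumulates.

```python
import numpy as np

def logsumexp(a):
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

rng = np.random.default_rng(0)

thetas = np.linspace(0.05, 0.95, 19)     # hypothesis class (contains the true bias 0.7)
base_bits = np.full(len(thetas), 5.0)    # nominal code length of each hypothesis (bits)
c_bits = 50.0                            # hypothetical reference-machine (compiler) penalty

true_theta = 0.7
x = (rng.random(20000) < true_theta).astype(float)

def regret_per_bit(n, extra_bits_on_true):
    """(Mixture log-loss minus true-source log-loss) / n, in nats per bit."""
    bits = base_bits.copy()
    bits[np.argmin(np.abs(thetas - true_theta))] += extra_bits_on_true
    log_w = -bits * np.log(2.0)
    log_w -= logsumexp(log_w)                       # normalise the prior
    ones = x[:n].sum()
    log_like = ones * np.log(thetas) + (n - ones) * np.log(1.0 - thetas)
    log_mix = logsumexp(log_w + log_like)           # log of the mixture's joint probability
    log_true = ones * np.log(true_theta) + (n - ones) * np.log(1.0 - true_theta)
    return (log_true - log_mix) / n

for n in (100, 1000, 10000, 20000):
    print(f"n={n:6d}   plain: {regret_per_bit(n, 0.0):.5f}   "
          f"with {c_bits:.0f}-bit penalty: {regret_per_bit(n, c_bits):.5f}")
```

Both columns of per-bit regret head to zero; the penalised one just takes a little longer, which is the sense in which the constant quickly stops mattering.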
In terms of determining the generating semi-measure (the "true hypothesis"), things are a bit more difficult. If two hypotheses assign the same probability to the observed data (the same likelihood), then the data will not change their relative probability in the posterior; both are simply rescaled by the same evidence factor. That's no problem for prediction, since the two agree, but it does mean that Solomonoff induction is somewhat crude at deciding between different "interpretations". In other words, if the data can't help you decide, all you have left is the prior. The best we can do is to tie down the reference machine complexity.
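A minimal sketch of that point, with made-up numbers: hypotheses H1 and H2 assign identical probability to every observation, so Bayes' rule rescales both by the same evidence factor and their ratio stays pinned at the prior ratio, while a worse-fitting rival H3 gets washed out. The 2^-K prior weights and the per-observation likelihoods below are purely illustrative.

```python
import numpy as np

prior = np.array([2.0**-10, 2.0**-12, 2.0**-11])   # H1, H2, H3 with description lengths 10, 12, 11 bits
prior /= prior.sum()

likelihood = np.array([0.6, 0.6, 0.4])             # H1 and H2 agree exactly; H3 fits the data worse

posterior = prior.copy()
for step in range(1, 21):
    posterior = posterior * likelihood             # multiply in the new observation's likelihood
    posterior /= posterior.sum()                   # divide by the evidence (normalise)
    if step in (1, 5, 20):
        ratio = posterior[0] / posterior[1]
        print(f"after {step:2d} observations: P(H1)/P(H2) = {ratio:.2f} "
              f"(prior ratio {prior[0]/prior[1]:.2f}), P(H3) = {posterior[2]:.4f}")
```

The H1/H2 ratio never moves from its prior value of 4, no matter how much data arrives, which is exactly why the choice of reference machine (and hence the prior) is all you have left in such cases.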
In any case, I think these things are a bit of a side issue. In my view, for anybody who doesn't have a "we humans are special" bias, the many-worlds interpretation (MWI) is simpler in terms of description complexity, because we don't need to add a bunch of special cases and exceptions. Common sense be damned!