The minimum description length formulation doesn’t allow for that at all. You are not allowed to pick whatever language you want; you have to pick the optimal code. If, in the most concise code possible, state ‘a’ has a shorter codeword than state ‘b’, then ‘a’ must be more probable than ‘b’, since the most concise codes possible assign the shortest codewords to the most probable states.
So if you want to know what state a system is in, and you have the ideal (or close to ideal) code for the states of that system, the probability of a given state will be strongly inversely correlated with the length of its codeword.
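(A concrete illustration of the claim that optimal codes give the shortest codewords to the most probable states: Huffman coding constructs exactly such an optimal prefix code. This is a minimal sketch, with made-up state probabilities; for these dyadic probabilities each codeword length comes out to exactly −log2 of the state’s probability.)

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Build a Huffman code and return each state's codeword length (tree depth)."""
    # Heap entries: (probability, unique tiebreak, {state: depth-so-far}).
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees; their states all gain one bit.
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_lengths(probs)
for state, p in probs.items():
    print(state, lengths[state], -log2(p))  # codeword length matches -log2(p)
```

Note the inverse relationship by construction: the most probable state gets the shortest codeword.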
You are not allowed to pick whatever language you want; you have to pick the optimal code. If, in the most concise code possible, state ‘a’ has a shorter codeword than state ‘b’, then ‘a’ must be more probable than ‘b’, since the most concise codes possible assign the shortest codewords to the most probable states.
I haven’t read anything like this in my admittedly limited reading on Solomonoff induction. Disclaimer: I am a mere mathematician in a different field, and have only read a few papers surrounding Solomonoff.
The claims I’ve seen revolve around “assembly language” (for some value of assembly language) being sufficiently simple that any biases inherent in the language are small (some people claim the bias is at most a constant multiple, on the basis that this is what happens when you introduce a symbol ‘short-circuiting’ a computation). I think a more correct version of Anti-reductionist’s argument should run, “we currently do not know how the choice of language affects SI; it is conceivable that small changes in the base language imply fantastically different priors.”
I don’t know the answer to that, and I’d be very glad to know if someone has proved it. However, I think it’s rather unlikely that someone has proved it, because 1) I expect it will be disproven (on the basis that model-theoretic properties tend to be fragile), and 2) given the current difficulties in explicitly calculating SI, finding an explicit, non-trivial counter-example would probably be difficult.
Note that
Choose a language that can describe MWI more easily than Copenhagen, and they say you should believe MWI; choose a language that can describe Copenhagen more easily than MWI, and they say you should believe Copenhagen.
is not such a counter-example, because we do not know if “sufficiently assembly-like” languages can be chosen which exhibit such a bias. I don’t think the above thought-experiment is worth pursuing, because I don’t think we even know a formal (on the level of assembly-like languages) description of either CI or MWI.
Not Solomonoff: minimum description length. I’m coming from an information theory background and don’t know very much about Solomonoff induction.
OP is talking about Solomonoff priors, no? Is there a way to do inference with minimum description length?
What is OP?
EY
I meant Anti-reductionist, the person potato originally replied to… I suppose grandparent would have been more accurate.
He was talking about both.
So how do you predict with minimum description length?
With respect to the validity of reductionism, out of MDL and SI, one theoretically predicts and the other does not. Obviously.
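(For what it’s worth, the standard recipe for predicting with MDL is the two-part code: among candidate hypotheses, pick the one minimizing L(H) + L(D|H), the bits to describe the hypothesis plus the bits to encode the data given it, then predict with that hypothesis. A toy sketch below; the 7-bit parameter precision and the specific data are invented for illustration.)

```python
from math import log2

def two_part_length(data, p, param_bits):
    """Total description length L(H) + L(D|H) in bits: param_bits to
    transmit the model, plus the -log2 likelihood of the binary data."""
    ones = sum(data)
    zeros = len(data) - ones
    bits = float(param_bits)
    if ones:
        bits += ones * -log2(p)       # keep p strictly inside (0, 1)
    if zeros:
        bits += zeros * -log2(1 - p)
    return bits

data = [1] * 80 + [0] * 20              # a heavily biased sample
fair = two_part_length(data, 0.5, 0)    # "fair coin": no parameter to transmit
biased = two_part_length(data, 0.8, 7)  # bias quantized to 7 bits (arbitrary)

# MDL selects the shorter total description, then predicts with that model.
winner_p = 0.8 if biased < fair else 0.5
print(f"fair: {fair:.1f} bits, biased: {biased:.1f} bits")
```

Here the extra 7 bits spent describing the bias parameter are more than repaid by the shorter encoding of the data, so MDL selects the biased model and predicts the next symbol accordingly.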
Aren’t you circularly basing your code on your probabilities but then taking your priors from the code?
Yep, but that’s all the proof shows: the more concise your code, the stronger the inverse correlation between the probability of a state and the code length of that state.
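(The apparent circularity resolves into a two-step procedure: first estimate the probabilities, e.g. from observed frequencies, then build the code from those estimates; the resulting code lengths track the probabilities exactly as well as the code is close to ideal. A small sketch using Shannon code lengths ⌈−log2 p̂⌉ from empirical frequencies; the sample string is invented.)

```python
from collections import Counter
from math import ceil, log2

def shannon_lengths(sample):
    """Assign each observed state the Shannon length ceil(-log2 p_hat),
    where p_hat is its empirical frequency. (By the Kraft inequality a
    prefix code with these lengths always exists.)"""
    counts = Counter(sample)
    n = len(sample)
    return {s: ceil(-log2(c / n)) for s, c in counts.items()}

lengths = shannon_lengths("aaaaaaaabbbbccdd")
print(lengths)  # more frequent symbols receive shorter codewords
```

The code is derived from the probabilities, and the code lengths then serve as a stand-in for the probabilities — that dependence is the point, not a flaw.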