“P(W=X and T=Y) = P(W=X) P(T=Y|W=X);
P(W=X and T=Y) = exp(-len(W)) P(T=Y|W=X)”
therefore P(W=X) = exp(-len(W)). I’m trying to find a way to get this to sum to 1 across all W, but failing. Is there something wrong with this prior probability, or am I doing my math wrong?
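Here's a quick numerical check of that sum. It relies on an assumption of mine, not the post's: world-programs are arbitrary strings over a k-symbol alphabet with no prefix-free restriction. Under that assumption, the k**n programs of length n together contribute (k/e)**n. The helper name `partial_prior_mass` is hypothetical.

```python
import math

def partial_prior_mass(k: int, max_len: int) -> float:
    """Sum exp(-n) over all k**n strings of each length n = 1..max_len.

    Assumption (hypothetical model): world-programs are arbitrary strings over
    a k-symbol alphabet, so length n contributes k**n * exp(-n) = (k / e)**n.
    The empty program (n = 0) is ignored.
    """
    return sum((k / math.e) ** n for n in range(1, max_len + 1))

# Compare partial sums for a binary and a ternary alphabet.
for k in (2, 3):
    print(k, [round(partial_prior_mass(k, m), 3) for m in (10, 50, 100)])
```

This is a geometric series with ratio k/e. For k = 2 it converges to 2/(e-2) ≈ 2.78, which isn't 1 but could at least be normalized. For k ≥ 3 the ratio exceeds 1, the partial sums grow without bound, and no normalizing constant exists.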
“For example, a world-program that contains 10^10 random instructions is much less likely than one that contains 10^10 copies of the same instruction.”
Is that really necessary if a world-program with 1 copy of an instruction is functionally indistinguishable from a world-program with 10^10 copies of that single instruction?
“Since we can transfer complexity back and forth between W and T, we can’t justify applying Occam’s Razor to one but not the other, so it makes sense to apply it to T. This also means that we should also treat T as compressible; it is more likely that the universe is 3^^^3 steps old than that is 207798236098322674 steps old.”
I don’t think Occam’s Razor works that way.
“P(W=X and T=Y) = P(W=X) P(T=Y|W=X); P(W=X and T=Y) = exp(-len(W)) P(T=Y|W=X)” therefore P(W=X) = exp(-len(W)). I’m trying to find a way to get this to sum to 1 across all W, but failing. Is there something wrong with this prior probability, or am I doing my math wrong?
“For example, a world-program that contains 10^10 random instructions is much less likely than one that contains 10^10 copies of the same instruction.” Is that really necessary if a world-program with 1 copy of an instruction is functionally indistinguishable from a world-program with 10^10 copies of that single instruction?
“Since we can transfer complexity back and forth between W and T, we can’t justify applying Occam’s Razor to one but not the other, so it makes sense to apply it to T. This also means that we should also treat T as compressible; it is more likely that the universe is 3^^^3 steps old than that is 207798236098322674 steps old.” I don’t think Occam’s Razor works that way.