By the way, to clarify a potential misunderstanding.
Under S.I. it is not extremely unlikely that we are in one of the high complexity universes. This is because the 2^-l probability is the prior probability of an individual program of length l, not the probability of all programs of length l cumulative. There’s 2^l possible bit strings of length l . (The programs themselves determine their length by stopping making use of the input tape, so any program of length l is also two programs of length l+2 . Speaking of the program lengths thus gets very confusing quickly).
There’s entirely too many ways to formalize S.I. which are basically equivalent, which makes it really difficult to discuss. I standardize on “random bits on a read-only input tape, write only output tape, and a single work tape initialized with zeroes” model, with the probability of a specific output string s being the prior for s . The machine M must be such that for any other machine M’ you can make a prefix such that putting that prefix on the beginning of the tape for M makes it exactly equivalent to M’ . (the prefix works as an emulator, i.e. a program for the machine M’, with this prefix, will run on M exactly as if it was M’). Very concise description, and you can get all the things like 2^-l out of that.
By the way, to clarify a potential misunderstanding.
Under S.I. it is not extremely unlikely that we are in one of the high complexity universes. This is because the 2^-l probability is the prior probability of an individual program of length l, not the probability of all programs of length l cumulative. There’s 2^l possible bit strings of length l . (The programs themselves determine their length by stopping making use of the input tape, so any program of length l is also two programs of length l+2 . Speaking of the program lengths thus gets very confusing quickly).
You mean “also two programs of length l+1”, right?
I think this comment by gjm addresses the “longer programs are as likely” idea: http://lesswrong.com/r/discussion/lw/jhm/understanding_and_justifying_solomonoff_induction/adfc
Yeah, that was a typo.
There’s entirely too many ways to formalize S.I. which are basically equivalent, which makes it really difficult to discuss. I standardize on “random bits on a read-only input tape, write only output tape, and a single work tape initialized with zeroes” model, with the probability of a specific output string s being the prior for s . The machine M must be such that for any other machine M’ you can make a prefix such that putting that prefix on the beginning of the tape for M makes it exactly equivalent to M’ . (the prefix works as an emulator, i.e. a program for the machine M’, with this prefix, will run on M exactly as if it was M’). Very concise description, and you can get all the things like 2^-l out of that.