So we’re automatically giving $x_1$ ca. $2^{1000}$ times higher probability – even before applying the length penalty $\frac{1}{2^{|p|}}$.
But note that under the Solomonoff prior, you will get another $2^{-2000-|G|}$ penalty for each of these programs with DEADCODE. So with this consideration, the weight changes from
$2^{-1000}$ (for the normal $p_1$) to $2^{-1000}(1+2^{-|G|})$ (the normal $p_1$ plus the $2^{1000}$ DEADCODE versions of $p_1$), which is not a huge change.
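A quick sanity check of that bookkeeping, as a sketch in Python with exact arithmetic (the guard length $|G| = 10$ bits is a made-up illustrative value; the identity holds for any $|G|$):

```python
from fractions import Fraction

# Illustrative stand-ins: |p1| = 1000 bits, and an assumed |G| = 10 bits
# for the "skip the dead code" wrapper (the exact value of |G| doesn't matter).
P_LEN = 1000
G_LEN = 10

weight_p1 = Fraction(1, 2**P_LEN)                          # 2^-1000 for the normal p1
weight_per_deadcode = Fraction(1, 2**(2 * P_LEN + G_LEN))  # 2^-(2000+|G|) per DEADCODE variant
num_deadcode = 2**P_LEN                                    # 2^1000 DEADCODE versions of p1

total = weight_p1 + num_deadcode * weight_per_deadcode
expected = weight_p1 * (1 + Fraction(1, 2**G_LEN))         # 2^-1000 * (1 + 2^-|G|)

assert total == expected
print(float(total / weight_p1))  # 1.0009765625: the 2^1000 variants add only a 2^-|G| sliver
```

So the whole cloud of DEADCODE variants contributes a factor of $1+2^{-|G|}$, not a factor of $2^{1000}$.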
For your case of “uniform probability until $10^{90}$” I think you are right about exponential decay.
Yes, my point here is mainly that the exponential decay seems almost baked into the setup even if we don’t explicitly set it up that way, not that the decay is very notably stronger than it looks at first glance.
Given how many words have been spilled arguing over the philosophical validity of putting the decay with program length into the prior, this seems kind of important?
The number of programs of length at most n increases exponentially with n. Therefore any probability measure over them must decrease at least exponentially with length. That is, exponential decay is the least possible penalisation of length.
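To spell that counting step out (under the simplifying assumption of a binary alphabet in which every bit string of length $n$ counts as a program, so there are exactly $2^n$ of them): if $\mu$ is any probability measure over programs and $\bar\mu(n)$ is the average weight it assigns to a program of length $n$, then

$$1 \;\ge\; \sum_{n\ge 0}\,\sum_{|p|=n}\mu(p) \;=\; \sum_{n\ge 0} 2^{n}\,\bar\mu(n) \quad\Longrightarrow\quad \bar\mu(n)\le 2^{-n}\ \text{for every }n,$$

so the typical weight at length $n$ cannot decay more slowly than $2^{-n}$.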
This is also true of the number of minimal programs of length at most n, hence the corresponding conclusion. (Proof: for each string S, consider the minimal program that writes S and halts. These programs are all different. Their sizes are no more than length(S)+c, where c is the fixed overhead of writing a program with S baked into it. Therefore exponentiality.)
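Written out, that injection gives (with $c$ the constant overhead above): the map $S \mapsto p_S$, where $p_S$ is the minimal program that writes $S$ and halts, sends distinct strings to distinct programs with $|p_S| \le |S| + c$, so

$$\#\{\text{minimal programs of length} \le n+c\} \;\ge\; \#\{S : |S| \le n\} \;=\; 2^{n+1}-1,$$

which grows exponentially in $n$.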
I’ve written “at most n” instead of simply “n”, to guard against quirks like a programming language in which all programs are syntactically required to e.g. have even length, or deep theorems about the possible lengths of minimal programs.
I think you are broadly right.