I was thinking of each program as emitting a finite sequence and that was the prediction. As the target sequence got longer, you’d be using larger programs which halted after a longer time. It’s not too hard to change the rules so to make non-halting variants also fail.
For example, suppose I create a program encoding that unfairly favors direct output. If the first bit is “1” then the output is just the remaining bits. If the first bit is “0″ then it’s a normal encoding… except only every tenth bit matters. The other 90% of bits are simply ignored and inaccessible. This penalizes the 5⁄6 arithmetic encoder so much that it is beaten by using the raw encoding solution, and you’ll find the prediction staying near 50⁄50 instead of 5⁄6.
I do think some variants of SI work even for maliciously chosen program encodings. It helps to output probability distributions, and it helps to react to input instead of having unconditional output. But clearly not all variants are secure against bad encodings.
In principle, SI is choosing fairly from a space of infinite programs. It’s only practical to see some programs as finite with weight proportional to the weight of all the infinite programs this finite program can be extended into. But no program knows its length, unless it explicitly counts to when to stop.
The wasteful encoding you propose does not make a difference to SI. What the wasteful encoding does is that the arithmetic encoding programs will be 10 times longer and thus 2^10 times more penalized, but there will be 2^10 times more of them. So in the sum-over-all-programs, arithmetic coding programs will take over the direct output programs just the same as before.
Programs that are 10 bits longer are penalized by 2^10. Programs that are 10 times longer are penalized by 2^(10n), where n is the size of the original program. The penalty isn’t washed out over time… it gets significantly worse.
Yes, you’re right, I was sloppy. Still, the programs are exactly that much more numerous, so their weight ends up being the same in your wasteful encoding as in a sane encoding.
Hmmm, right. It’s not enough to ignore the intermediate bits. Have to make them break the program unless they are all zero. Like if any of them are 1 then the program had no output except “syntax error” (but the raw output still allows them).
I was thinking of each program as emitting a finite sequence and that was the prediction. As the target sequence got longer, you’d be using larger programs which halted after a longer time. It’s not too hard to change the rules so to make non-halting variants also fail.
For example, suppose I create a program encoding that unfairly favors direct output. If the first bit is “1” then the output is just the remaining bits. If the first bit is “0″ then it’s a normal encoding… except only every tenth bit matters. The other 90% of bits are simply ignored and inaccessible. This penalizes the 5⁄6 arithmetic encoder so much that it is beaten by using the raw encoding solution, and you’ll find the prediction staying near 50⁄50 instead of 5⁄6.
I do think some variants of SI work even for maliciously chosen program encodings. It helps to output probability distributions, and it helps to react to input instead of having unconditional output. But clearly not all variants are secure against bad encodings.
In principle, SI is choosing fairly from a space of infinite programs. It’s only practical to see some programs as finite with weight proportional to the weight of all the infinite programs this finite program can be extended into. But no program knows its length, unless it explicitly counts to when to stop.
The wasteful encoding you propose does not make a difference to SI. What the wasteful encoding does is that the arithmetic encoding programs will be 10 times longer and thus 2^10 times more penalized, but there will be 2^10 times more of them. So in the sum-over-all-programs, arithmetic coding programs will take over the direct output programs just the same as before.
Programs that are 10 bits longer are penalized by 2^10. Programs that are 10 times longer are penalized by 2^(10n), where n is the size of the original program. The penalty isn’t washed out over time… it gets significantly worse.
Yes, you’re right, I was sloppy. Still, the programs are exactly that much more numerous, so their weight ends up being the same in your wasteful encoding as in a sane encoding.
Hmmm, right. It’s not enough to ignore the intermediate bits. Have to make them break the program unless they are all zero. Like if any of them are 1 then the program had no output except “syntax error” (but the raw output still allows them).
I see. And don’t know the answer. I’m curious how SI fends off this one.