Since we can’t agree on a single ideal reference UTM, we instead approximate it by limiting ourselves to a class of “natural” UTMs which are mutually interpretable within a constant.
Even if you do that, you’re left with an infinite number of cliques such that within any given clique the languages can write short interpreters for each other. Picking one of the cliques is just as arbitrary as picking a single language to begin with.
I.e., for any given class X of what-we-intuitively-think-of-as-complicated programs, you can design a language that can concisely represent members of X and can also concisely represent interpreters that build in the same special case.
What I’m confused about is this constant penalty. Is it just “some constant” or is it knowable and small?
It’s a function of the language you’re writing an interpreter for and the language you’re writing it in. “Constant” in that it doesn’t depend on the programs you’re going to run in the new language.
I.e., for any given pair of languages there’s a finite bound on the disagreement between those two versions of the Solomonoff prior; but for any given bound there are pairs of languages that disagree by more than that.
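A standard way to put this in symbols (notation introduced here just for concreteness): if $K_U$ and $K_V$ are the shortest-description lengths relative to universal machines $U$ and $V$, the invariance theorem says

$$K_U(x) \;\le\; K_V(x) + c_{U,V} \qquad \text{for all } x,$$

where $c_{U,V}$ is essentially the length of an interpreter for $V$ written in $U$’s language, so the two priors agree up to a factor of at most $2^{c_{U,V}}$. The constant depends only on the pair $(U, V)$, not on $x$; but ranging over all pairs of machines there is no uniform bound on it.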
Is there a specific UTM for which we can always write a short compiler on any other UTM?
No. There’s no law against having a gigabyte-long opcode for NAND, and using all the shorter opcodes for things-we-intuitively-think-of-as-complicated.
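As a toy sketch of that freedom (hypothetical opcode lengths, nothing more): two prefix-free opcode tables over the same two primitives, one of which gives NAND a gigabyte-long code and hands the short code to a hand-picked “pet” primitive. The same programs come out with wildly different description lengths, which is exactly why the cross-language constant can be made as large as you like.

```python
# Toy illustration with made-up opcode lengths (hypothetical, not a real machine):
# the opcode table alone decides which programs count as "simple".
# Prefix-free codes with these lengths exist whenever the Kraft inequality
# sum(2**-L) <= 1 holds, which it does for both tables.

friendly    = {"NAND": 2,         "PET_PRIMITIVE": 40}
adversarial = {"NAND": 8 * 10**9, "PET_PRIMITIVE": 2}   # gigabyte-long NAND opcode

def description_length(program, table):
    """Total encoded length, in bits, of a program given as a list of opcodes."""
    return sum(table[op] for op in program)

nand_program = ["NAND"] * 100            # intuitively simple
pet_program  = ["PET_PRIMITIVE"] * 100   # intuitively complicated, but favoured

for name, table in [("friendly", friendly), ("adversarial", adversarial)]:
    print(f"{name:12}  NAND program: {description_length(nand_program, table)} bits,"
          f"  pet program: {description_length(pet_program, table)} bits")
```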
So is there then a pragmatic assumption that can be made? Maybe we assume that if I pick a Turing machine blindly, without specifically designing it for a particular output string, it’s unlikely to be strongly biased towards that string.
What probability distribution over Turing machines do you blindly pick it from? That’s another instance of the same problem.
Pragmatically, if I non-blindly pick some representation of Turing machines that looks simple to me (e.g. the one Turing used), I don’t really doubt that it’s within a few thousand bits of the “right” version of Solomonoff, whatever that means.
Why not?