The definition of Solomonoff induction is indifferent to the choice of universal Turing machine, because the difference it makes is a bounded number of bits. Two calculations of Kolmogorov complexity using different UTMs will always agree to within a number of bits c, where c depends on both of the UTMs (and measures how easily each can simulate the other).
c can be arbitrarily large.
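To make the indifference claim concrete, the invariance theorem behind it is usually stated along the following lines (a standard formulation, writing K_U for Kolmogorov complexity measured on the UTM U; the notation is mine, not anything asserted above):

```latex
% Invariance theorem (standard statement): for any two UTMs U and V there is
% a constant c_{U,V}, depending only on U and V, such that for every string x
\[
  \bigl|\, K_U(x) - K_V(x) \,\bigr| \;\le\; c_{U,V}.
\]
% Here c_{U,V} is essentially the length of the programs by which U and V
% simulate each other; nothing in the theorem bounds it uniformly over pairs.
```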
If you pack your UTM full of preferred hypotheses by giving them short codings (e.g. “let it be a human brain”), then you will get those hypotheses back out of it. But that did not come from Solomonoff induction; it came from your choice of UTM.
This raises the question: if, contra the theoretical indifference to the choice of UTM, the choice does matter, how should it be made? One might look for a UTM of minimal description length, but which UTM do you use to measure that, before you’ve chosen one? Suppose one first chooses an arbitrary UTM T0, then determines which UTM T1 is given the shortest description by T0, then generates T2 from T1 in the same way, and so on. Does this necessarily converge on a UTM that, in some definable sense, has no extra hypotheses stuffed into it? Or does this process solve nothing?
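One tentative way to write that iteration down, under the assumption that the “description length of a UTM T on machine S” means K_S(⟨T⟩), the length of the shortest S-program that prints some fixed encoding ⟨T⟩ of T:

```latex
% A sketch of the proposed iteration (ties in the argmin broken arbitrarily):
% start from an arbitrary UTM T_0 and repeatedly move to whichever UTM the
% current machine describes most cheaply.
\[
  T_{n+1} \;=\; \operatorname*{arg\,min}_{T\ \text{a UTM}} \; K_{T_n}\!\bigl(\langle T \rangle\bigr).
\]
% The question above is whether the sequence (T_n) converges (or at least
% settles into a cycle), and whether any limit point can be said to contain
% no smuggled hypotheses. None of this is computable, of course; the
% iteration is conceptual, not an algorithm.
```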
Alternatively, you might go with some standard construction of a UTM out of a computability theory textbook. Those look minimal enough that no complex hypotheses would be unjustly favoured, but it seems to me there is still a theoretical gap to be plugged here.
My recursive suggestion won’t work. One can devise a UTM that gives the shortest code to itself, by the usual reflexivity constructions. The computability theory textbook method looks better. But what theoretical justification can be given for it? Why are we confident that bad explanations are not lurking within it?
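To spell out why the recursion fails, here is a rough sketch of the reflexivity construction, using the standard recursion-theorem trick (the details are mine and only illustrative):

```latex
% By Kleene's recursion theorem one can construct a UTM U* whose transition
% table contains a description of U* itself, so that a single reserved input
% symbol expands to the encoding <U*>.  Then
\[
  K_{U_*}\!\bigl(\langle U_* \rangle\bigr) \;=\; O(1),
\]
% i.e. U* gives itself about as short a code as any machine could get, so the
% iteration T_{n+1} = argmin_T K_{T_n}(<T>) can get stuck at U*, however many
% pet hypotheses have also been packed into it.
```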
Actually, perhaps we shouldn’t be. It has already been remarked by Eliezer that Solomonoff induction gives what looks like undue weight to hypotheses involving gigantic numbers with short descriptions, e.g. 3^^^3, despite the fact that, looking at the world, such numbers have never been useful for anything but talking about gigantic numbers, and proving what are generally expected to be very generous upper bounds for some combinatorial theorems.
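As a concrete illustration of “gigantic numbers with short descriptions”, here is a minimal sketch of Knuth’s up-arrow notation in Python (the function and its name are just for illustration; nothing above specifies this code):

```python
def up_arrow(a, n, b):
    """Knuth's up-arrow: a ↑^n b.  n = 1 is ordinary exponentiation."""
    if n == 1:
        return a ** b
    if b == 0:
        return 1
    # a ↑^n b  =  a ↑^(n-1) (a ↑^n (b-1))
    return up_arrow(a, n - 1, up_arrow(a, n, b - 1))

print(up_arrow(3, 1, 3))  # 3^3  = 27
print(up_arrow(3, 2, 3))  # 3^^3 = 3^(3^3) = 7625597484987
# up_arrow(3, 3, 3) is 3^^^3: a power tower of 3s of height 7625597484987.
# It will never finish computing, yet the whole definition above is only a
# few lines long, which is exactly the kind of short description at issue.
```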
I previously tried to do something similar: define an objective upper bound on complexity by adding the complexity of a hypothesis within a UTM to the complexity of the UTM itself. I don’t recall how I defined the complexity of a UTM (I don’t think it was recursive, though). Perhaps I used an average over all UTMs, weighted by the complexity of each UTM? But that still requires us to already have a definition of complexity for UTMs! (Unless there’s a unique fixed point in the assignment of complexities to UTMs, but then I’d suspect that function to be non-computable. And uniqueness doesn’t seem very likely anyway.)
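For what it’s worth, the circularity can be made explicit. On the guess that the weighting was something like 2^−C(U) (a reconstruction, not a record of what I actually did back then), the definition would look like:

```latex
% A reconstructed version of the averaging idea: weight each UTM U by its own
% complexity C(U), then average the U-relative complexities.
\[
  \bar K(x) \;=\; \sum_{U} \frac{2^{-C(U)}}{\sum_{V} 2^{-C(V)}}\, K_U(x),
  \qquad
  C(U) \;\stackrel{?}{=}\; \bar K\bigl(\langle U \rangle\bigr).
\]
% The second equation is the circularity: C is needed to define \bar K, and
% \bar K is needed to define C.  Treating it as a fixed-point equation is the
% possibility raised in the parenthesis above, with the worries noted there.
```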