Yes, but every other model of universal computation gives the same results, up to a single constant (the cost of simulating one universal model within another).
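For reference, the constant-factor claim here is the invariance theorem: for any two universal machines U and V there is a constant that depends only on the pair of machines, not on the data, bounding how far the two complexities (and hence the two universal priors) can disagree. A standard statement, in the usual notation:

```latex
% Invariance theorem: for universal machines U and V there is a constant
% c_{U,V}, independent of the string x, such that
\[
  K_U(x) \;\le\; K_V(x) + c_{U,V} \qquad \text{for all } x,
\]
% and consequently the induced universal priors dominate one another up to a
% multiplicative constant:
\[
  m_U(x) \;\ge\; 2^{-c_{U,V}}\, m_V(x) \qquad \text{for all } x.
\]
```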
I don’t see how this solves the problem. The other models of universal computation give different results, even though they agree up to a constant. For particular questions, like “Will the gravitational constant change tomorrow?”, that agreement doesn’t seem to help.
Consider the hypotheses A = [Laws of physics] and B = [Laws of physics + 10% increase in gravitational constant tomorrow].
Every model of computation that we have used so far makes A simpler than B. But there are infinitely many models that make B simpler than A. Why should we prefer A to B?
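To see why the constant-factor agreement leaves this question open, here is a toy sketch in Python with made-up code lengths (they are not real Kolmogorov complexities, just illustrative numbers): under a 2^-length prior, the “simpler” hypothesis is whichever one the chosen description language happens to encode more briefly, and a language can always be rigged so that B comes out shorter than A.

```python
# Toy illustration of the "choice of language" worry, with made-up code lengths
# (these are not real Kolmogorov complexities, just numbers picked for the example).
# Under a 2**-length prior, the "simpler" hypothesis is whichever one the chosen
# description language happens to encode more briefly.

def prior(code_length_bits: int) -> float:
    """Unnormalized 2^-length prior weight for a program of the given length."""
    return 2.0 ** -code_length_bits

# Language 1: the familiar situation, where adding "10% jump in G tomorrow" costs bits.
lengths_language_1 = {"A: laws of physics": 100,
                      "B: laws of physics + 10% jump in G tomorrow": 120}

# Language 2: a rigged language with a built-in primitive for "10% jump in G
# tomorrow", so B gets the short code and A must spell out "and no jump happens".
lengths_language_2 = {"A: laws of physics": 120,
                      "B: laws of physics + 10% jump in G tomorrow": 100}

for name, lengths in [("language 1", lengths_language_1),
                      ("language 2", lengths_language_2)]:
    favoured = max(lengths, key=lambda h: prior(lengths[h]))
    print(f"{name} favours: {favoured}")
```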
Maybe I’m just missing something. Various smart people have said that the relativity of complexity isn’t a problem because of the constant-factor agreement between models/languages. However, various other smart people who have thought about it a lot have said the opposite, so I don’t think I’m missing something obvious.
Maybe the answer has to do with a measure defined over the space of models? This would be nice, but I worry about just pushing the problem up a level.
Well, first things first: what problem? My observation was that it doesn’t matter which kind of universal computation we use, because, up to a constant, all the models give the same complexity. This means that one universal prior will differ from another by just a finite number of terms.
You ask: “Why should we prefer A to B?”
If A and B are the only explanations surviving, then by all means we shouldn’t prefer A to B. But the point is, in any situation where Solomonoff is used, you also have:
C = [Laws of physics + 10% increase in gravitational constant tomorrow + 10% decrease in two days]
D = [Laws of physics + 10% increase in gravitational constant tomorrow + 10% decrease in two days + 10% increase in three days]
E = [Laws of physics + 10% increase in gravitational constant tomorrow + 10% decrease in two days + 10% increase in three days + 10% decrease in four days]
and so on and so forth.
Let’s say that to me, in order of probability, E > D > C > B > A, in a perfect anti-Occamian fashion. In any case, A + B + C + D + E will together have some amount of probability x. You still have 1-x to assign to all the other, longer programs: F, G, H, etc.
However small the probability of A, there will be a program (say Z) such that all the programs longer than Z have probabilities lower than A’s. In this case you have a finite violation of Occam’s razor, but the math forces you to prefer A as an explanation over the longer programs (that is, those longer than Z).
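A minimal numerical sketch of this counting point, with made-up probabilities rather than a real universal prior: because the probabilities of all programs must sum to at most 1, only finitely many programs can individually match or beat the probability of A, so past some point every longer program is forced below it.

```python
# Numerical sketch of the counting argument, with made-up probabilities
# (this is not a real universal prior). Because the probabilities of all
# programs sum to at most 1, only finitely many programs can individually
# match or beat P(A).

# An anti-Occamian assignment on the first five hypotheses (E > D > C > B > A),
# using up a fraction x of the total probability mass.
p = {"A": 0.01, "B": 0.02, "C": 0.04, "D": 0.08, "E": 0.15}
x = sum(p.values())                                   # 0.30 in this example

# Spread the remaining 1 - x over the infinitely many longer programs F, G, H, ...
# Any such spreading drives the individual probabilities toward 0; halving the
# mass at each step is just one concrete choice.
remaining = 1.0 - x
longer = [remaining * 0.5 ** (k + 1) for k in range(50)]  # P(F), P(G), P(H), ...

# Where the cutoff ("Z") falls depends on the particular assignment, but that it
# exists does not: at most 1/P(A) programs can each carry probability >= P(A).
cutoff = next(k for k, q in enumerate(longer) if q < p["A"])
print("P(A) =", p["A"])
print("first of the longer programs forced below P(A): position", cutoff)
print("at most", int(1.0 / p["A"]), "programs can ever have probability >= P(A)")
```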
The problem being discussed is the relativity of complexity. So long as anything can be made out to be more complicated than anything else by an appropriate choice of language, it seems that Solomonoff Induction will be arbitrary, and we won’t be justified in thinking that it is accurate.
Yes, one universal prior will differ from another by just a finite number of terms. But there is no upper bound on how large this finite number can be. So we can’t make any claims about how likely specific predictions are, without arbitrarily ruling out infinite sets of languages/models. So the problem remains.
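To spell the “no upper bound” point out in the earlier notation (a standard observation, stated informally): for any fixed string one can construct a universal machine that encodes it with a very short program, so although the invariance constant is finite for each fixed pair of machines, it is unbounded over the class of universal machines.

```latex
% For any string x and reference machine U, one can build a universal machine
% V_x that outputs x on a one-bit program (and otherwise simulates U), so that
% K_{V_x}(x) <= 1. The invariance constant for the pair (U, V_x) must then
% absorb the difference:
\[
  c_{U,V_x} \;\ge\; K_U(x) - K_{V_x}(x) \;\ge\; K_U(x) - 1 ,
\]
% and since K_U(x) can be made as large as we like by choosing x, no bound on
% c_{U,V} holds uniformly over all universal machines V.
```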
As you say, A is to be preferred to programs longer than Z. But there is no upper bound on how long Z might be. So any particular program—for example, B—is such that we have no reason to say that it is more or less likely than A. So it seems we have failed to find a full justification for why we should prefer A to B.
Unless, as I said, we start talking about the space of all possible languages/models. But as I said, this threatens to just push the problem up a level.