MrMind comments on Understanding and justifying Solomonoff induction

MrMind 16 Jan 2014 0:48 UTC
0 points

I don’t see how this solves the problem.

Well, first things first: what problem?
My observation was that it doesn’t matter which model of universal computation we use because, up to an additive constant, all of them give the same complexity. This means that one universal prior will differ from another by just a finite number of terms.
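For reference, the constant mentioned here is the one that appears in the invariance theorem. A sketch of the standard statement follows. The symbols are notation introduced for this illustration and are not from the comment: U and V are any two universal machines, K_U(x) is the length of the shortest program that outputs x on U, and c_{UV} is a constant that depends on U and V but not on x.

```latex
K_U(x) \;\le\; K_V(x) + c_{UV} \quad \text{for all } x,
\qquad\text{hence}\qquad
2^{-K_U(x)} \;\ge\; 2^{-c_{UV}} \, 2^{-K_V(x)} .
```

Read this way, switching machines changes the weight of any hypothesis by at most a fixed factor, although the theorem itself says nothing about how large c_{UV} is.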

Why should we prefer A to B?

If A and B are the only explanations surviving, then indeed we have no reason to prefer A to B.
But the point is, in any situation where Solomonoff induction is used, you also have:
C = [Laws of physics + 10% increase in gravitational constant tomorrow + 10% decrease in two days]
D = [Laws of physics + 10% increase in gravitational constant tomorrow + 10% decrease in two days + 10% increase in three days]
E = [Laws of physics + 10% increase in gravitational constant tomorrow + 10% decrease in two days + 10% increase in three days + 10% decrease in four days]
and so on and so forth.

Let’s say that to me, in order of probability, E > D > C > B > A, in perfectly anti-Occamian fashion. In any case, A + B + C + D + E will have some total probability x. You still have 1-x to assign to all the other, longer programs: F, G, H, etc.
However small the probability of A, the probabilities must sum to at most 1, so only finitely many programs can be at least as probable as A. There will therefore be a program (say Z) such that every program longer than Z has a lower probability than A. In this case, you have a finite violation of Occam’s razor, but the math forces you to prefer A as an explanation to the longer programs (that is, those longer than Z).
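The counting argument behind the cutoff Z can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the comment: programs are indexed 0, 1, 2, ... in order of increasing length, the prior is a made-up one that rises over the first five programs (A to E) and then decays geometrically, and it is truncated to a finite number of programs so it can be normalized exactly. The names `anti_occam_prior` and `cutoff_index` are hypothetical.

```python
from fractions import Fraction


def anti_occam_prior(n_terms):
    """Hypothetical prior over programs indexed by increasing length.

    Programs 0..4 (A..E) get rising weights 1..5, which is the
    anti-Occamian part. Every later program gets a geometrically
    decaying weight. Weights are normalized over n_terms programs.
    """
    if n_terms < 5:
        raise ValueError("need at least the five programs A..E")
    head = [Fraction(k) for k in range(1, 6)]  # A < B < C < D < E
    tail = [Fraction(5, 2 ** k) for k in range(1, n_terms - 4)]
    weights = head + tail
    total = sum(weights)
    return [w / total for w in weights]


def cutoff_index(prior):
    """Index of the last program at least as probable as program 0 (A)."""
    p_a = prior[0]
    return max(i for i, p in enumerate(prior) if p >= p_a)


prior = anti_occam_prior(30)
z = cutoff_index(prior)

# Number of programs at least as probable as A; since the probabilities
# sum to 1, this count can never exceed 1 / P(A).
at_least_as_probable = sum(1 for p in prior if p >= prior[0])
```

With these assumed weights, `z` is 6: the programs at indices 5 and 6 are still more probable than A, and every program from index 7 onward is less probable. The bound `at_least_as_probable * prior[0] <= 1` holds for any normalized prior, which is why such a cutoff always exists, although its position depends entirely on the prior chosen.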
- kokotajlod 16 Jan 2014 21:05 UTC
  0 points
  The problem being discussed is the relativity of complexity. So long as anything can be made out to be more complicated than anything else by an appropriate choice of language, it seems that Solomonoff Induction will be arbitrary, and we won’t be justified in thinking that it is accurate.
  
  Yes, one universal prior will differ from another by just a finite number of terms. But there is no upper bound on how large this finite number can be. So we can’t make any claims about how likely specific predictions are, without arbitrarily ruling out infinite sets of languages/models. So the problem remains.
  
  As you say, A is to be preferred to programs longer than Z. But there is no upper bound on how long Z might be. So any particular program—for example, B—is such that we have no reason to say that it is more or less likely than A. So it seems we have failed to find a full justification for why we should prefer A to B.
  
  Unless, as I said, we start talking about the space of all possible languages/models. But as I said, this threatens to just push the problem up a level.