A. But when we’re calculating the complexity of hypotheses, we’re doing it in respect to a particular model of computing!
Yes, but every other model of universal computation gives the same results, except for a single constant (the cost of simulating one universal model on another).
But who says it’s a universal machine! It may be anything, really! For example, who says that the universe doesn’t work as a quasi-Turing machine that outputs a huge number of “1”s with 99% probability every time after it outputs a single “0”? The Universal Prior relies on the relation between the machine’s inputs and outputs; if that relation is changed, the probabilities are going to be wrong.
On the other hand, the physical model may be strictly more powerful than a Universal Turing machine—it may be a hypercomputer! If it is, the universe is uncomputable.
Yes! But if it is, either we grab hold of a source of hypercomputation and proceed from there (although searching Google for ‘“solomonoff induction” & hypercomputation’ gives very few relevant results), or we are never going to know that it is: we might just suspect it from our models becoming longer and longer, but we will never be able to prove it.
Also, as B said correctly, the justification for Solomonoff Induction is epistemic: we are doing the best we can (i.e. using a universal finite computational model), eliminating everything that we know is factually incorrect without throwing away anything that is not explicitly forbidden by the data.
Yes, but every other model of universal computation gives the same results, except for a single constant (the cost of simulating one universal model on another).
I don’t see how this solves the problem. The other models of universal computation give different results, though they are similar in certain respects. For particular problems, like “Will the gravitational constant change tomorrow?” it doesn’t seem to help.
Consider the hypotheses A = [Laws of physics] and B = [Laws of physics + 10% increase in gravitational constant tomorrow]
Every model of computation that we have used so far makes A simpler than B. But there are infinitely many models that make B simpler than A. Why should we prefer A to B?
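As a toy illustration of how the ordering can flip (the language names and bit costs below are invented for the example; only the 2^(-length) weighting mirrors the universal prior), consider a description language that happens to provide a cheap primitive for B:

```python
# Toy illustration: with a 2^-length prior, the ordering of A and B is fixed
# entirely by which primitives a description language makes cheap.
# All names and bit costs here are invented for the example.

# Hypothetical primitive costs, in bits, for two invented languages.
LANG1 = {"laws_of_physics": 10, "bump_G_by_10pct_tomorrow": 500}
LANG2 = {"laws_of_physics": 1000, "laws_plus_G_bump": 10}  # B wired in as a primitive

def prior(lang, primitives):
    """Length-based prior: weight 2^-(total description length)."""
    return 2.0 ** -sum(lang[p] for p in primitives)

# A = [Laws of physics], B = [Laws of physics + 10% G increase tomorrow],
# each written in the given language's own primitives.
A1, B1 = ["laws_of_physics"], ["laws_of_physics", "bump_G_by_10pct_tomorrow"]
A2, B2 = ["laws_of_physics"], ["laws_plus_G_bump"]

print(prior(LANG1, A1) > prior(LANG1, B1))  # True: language 1 prefers A
print(prior(LANG2, A2) > prior(LANG2, B2))  # False: language 2 prefers B
```

Whether a language like LANG2 should count as a legitimate choice of reference machine is exactly what is in dispute here.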
Maybe I’m just missing something. Various smart people have said that the relativity of complexity isn’t a problem because of the constant-factor agreement between models/languages. However, various other smart people who have thought about it a lot have said the opposite, so I don’t think I’m missing something obvious.
Maybe the answer has to do with a measure defined over the space of models? This would be nice, but I worry about just pushing the problem up a level.
Well, first things first: what problem? My observation was that it doesn’t matter which kind of universal computation we use because, up to a constant, all the models give the same complexity. This means that one universal prior will differ from another just by a finite number of terms.
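For reference, the standard result usually invoked for this constant-factor claim is the invariance theorem (writing K_U(x) for the length of the shortest program that outputs x on universal machine U):

```latex
\[
K_V(x) \;\le\; K_U(x) + c_{UV}
\qquad\Longrightarrow\qquad
2^{-K_V(x)} \;\ge\; 2^{-c_{UV}} \, 2^{-K_U(x)},
\]
```

so the priors defined by any two universal machines agree up to a multiplicative constant that does not depend on x.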
Why should we prefer A to B?
If A and B are the only surviving explanations, then by all means we shouldn’t prefer A to B. But the point is, in any situation where Solomonoff is used, you also have:
C = [Laws of physics + 10% increase in gravitational constant tomorrow + 10% decrease in two days]
D = [Laws of physics + 10% increase in gravitational constant tomorrow + 10% decrease in two days + 10% increase in three days]
E = [Laws of physics + 10% increase in gravitational constant tomorrow + 10% decrease in two days + 10% increase in three days + 10% decrease in four days]
and so on and so forth.
Let’s say that to me, in order of probability, E > D > C > B > A, in a perfectly anti-Occamian fashion. In any case, A + B + C + D + E will have some amount of probability x. You still have 1 − x to assign to all the other, longer programs: F, G, H, etc. However small the probability of A, there will be a program (say Z) such that all the programs longer than Z have probabilities lower than A’s. In this case you have a finite violation of Occam’s razor, but the math still forces you to prefer A as an explanation over the longer programs (that is, those longer than Z).
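The step from “probabilities sum to 1” to “such a Z exists” can be made explicit (a sketch, writing P(H) for the probability assigned to hypothesis H):

```latex
\[
\#\{\, H : P(H) \ge P(A) \,\} \;\le\; \frac{1}{P(A)}
\qquad\text{since}\qquad
\sum_{H} P(H) \le 1 ,
\]
```

so only finitely many programs can be assigned probability at least P(A); taking Z to be the length of the longest of them, every program longer than Z has probability below A’s.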
The problem being discussed is the relativity of complexity. So long as anything can be made out to be more complicated than anything else by an appropriate choice of language, it seems that Solomonoff Induction will be arbitrary, and we won’t be justified in thinking that it is accurate.
Yes, one universal prior will differ from another by just a finite number of terms. But there is no upper bound on how large this finite number can be. So we can’t make any claims about how likely specific predictions are, without arbitrarily ruling out infinite sets of languages/models. So the problem remains.
As you say, A is to be preferred to programs longer than Z. But there is no upper bound on how long Z might be. So any particular program—for example, B—is such that we have no reason to say that it is more or less likely than A. So it seems we have failed to find a full justification for why we should prefer A to B.
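A sketch of why that threshold is machine-relative, under the usual conventions: given any universal machine U and any hypothesis B, one can construct another universal machine V that runs B on the one-bit input 1 and simulates U on every input of the form 0p. Then, for any A distinct from B,

```latex
\[
K_V(B) \le 1,
\qquad
K_V(A) = K_U(A) + 1,
\qquad\text{hence}\qquad
2^{-K_V(B)} > 2^{-K_V(A)} ,
\]
```

so on V the ordering of A and B is reversed; and since B was arbitrary, neither the simulation constant nor the threshold Z has a machine-independent upper bound.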
Unless, as I said, we start talking about the space of all possible languages/models. But as I said, this threatens to just push the problem up a level.
Yes, but every other model of universal computation gives the same results, except for a single constant (the cost of simulating one universal model on another).
In the most different case you get the same results, but only up to the worst-case cost of simulating one model on the other.
However, the ratio between the best-case and worst-case simulation costs for different sets of operations might not be trivial, and that would distort things.
You’re right—Solomonoff induction is justified for any model, whenever it is computable. The exact technical details are unimportant. I was a bit confused about this point.
Essentially, Solomonoff induction “works” in the physical universe (i.e. is the best predictor) whenever:
1) there is a source of randomness
2) there are some rules
3) the universe is not hypercomputing.
If there is no source of randomness involved, the process is fully deterministic, and can be best predicted by deductive reasoning.
If there are no rules, the process is fully random. In this case just tossing a fair coin will predict equally well (with P=0.5).
If it’s hypercomputing, a “higher-order” Solomonoff induction will do better.
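To make the three conditions concrete, here is a minimal toy stand-in for the idea (not Solomonoff induction proper, whose mixture over all programs is incomputable): a Bayesian mixture over a small, hand-picked class of computable hypotheses, each weighted by 2^(-description length). Given data with a simple rule it locks onto the rule; given patternless data it falls back to the fair coin; and by construction it can only ever entertain computable hypotheses, which is where condition 3 bites. The hypothesis class and the bit costs below are invented for the example.

```python
# Toy stand-in for Solomonoff induction: a Bayesian mixture over a small,
# explicitly computable hypothesis class (NOT the full incomputable mixture).
# Hypotheses: a "fair coin" (no rules) plus every periodic bit pattern of
# period <= MAX_PERIOD (simple deterministic rules). Each hypothesis gets a
# made-up description length and prior weight 2^-(description length).

from itertools import product

MAX_PERIOD = 4

def hypotheses():
    # The "no rules" hypothesis: every bit is 1 with probability 0.5.
    yield ("coin", None, 1)  # (name, pattern, description length in bits)
    # Deterministic periodic rules: the pattern repeated forever.
    for period in range(1, MAX_PERIOD + 1):
        for pattern in product("01", repeat=period):
            yield ("periodic", "".join(pattern), 2 + period)

def likelihood(hyp, data):
    name, pattern, _ = hyp
    if name == "coin":
        return 0.5 ** len(data)
    # Periodic hypothesis: probability 1 if it matches the data exactly, else 0.
    return 1.0 if all(data[i] == pattern[i % len(pattern)]
                      for i in range(len(data))) else 0.0

def predict_next(data):
    """Posterior-weighted probability that the next bit is '1'."""
    num = den = 0.0
    for hyp in hypotheses():
        name, pattern, length = hyp
        w = (2.0 ** -length) * likelihood(hyp, data)
        p_one = 0.5 if name == "coin" else \
            (1.0 if pattern[len(data) % len(pattern)] == "1" else 0.0)
        num += w * p_one
        den += w
    return num / den

print(predict_next("010101"))  # close to 0: the period-2 rule dominates
print(predict_next("011010"))  # exactly 0.5 here: only the coin hypothesis fits
```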
In the context of Bayesian reasoning, I understand “random” as “not enough information”, which is different from “non-deterministic”. So that:
If there is no source of randomness involved, the process is fully deterministic, and can be best predicted by deductive reasoning.
Only if we have enough information to exactly compute the next state from the previous ones. When this is not the case, lack of information acts as a source of randomness, for which SI can account.
If there are no rules, the process is fully random. In this case just tossing a fair coin will predict equally well (with P=0.5).
In a sense, yes. There might still be useful pockets of computability inside the universe, though.
If it’s hypercomputing, a “higher-order” Solomonoff induction will do better.
I’m not sure “higher-order” Solomonoff induction is even a thing.
“Higher-order” SI is just SI armed with an upgraded universal prior—one that is defined with reference to a universal hypercomputer instead of a universal Turing machine.
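For concreteness, in the standard monotone-machine formulation the prior being “upgraded” assigns to each finite observation string x the weight

```latex
\[
M(x) \;=\; \sum_{p \,:\, U(p)\ \text{begins with}\ x} 2^{-|p|} ,
\]
```

and the proposed upgrade keeps the 2^{-|p|} weighting but replaces the universal monotone machine U with some chosen universal hypercomputer, which is exactly the choice the next reply takes issue with.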
It’s not that simple. There isn’t a single model of hypercomputation, and even inside the same model hypercomputers might have different cardinal powers.
Some random thoughts:
Yes, but every other model of universal computation gives the same results, except for a single constant (the cost of simulating one universal model on another).
What you’re describing here is a non-deterministic Turing machine. They have been proven equivalent to deterministic ones.