Yes, that was the post I read that generated my current line of questioning.
My reply to Viliam_Bur was phrased in terms of probabilities in a single universe, while your post here is in terms of mathematically possible universes. Let me try to rephrase my point to him in many-worlds language. This is not how I originally thought of the question, though, so I may end up a little muddled in translation.
Take your original example, where half of the Mathematically Possible Universes start with 1 and the other half with 0. It is certainly possible to imagine a hypothetical Actual Multiverse where, nevertheless, there are 5 billion universes starting with 1, and only 5 universes starting with 0. Who knows why; maybe there is some overarching multiversal law we are unaware of, or maybe it’s just random. The point is that there is no a priori reason the Multiverse can’t be that way. (It may not even be possible to say that the multiverse probably isn’t that way without using Solomonoff Induction or Occam’s Razor, the very concepts in question.)
If this were the case, and I were somehow universe-hopping, I would over time come to the conclusion that SI was poorly calibrated and stop using it. This, I think, is basically the many-worlds version of my suggestion to Viliam_Bur. As I said to him, I am not arguing for or against SI; I am just asking knowledgeable people whether there is any evidence that the probabilities in this universe, or the distributions across the multiverse, are actually in proportion to their information-complexities.
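To make the calibration worry concrete, here is a toy simulation of that universe-hopping scenario. The 5-billion-to-5 split is the one described above; the function names, sample size, and everything else in the sketch are my own illustrative choices, not anything built into SI:

```python
import random

# Toy sketch: a 50/50 length-based prior versus a lopsided Actual Multiverse.
# The split below is the hypothetical 5 billion "1" universes vs. 5 "0" universes.
PRIOR_P_FIRST_BIT_IS_1 = 0.5                # what the length-based prior assigns
ACTUAL_P_FIRST_BIT_IS_1 = 5e9 / (5e9 + 5)   # the lopsided multiverse's actual frequency

def hop_universes(n=100_000, seed=0):
    """Visit n universes drawn from the lopsided multiverse; return the observed frequency of 1."""
    rng = random.Random(seed)
    ones = sum(rng.random() < ACTUAL_P_FIRST_BIT_IS_1 for _ in range(n))
    return ones / n

observed = hop_universes()
print(f"prior says P(first bit = 1) = {PRIOR_P_FIRST_BIT_IS_1}")
print(f"universe-hopping frequency ~= {observed:.6f}")
# A 0.5 prediction against an observed frequency near 1.0 is exactly the kind of
# miscalibration described above; nothing in the mathematics forbids the multiverse
# from being distributed this way.
```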
Hmm, I think I see what you mean.
Yes, there’s no reason for Solomonoff to be well-calibrated in the end, but once we obtain information that most of the universes starting with 0 do not work, that is data against which most of the hypotheses starting with 0 will fail. At that point, brute Solomonoff induction will be obviously inefficient, and we should begin using the heuristic of testing almost only hypotheses starting with 1.
In fact, we’re already doing this: we know for a fact that we live in the subset of universes where the acceleration between two particles is not constant and independent of distance. So it is known that the simpler hypothesis where gravitational attraction is “0.02c/year times the total mass of the objects” is not more likely than the one where gravitational attraction also depends on distance and angular momentum and other factors, despite the former being much less complex than the latter (or so we presume).
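For concreteness, here is a minimal sketch of that updating step, assuming a toy encoding in which hypotheses are just finite bit strings weighted 2^-length; the encoding, weights, and function names are my own simplification, not the full Solomonoff construction:

```python
from itertools import product

def toy_prior(max_len=10):
    """Assign weight 2**-len(h) to every bit string h up to max_len (a toy, unnormalized prior)."""
    return {"".join(bits): 2.0 ** -n
            for n in range(1, max_len + 1)
            for bits in product("01", repeat=n)}

def condition(hyps, evidence_prefix):
    """Discard hypotheses inconsistent with the evidence and renormalize the rest."""
    kept = {h: w for h, w in hyps.items() if h.startswith(evidence_prefix)}
    total = sum(kept.values())
    return {h: w / total for h, w in kept.items()}

prior = toy_prior()
posterior = condition(prior, "1")   # evidence: our universe starts with 1

def mass_on_0(dist):
    """Fraction of total weight sitting on hypotheses that start with 0."""
    return sum(w for h, w in dist.items() if h.startswith("0")) / sum(dist.values())

print(mass_on_0(prior))      # 0.5: half the prior mass is on hypotheses starting with 0
print(mass_on_0(posterior))  # 0.0: after updating, none is
# Once the data rule out the 0-universes, all surviving weight sits on hypotheses
# starting with 1, which is why testing almost only those is a sensible heuristic.
```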
There are still murky depths and open questions, such as (IIRC) how to calculate how “long” (see Kolmogorov complexity) the instructions are.
Suppose, for example, that we build two universal Turing machines with different sets of internal instructions.
We run Solomonoff Induction on the first machine, and it turns out that 01110101011110101010101111011 is the simplest possible program that will output “110”, and by analyzing the language and structure of the machine we learn that this corresponds to the hypothesis “2*3”, with the output being “6”. Meanwhile, on the second machine, 1111110 will also output “110”, and by analyzing it we find out that this corresponds to the hypothesis “6”, with the output being “6”.
On the first machine, to encode the hypothesis “6”, we must write 101010101111110110101111111110000000111111110000110, which is much more complex than the earlier “2*3” hypothesis, while on the second machine the “2*3” hypothesis is input as 1010111010101111, which is much longer than the “6” hypothesis.
Which hypothesis, between “2*3” and “6”, is simpler and less complex, based on what we observe from these two different machines? Which one is right? AFAIK, this is still completely unresolved.
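For what it’s worth, a small script that just tabulates the lengths quoted above makes the reversal explicit; the bit strings are the ones from the example, while the dictionary framing is my own:

```python
# Shortest known programs for each hypothesis, as described in the example above.
machine_A = {
    "2*3": "01110101011110101010101111011",
    "6":   "101010101111110110101111111110000000111111110000110",
}
machine_B = {
    "2*3": "1010111010101111",
    "6":   "1111110",
}

for name, machine in [("machine A", machine_A), ("machine B", machine_B)]:
    lengths = {h: len(p) for h, p in machine.items()}
    simplest = min(lengths, key=lengths.get)
    print(name, lengths, "-> simplest:", simplest)
# machine A {'2*3': 29, '6': 51} -> simplest: 2*3
# machine B {'2*3': 16, '6': 7} -> simplest: 6
# The invariance theorem only promises that the two length measures differ by at most
# a machine-dependent constant (roughly, the length of an interpreter for one machine
# written for the other), which settles nothing when comparing two specific short programs.
```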
If we’re considering hypotheses across all mathematically possible universes then why not consider hypotheses across all mathematically possible languages/machines as well?
What weight will we assign to the individual languages/machines? Their complexity… according to what? Perhaps we could make a matrix saying how complex a machine A is when simulated by a machine B, and then find the eigenvalues of the matrix?
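Purely as a sketch of what that might look like mechanically: take a matrix K whose entry K[i, j] is the length of the shortest program on machine j that simulates machine i, convert it into an affinity matrix, and use the dominant eigenvector as a self-consistent weighting, in the spirit of PageRank. Every number below is invented, and the exp(-K/100) transformation is just one arbitrary way to turn lengths into affinities; whether anything like this yields a principled, non-circular prior over machines is exactly the open question:

```python
import numpy as np

# Invented 3x3 example: K[i, j] ~ length of the shortest program on machine j
# that simulates machine i (diagonal is 0: a machine simulates itself for free).
K = np.array([
    [  0.0, 120.0,  90.0],
    [ 80.0,   0.0, 150.0],
    [ 60.0, 110.0,   0.0],
])

# Shorter interpreter = stronger link between the two machines.
affinity = np.exp(-K / 100.0)

# Dominant (Perron) eigenvector of the positive affinity matrix, normalized to sum to 1,
# read as a self-consistent weight for each machine.
eigenvalues, eigenvectors = np.linalg.eig(affinity)
principal = np.real(eigenvectors[:, np.argmax(np.real(eigenvalues))])
weights = principal / principal.sum()
print("machine weights:", np.round(weights, 3))
```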
Must stop… before head explodes...
This is my intuition as well, though I think it has to be restricted to Turing-complete systems. I was under the impression that there was already some active research in this direction, but I’ve never taken the time to look into it too deeply.