My understanding is that a hypothesis is a program which generates a complete prediction of all observations. So there is no specific hypothesis (X OR Y), for the same reason that there is no sequence of numbers which is (list of all primes OR list of all squares).
Note that by “complete prediction of all observations” I don’t mean things like “tomorrow you’ll see a blackbird”, but rather the sense in which you get an observation in an MDP or POMDP. If you imagine watching the world through a screen with a given frame rate, every hypothesis has to predict every single pixel of that screen, for each frame.
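To make the “every pixel, every frame” picture concrete, here is a toy sketch. It assumes a tiny 3x4 black-and-white screen and two made-up hypotheses (`moving_dot`, `all_dark`); none of these names come from any formal treatment. Each hypothesis is a function from a frame index to a full grid, so nothing is left unspecified.

```python
from typing import Callable, List, Optional, Tuple

# Toy model (an assumption for illustration): a frame is a complete grid of pixels,
# and a hypothesis maps a frame index to a complete frame.
Frame = List[List[int]]
Hypothesis = Callable[[int], Frame]

WIDTH, HEIGHT = 4, 3

def moving_dot(t: int) -> Frame:
    """Hypothetical hypothesis: one lit pixel that moves diagonally each frame."""
    frame = [[0] * WIDTH for _ in range(HEIGHT)]
    frame[t % HEIGHT][t % WIDTH] = 1
    return frame

def all_dark(t: int) -> Frame:
    """Hypothetical hypothesis: the screen stays black forever."""
    return [[0] * WIDTH for _ in range(HEIGHT)]

def predict(h: Hypothesis, n_frames: int) -> List[Frame]:
    """A hypothesis fixes every pixel of every frame it is asked about."""
    return [h(t) for t in range(n_frames)]

def first_disagreement(h1: Hypothesis, h2: Hypothesis,
                       n_frames: int) -> Optional[Tuple[int, int, int]]:
    """Return (frame, row, col) of the first pixel where two hypotheses differ, or None."""
    for t in range(n_frames):
        f1, f2 = h1(t), h2(t)
        for r in range(HEIGHT):
            for c in range(WIDTH):
                if f1[r][c] != f2[r][c]:
                    return (t, r, c)
    return None
```

Because both hypotheses are complete, “h1 AND h2” is either the same world (no disagreement anywhere) or no world at all (some pixel disagrees), which is the point raised in the follow-up question about conjoining hypotheses.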
I don’t know where this is explained properly though. In fact I think a proper explanation, which explains how these idealised “hypotheses” relate to hypotheses in the human sense, would basically need to explain what thinking is and also solve the entire embedded agency agenda. For that reason, I place very little weight on claims linking Solomonoff induction to bounded human or AI reasoning.
Thanks. So “There are no black swans.” is not a valid Solomonoff hypothesis? A hypothesis can’t exclude things, only make positive predictions?
Is a hypothesis allowed to make partial predictions? E.g. predict some pixels or frames and leave others unspecified. If so, then you could “and” together two partial hypotheses and run into a similar math consistency problem, right? But the way you said it sounds like a valid hypothesis may be required to predict absolutely everything, which would prevent conjoining two hypotheses since they’re already both complete and nothing more could be added.
A hypothesis can’t exclude things, only make positive predictions
Internally, the algorithm could work by ruling things out (“There are no black swans, so the world can’t be X”), but it must still completely specify everything. This may be clearer once you have the answer to your question, “What counts as a hypothesis for Solomonoff induction?”: a halting program for some universal Turing machine. And the possible worlds are (in correspondence with) the elements of the space of possible outputs of that machine. So every “hypothesis” pins down everything exactly.
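Here is a rough sketch of where uncertainty like “primes or squares” lives instead. It assumes a tiny finite set of hypotheses with made-up description lengths in bits (a real universal Turing machine would fix these, and the true Solomonoff prior ranges over all programs and is uncomputable). Each program outputs one complete sequence; the prior weight is 2^-length; observations remove programs whose output doesn’t match, and the rest are renormalised.

```python
from fractions import Fraction
from itertools import count, islice
from typing import Callable, Dict, Iterator, List, Tuple

def primes() -> Iterator[int]:
    """All primes, by trial division (fine for a toy)."""
    for n in count(2):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            yield n

def squares() -> Iterator[int]:
    """0, 1, 4, 9, ..."""
    for n in count(0):
        yield n * n

def naturals() -> Iterator[int]:
    """0, 1, 2, 3, ..."""
    return count(0)

# Hypothetical description lengths in bits, chosen only for illustration.
HYPOTHESES: Dict[str, Tuple[Callable[[], Iterator[int]], int]] = {
    "primes": (primes, 10),
    "squares": (squares, 8),
    "naturals": (naturals, 6),
}

def posterior(observed: List[int]) -> Dict[str, Fraction]:
    """Prior 2^-length; keep only programs whose output starts with `observed`."""
    weights: Dict[str, Fraction] = {}
    for name, (gen, length) in HYPOTHESES.items():
        prediction = list(islice(gen(), len(observed)))
        if prediction == observed:  # each surviving program still predicts everything
            weights[name] = Fraction(1, 2 ** length)
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```

For example, `posterior([0])` keeps `squares` and `naturals` (weights 1/5 and 4/5) and drops `primes`. There is never a single “squares OR naturals” program; the disjunction is just weight spread over two complete programs.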
You may have also read some stuff about the Minimum Message Length formalization of Occam’s razor, and it may be affecting your intuitions. In this formalization, it’s more natural to use logical operations for part of your message. That is, you could say something like “It’s the list of all primes OR the list of all squares. Compressed data: first number is zero”. Here, we’ve used a logical operation on the statement of the model, but it’s made our lossless compression of the data longer. This is a meaningful thing to do in this formalization (whereas it’s not really in Solomonoff induction), but the message we ended up with is definitely not the shortest one. That means it doesn’t affect the prior, because the prior depends only on the minimum message length.
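A back-of-the-envelope version of that length comparison, assuming made-up bit costs for stating each model and the OR, and assuming one data bit is enough to say which branch applies. Real MML lengths depend on the chosen encoding; only the ordering matters here.

```python
from typing import Dict, List

# Hypothetical bit costs for parts of a two-part (model, data) message.
COST: Dict[str, int] = {"primes": 10, "squares": 8, "OR": 2}

def message_length(model_parts: List[str], data_bits: int) -> int:
    """Two-part length: bits to state the model plus bits for the data given the model."""
    return sum(COST[p] for p in model_parts) + data_bits

# "It's primes OR squares. Compressed data: first number is zero."
disjunctive = message_length(["primes", "OR", "squares"], data_bits=1)
# "It's squares." The model alone already determines every element.
direct = message_length(["squares"], data_bits=0)
```

With these numbers the disjunctive message costs 21 bits against 8 for stating “squares” directly. Since every part has a positive cost, the disjunctive message is longer than either branch on its own, so it never wins the minimum.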
“That is, you could say something like ‘It’s the list of all primes OR the list of all squares. Compressed data: first number is zero’”
Just to clarify here (because it took me a couple of seconds): you only need the first number of the compressed data because that is sufficient to distinguish whether you have a list of primes or a list of squares. But as Pongo said, you could describe that same list in a much more compressed way by skipping the irrelevant half of the OR statement.
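A small decoder sketch of that point, assuming the squares start at 0^2 = 0 (so the lists begin 0 and 2 respectively); `decode_disjunction` is a made-up name for illustration. The first number picks the branch, and the chosen model then generates everything else with no further data.

```python
from itertools import count, islice
from typing import Iterator, List

def primes() -> Iterator[int]:
    """All primes, by trial division."""
    for n in count(2):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            yield n

def squares() -> Iterator[int]:
    """0, 1, 4, 9, ... (assumes the list of squares starts at 0)."""
    for n in count(0):
        yield n * n

def decode_disjunction(first_number: int, n: int) -> List[int]:
    """Decode 'primes OR squares' from just the first number of the data."""
    if first_number == 0:   # only the squares start at 0
        return list(islice(squares(), n))
    if first_number == 2:   # only the primes start at 2
        return list(islice(primes(), n))
    raise ValueError("first number is consistent with neither branch")
```

`decode_disjunction(0, 5)` gives `[0, 1, 4, 9, 16]`, exactly what the model “squares” would produce without the OR or the extra data.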