Do you understand Solomonoff’s Universal Prior?
Not the mathematical proof.
But the idea that if you don’t yet have any data from observation, then you decide the prior probability of a hypothesis by looking at its complexity.
Complexity, defined by looking for the smallest bitstring program, for each possible Turing machine, that can be said to generate this hypothesis as its output when run on that machine (and that search over all programs is the reason it’s intractable unless you have infinite computational resources, yes?).
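Roughly, the definition of complexity I have in mind (my own notation, so it may be off):

K_U(x) = \min \{ \ell(p) : U(p) = x \}

where U is a fixed universal Turing machine, p ranges over bitstring programs, and \ell(p) is the length of p in bits.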
The longer the bitstring, the less likely the hypothesis (and this has to do with the idea that there are more possible strings at larger lengths: a one-bit string can be in 2 states, a two-bit one in 2^2 = 4 states, a three-bit one in 2^3 = 8 states, and so on).
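And the reason the 2^{-length} weighting makes sense, as I understand it (assuming the programs are prefix-free, i.e. no valid program is a prefix of another): there are at most 2^\ell programs of length \ell, and the Kraft inequality gives

\sum_p 2^{-\ell(p)} \le 1

so giving each program the weight 2^{-\ell(p)} behaves like a (sub-)probability, with shorter programs getting exponentially more of it.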
Then you somehow average (or sum?) the probabilities for all the (Turing machine + program) pairs into one overall probability? (see my attempt at the formula below)
(I’d love to understand that formally)
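My attempt at writing it down (as far as I can tell it’s a sum over programs on one fixed universal machine U, not an average over machines, but please correct me):

M(x) = \sum_{p \,:\, U(p) = x*} 2^{-\ell(p)}

where the sum runs over all (prefix-free) programs p whose output on U starts with the string x, so every program consistent with the data so far contributes 2^{-\ell(p)}; changing the universal machine U only changes M(x) up to a multiplicative constant (the invariance theorem).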