Kilobug wrote “There is no updating of probabilities, because hypothesis are always right or wrong”
Do not forget that any wrong hypothesis can be made right by adding error-correcting instructions to it. That only makes the program longer and thus lowers its prior probability: under the 2^-length Solomonoff prior, each extra bit of patching halves the program's weight. And it seems intuitive that the more error corrections a theory needs, the less probable it is.
Kilobug wrote “there is no room for an hypothesis like “the coin will fall heads 75% of times, tails 25% of time”
There is room for both a hypothesis that predicts the coin will always fall heads, held with probability 75%, and a hypothesis that predicts the coin will always fall tails, held with probability 25%.
Say the coin falls HHTHTHH. Then the hypothesis that it will always fall heads has probability 0%, because the flips weren't all heads. The same goes for the hypothesis that it will always fall tails.
It seems like it would be pretty easy, though, to extend Solomonoff induction so that each hypothesis is an algorithm that outputs a probability distribution, and then to update on each bit of evidence with Bayes's theorem. If we did that for this example, I think the hypothesis that generates the 75%/25% heads/tails distribution would eventually win out.
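To make that concrete, here is a minimal Python sketch of the proposed extension; the three hypotheses, the uniform prior, and the flip sequence are illustrative assumptions, not anything from the original comment:

```python
# Each hypothesis is a number giving P(next flip = heads), assumed i.i.d.
hypotheses = {
    "always heads": 1.0,
    "always tails": 0.0,
    "75% heads": 0.75,
}

# Uniform prior over hypotheses (Solomonoff induction would instead
# weight each program by 2^-length).
weights = {name: 1.0 / len(hypotheses) for name in hypotheses}

for flip in "HHTHTHH":
    # Bayes's theorem: posterior weight is proportional to prior x likelihood.
    for name, p_heads in hypotheses.items():
        weights[name] *= p_heads if flip == "H" else 1.0 - p_heads
    total = sum(weights.values())
    weights = {name: w / total for name, w in weights.items()}

print(weights)  # {'always heads': 0.0, 'always tails': 0.0, '75% heads': 1.0}
```

The two deterministic hypotheses are refuted outright by the first flip that contradicts them, and the 75%-heads hypothesis ends up with all the posterior mass.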
The original Solomonoff induction is already equivalent to that.
Proof sketch: Any program that computes a probability distribution can (with constant overhead and a source of uniform random bits) be converted into a program that samples from that same distribution. Then convert that into a family of deterministic programs, each of which hardcodes one possible random sequence.
The total weight of such a family in the Solomonoff prior is within a constant factor of the weight that the single probabilistic hypothesis would have had, and ruling out members of the family that disagree with observation is equivalent to Bayesian updating.
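That equivalence can be checked numerically for the coin example. The Python sketch below assumes one particular toy sampler for the 75%-heads hypothesis (two fair bits per flip, with the pair 11 decoding to tails) and drops the shared 2^-|sampler| factor, since it multiplies both sides equally:

```python
from itertools import product

def decode(bits):
    """Turn a string of 2n fair random bits into n coin flips with P(H) = 3/4."""
    return "".join("T" if bits[i:i + 2] == "11" else "H"
                   for i in range(0, len(bits), 2))

observed = "HHTHTHH"
n = len(observed)

# One deterministic program per hardcoded random string, each carrying
# prior weight 2^-(2n) relative to the sampler it is built from.
surviving_weight = sum(
    2.0 ** (-2 * n)
    for bits in ("".join(b) for b in product("01", repeat=2 * n))
    if decode(bits) == observed          # rule out programs that disagree
)

# Direct Bayesian likelihood of the same data under P(heads) = 3/4.
likelihood = 1.0
for flip in observed:
    likelihood *= 0.75 if flip == "H" else 0.25

print(surviving_weight, likelihood)  # both 0.01483154296875
```

Ruling out the hardcoded random strings that decode to the wrong flip sequence leaves exactly as much weight as the probabilistic hypothesis assigns to the data, which is just the Bayesian update described above.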